Overview
Airbyte is an open-source data integration platform that significantly simplifies the ETL (Extract, Transform, Load) process, making it easier for users to manage and migrate their data across various sources. By providing a user-friendly interface and robust functionality, Airbyte enables seamless data movement and transformation, catering to a wide range of data integration needs. One of the key features of Airbyte is its extensive range of connectors, which allow it to integrate with numerous data sources and destinations.
Using Airbyte’s Firebolt connector, users can efficiently and effortlessly load large amounts of data to and from Firebolt. This capability extends to integration with a wide array of data sources, thanks to Airbyte’s extensive library of connectors. Whether your data resides in cloud storage, on-premises databases, SaaS applications, or other data warehouses, Airbyte facilitates smooth and reliable data transfer between these sources and Firebolt.
Quickstart
There are several ways to deploy Airbyte. This tutorial uses the easiest way to start prototyping: a local Docker Compose deployment.
If you already have an Airbyte deployment, skip to the configuration section.
Prerequisites
- Docker: Ensure you have Docker installed. You can download it from here.
- Firebolt Account: You need an active Firebolt account. Sign up here if you don’t have one.
- Firebolt Database and Table: Make sure you have a Firebolt database and table with data ready for querying.
- Firebolt Service Account: Create a service account in Firebolt and note its id and secret.
Step 1: Deploy Airbyte Locally with Docker
- Clone the Airbyte repository, which creates a new airbyte directory for your setup:
git clone --depth=1 https://github.com/airbytehq/airbyte.git
- Switch to the Airbyte directory:
cd airbyte
- Start Airbyte by running the following command in the terminal:
./run-ab-platform.sh
- Open your browser and navigate to http://localhost:8000 to access the Airbyte UI.
- You will be asked for a username and password. By default the username is airbyte and the password is password. Before you deploy Airbyte in production, make sure to change the password.
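The platform can take a few minutes to start, so the UI may not respond right away. The following is a minimal sketch, not part of Airbyte, of a readiness check that polls the local URL until something answers. The helper names (http_probe, wait_until_ready) and the retry settings are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if anything answers HTTP at `url`, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 401 (login required) or similar still means the web server is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: treat as not up yet.
        return False


def wait_until_ready(probe, attempts=30, delay=10.0, sleep=time.sleep):
    """Call `probe()` up to `attempts` times, pausing `delay` seconds between tries.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


# Example call (assumes the default local address from this tutorial):
# wait_until_ready(lambda: http_probe("http://localhost:8000"))
```

An HTTP error response counts as "reachable" here because the UI asks for credentials before serving any page. It only shows that the web server is listening, not that every Airbyte service has finished starting.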
Step 2: Configure Firebolt Connection via UI
- In the Airbyte UI, click on the “Connections” tab and select “Create your first connection”.
- Click on “New Destination” and select “Firebolt” as the destination type.
- Enter your Firebolt connection details:
- Client ID: Your service account id.
- Client Secret: Your service account secret.
- Database: Your database name.
- Account: Your Firebolt account.
- Engine: The Firebolt engine that will run the ingestion.
- Host (Optional): For non-standard use cases. Should be left blank.
- Select a replication strategy. SQL is easier to set up, but S3 performs better on production loads. See the Airbyte doc for more information.
- Save.
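Before filling in the form, it can help to collect the values in one place and confirm that nothing is missing. The sketch below is a hypothetical checklist helper. The key names are snake_case versions of the UI labels chosen for illustration and are not confirmed to match the connector's actual specification. The values are placeholders.

```python
# Fields the form above lists without marking them optional.
# Key names are illustrative assumptions, not the connector's official spec.
REQUIRED_FIELDS = ("client_id", "client_secret", "database", "account", "engine")


def missing_fields(config):
    """Return the required fields that are absent or blank in `config`."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(config.get(name) or "").strip()
    ]


example_config = {
    "client_id": "my-service-account-id",          # placeholder
    "client_secret": "my-service-account-secret",  # placeholder
    "database": "my_database",                     # placeholder
    "account": "my-account",                       # placeholder
    "engine": "",                                  # left blank on purpose
    "host": "",                                    # optional, normally blank
}

print(missing_fields(example_config))
```

With the placeholder values above, the helper reports that only the engine is still missing. A blank host is fine because that field is optional.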
Step 3: Create a Connection in Airbyte
- In the Airbyte UI, click on the “Connections” tab and select “Create your first connection” (“New Connection” if you already have a connection defined).
- Choose a source from which you want to extract data. We’ll be using Faker to generate some sample data.
- Leave fields as is and click “Set up source”.
- Next, on the destination screen, select the Firebolt destination you configured earlier.
- Select the streams you want to replicate and the sync mode (Full refresh or Incremental). To save time, select only the “products” stream.
- Finally, specify the frequency of your data replication, or choose manual if you want to trigger the job in the UI or via an API call.
- Click “Set up connection” to start syncing data from your source to Firebolt!
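For connections set to manual, a sync can be started with an API call. The following sketch only builds the HTTP request and does not send it. The endpoint path (/api/v1/connections/sync), the request body shape, and the use of basic authentication are assumptions about a local deployment and should be checked against the Airbyte API documentation for your version. build_sync_request is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def build_sync_request(base_url, connection_id, username, password):
    """Build (but do not send) a POST request that asks Airbyte to run a sync."""
    body = json.dumps({"connectionId": connection_id}).encode("utf-8")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        # Assumed endpoint path; verify it for your Airbyte version.
        url=f"{base_url.rstrip('/')}/api/v1/connections/sync",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# To send it, pass the request to urllib.request.urlopen(request).
# The connection ID is shown in the Airbyte UI URL when you open a connection.
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before anything reaches the server.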
Step 4: Monitor and Manage Data Syncs
- Use the Airbyte UI to monitor your data syncs and ensure that data is being transferred accurately and efficiently.
- Adjust sync settings and transformations as needed to optimize your ETL process. You can leverage DBT to
Output schema
The Firebolt Destination connector is a V1 connector, meaning it works with raw data. Refer to Airbyte’s Destination V2 document to learn about the differences. Each stream is written into its own Fact table in Firebolt, containing three columns:
- _airbyte_ab_id: a UUID assigned by Airbyte to each processed event. The column type is TEXT.
- _airbyte_emitted_at: a TIMESTAMP indicating when the event was pulled from the source.
- _airbyte_data: a JSON blob representing the event data, stored as TEXT; it can be parsed using JSON functions.
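To illustrate the raw layout, here is a minimal sketch of how a client application might unpack one row after reading it from Firebolt. The sample row is made up for illustration, including the field names inside _airbyte_data. flatten_raw_row is a hypothetical helper, not part of Airbyte or Firebolt.

```python
import json


def flatten_raw_row(row):
    """Parse the JSON text in _airbyte_data and attach Airbyte's metadata columns."""
    record = json.loads(row["_airbyte_data"])
    record["_airbyte_ab_id"] = row["_airbyte_ab_id"]
    record["_airbyte_emitted_at"] = row["_airbyte_emitted_at"]
    return record


# Made-up example row with the three columns described above.
sample_row = {
    "_airbyte_ab_id": "3f2b8c1e-0000-0000-0000-000000000000",
    "_airbyte_emitted_at": "2024-01-01 00:00:00",
    "_airbyte_data": '{"id": 1, "make": "Example", "price": 9.99}',
}

flat = flatten_raw_row(sample_row)
print(flat["make"], flat["price"])
```

The same unpacking can be done inside Firebolt with its JSON functions, which avoids moving the raw text to the client first.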
Further Reading
After setting up Airbyte with Firebolt, explore these resources to leverage additional features and enhance your data integration capabilities:
- Learn how to use Firebolt Source.
- Ensure you’re following security guidelines.
- Explore other deployment options.
- Configure your connections.