Overview
dbt (data build tool) is a framework for managing and executing data transformations in modern data warehousing architectures. It lets you develop and deploy SQL-based transformations in a version-controlled environment, enabling collaboration and making data pipelines reproducible. dbt streamlines the process of turning raw data into analytics-ready datasets, accelerating the delivery of insights.
The Firebolt adapter for dbt brings together dbt’s state-of-the-art development tools and Firebolt’s next-generation analytics performance. On top of dbt’s core features, the adapter offers native support for all of Firebolt’s index types and has been specifically enhanced to support ingestion from S3 using Firebolt’s external tables mechanics.
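For example, the adapter lets you declare Firebolt indexes directly in a model's config block. The following sketch assumes the adapter's table_type and primary_index model configs; the model body and column names are illustrative, not part of any sample project:
-- models/dim_customers.sql (hypothetical model)
{{ config(
    materialized='table',
    table_type='fact',
    primary_index=['customer_id']
) }}

select
    customer_id,
    first_name,
    last_name
from {{ ref('stg_customers') }}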
Prerequisites
There are two ways to deploy dbt: self-hosted dbt Core and managed dbt Cloud.
This guide shows how to set up a local installation of dbt Core using Python's pip package manager; dbt can also be installed via Homebrew, via Docker, or from source.
You will need the following:
- A GitHub account.
- Python 3.8+.
Quickstart
This guide shows you how to set up dbt with Firebolt and run your first dbt model.
Set up dbt Core
- Create a new Python virtual environment, as shown in the following script example:
python3 -m venv dbt-env
- Activate your venv, as shown in the following script example:
source dbt-env/bin/activate
- Install Firebolt’s adapter for dbt, as shown in the following script example:
python -m pip install dbt-firebolt
- (Optional) Check that both dbt packages are installed:
python -m pip list | grep dbt
This command should return dbt-core and dbt-firebolt along with their respective versions.
Set up a connection to Firebolt
dbt uses a profiles.yml file to store connection information. This file generally lives outside of your dbt project to avoid checking sensitive information into version control.
The usual place to create this file on Mac and Linux is ~/.dbt/profiles.yml.
- Open ~/.dbt/profiles.yml with your preferred text editor.
- Paste the following sample configuration:
jaffle-shop:
  target: dev
  outputs:
    dev:
      type: firebolt
      client_id: "<client-id>"
      client_secret: "<client-secret>"
      database: "<database-name>"
      engine_name: "<engine-name>"
      account_name: "<account-name>"
      schema: "<table-prefix>"
- Replace the placeholders with your account’s information:
  - <client-id> and <client-secret> are the ID and secret of your service account. If you don’t have one, follow the steps in the Manage service accounts page to learn how to set one up.
  - <database-name> and <engine-name> are the Firebolt database and engine that you want your queries to run against.
  - <account-name> is the Firebolt account that you’re connected to. Learn more here.
  - <table-prefix> is a prefix prepended to your table names. Since Firebolt does not support custom schemas, this prefix serves as a workaround to prevent table name conflicts during concurrent development.
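For reference, a filled-in profile might look like the following; all values here are made up for illustration:
jaffle-shop:
  target: dev
  outputs:
    dev:
      type: firebolt
      client_id: "f1a2b3c4-0000-0000-0000-000000000000"
      client_secret: "example-secret"
      database: "jaffle_db"
      engine_name: "jaffle_engine"
      account_name: "my-account"
      schema: "dbt_alice"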
Set up Jaffle Shop, a sample dbt project
jaffle_shop is a fictional ecommerce store. This dbt project transforms raw data from an app database into customers and orders models ready for analytics. This version is designed to showcase Firebolt’s integration with dbt.
- Clone the jaffle-shop-firebolt repository and change to the newly created directory, as follows:
git clone https://github.com/firebolt-db/jaffle_shop_firebolt.git
cd jaffle_shop_firebolt
- Ensure your profile is set up correctly:
dbt debug
If you’re seeing an error here, check that your profiles.yml is set up correctly, that it’s in the right directory on your system, and that the engine is running. Also check that you’re still in the dbt-env virtual Python environment that we set up earlier and that both packages are present.
- Install dependent packages:
dbt deps
- Run the external table model. If your database is not in the us-east-1 AWS region, refer to the README for how to copy the files.
dbt run-operation stage_external_sources
- Load the sample CSVs into your database:
dbt seed
- Run the models:
dbt run
You should now see the customers and orders tables in your database, created using dbt models. From here you can explore more of dbt’s capabilities, including incremental models, documentation generation, and more, by following the official guides in the section below.
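As a quick taste of incremental models, here is a minimal sketch. The raw_events source and the event_at column are illustrative, not part of the Jaffle Shop project:
-- models/events_incremental.sql (hypothetical model)
{{ config(materialized='incremental') }}

select
    event_id,
    event_at,
    payload
from {{ source('raw', 'raw_events') }}

{% if is_incremental() %}
  -- on incremental runs, only pull rows newer than what's already loaded
  where event_at > (select max(event_at) from {{ this }})
{% endif %}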
Limitations
Not every feature of dbt is supported in Firebolt. You can find an up-to-date list of supported features in the adapter documentation.
External table loading strategy
In the previous Jaffle Shop example, we used a public Amazon S3 bucket to load data. If your bucket contains sensitive data, you will want to restrict access. Follow our guide to set up AWS authentication using a key ID and secret key.
In your dbt_project.yml, you can specify the credentials for your external table in the aws_key_id and aws_secret_key fields, as shown in the following code example:
sources:
- name: firebolt_external
schema: ""
loader: S3
tables:
- name: <table-name>
external:
url: 's3://<bucket_name>/'
object_pattern: '<regex>'
type: '<type>'
credentials:
aws_key_id: <aws-id>
aws_secret_key: <aws-secret-key>
compression: '<compression-type>'
partitions:
- name: <partition-name>
data_type: <partition-type>
regex: '<partition-definition-regex>'
columns:
- name: <column-name>
data_type: <type>
To use external tables, you must define a table as external in your dbt_project.yml file. Every external table must contain the url, type, and object_pattern fields. The Firebolt external table specification requires fewer fields than those specified in the dbt documentation.
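Once the source is defined, the external table is created by running the stage_external_sources operation from the dbt-external-tables package, the same operation used in the quickstart. You can optionally narrow it to a single source with a selector; the firebolt_external name below is the source name from the example above:
# create or refresh external tables for one source only
dbt run-operation stage_external_sources --args "select: firebolt_external"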
“Copy” loading strategy
You can also use COPY FROM to load data from Amazon S3 into Firebolt. It has a simple syntax and doesn’t require an exact match with your source data. COPY FROM does not create an intermediate table; it writes your data straight into Firebolt so you can start working with it right away.
The copy syntax in dbt closely adheres to the syntax of Firebolt’s COPY FROM statement.
To use COPY FROM instead of creating an external table, set strategy: copy in your external source definition. For backwards compatibility, if no strategy is specified, the external table strategy is used by default.
sources:
- name: s3
tables:
- name: <table-name>
external:
strategy: copy
url: 's3://<bucket_name>/'
credentials:
aws_key_id: <aws-id>
aws_secret_key: <aws-secret-key>
options:
object_pattern: '<regex>'
type: 'CSV'
auto_create: true
You can also include the following options:
options:
object_pattern: '<regex>'
type: 'CSV'
auto_create: true
allow_column_mismatch: false
max_errors_per_file: 10
csv_options:
header: true
delimiter: ','
quote: DOUBLE_QUOTE
escape: '\'
null_string: '\\N'
empty_field_as_null: true
skip_blank_lines: true
date_format: 'YYYY-MM-DD'
timestamp_format: 'YYYY-MM-DD HH24:MI:SS'
In the previous code example, note that the CSV-specific settings are nested one level deeper, under csv_options. For detailed descriptions of these options and their allowed values, refer to the parameter specification.
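For orientation, the source definition above corresponds roughly to a COPY FROM statement of the following shape. This is a hand-written sketch, not output generated by dbt; the table and bucket names are placeholders, and you should consult Firebolt’s COPY FROM reference for the exact option syntax:
-- rough, illustrative equivalent of the copy strategy above
COPY INTO my_table
FROM 's3://<bucket_name>/'
WITH
    PATTERN = '<regex>'
    TYPE = CSV
    AUTO_CREATE = TRUE
    HEADER = TRUE;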