Process Data
Firebolt Core is all about high-throughput, low-latency data processing. As outlined in Connect, all communication with Firebolt Core ultimately occurs through SQL, and data processing is no exception. Firebolt Core generally supports the full SQL dialect documented in our SQL reference, with a few exceptions concerning features specific to our managed Cloud data warehouse (e.g. RBAC or engine management commands). A complete list of differences between Firebolt Core and the managed Firebolt Cloud data warehouse can be found in Differences between Firebolt Core and managed Firebolt.
The remainder of this page focuses specifically on importing, managing, and exporting data in Firebolt Core.
Importing External Data
External data can be imported into Firebolt Core from several different sources and in several different formats.
- Raw data files stored on Amazon S3 or Google Cloud Storage.
  - The easiest way to access such data is to import it into a SQL table using COPY FROM. COPY FROM supports many convenience features such as schema discovery or metadata filtering, and can easily adapt to different data loading workflows (see the sketch after this list).
  - Alternatively, you can also create an external table encompassing all relevant data files. This has the advantage that no data is persistently stored and thus duplicated on the Firebolt Core cluster itself, but fewer convenience features are available for external tables than for COPY FROM.
  - Data files can also be read directly with the read_parquet(..) or read_csv(..) table-valued functions.
- Apache Iceberg tables can be read through the read_iceberg(..) table-valued function.
  - We currently support a subset of Iceberg catalogs, including file-based catalogs, REST catalogs, and the Databricks Unity catalog.
  - We currently support data files stored on Amazon S3 or Google Cloud Storage.
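As a rough illustration of these access paths, the sketch below shows each method in SQL form. All bucket paths and table names are placeholders, and the exact option and parameter spellings (e.g. AUTO_CREATE, or the argument form of read_iceberg(..)) are assumptions to verify against the respective reference pages.

```sql
-- Import raw Parquet files into a managed table with COPY FROM.
-- AUTO_CREATE (an assumed option name) lets schema discovery derive
-- the column definitions; see the COPY FROM reference for the full
-- option list.
COPY INTO sales FROM 's3://my-bucket/sales/'
WITH TYPE = PARQUET AUTO_CREATE = TRUE;

-- Query raw files directly via a table-valued function, without
-- persisting any data on the cluster.
SELECT count(*)
FROM read_parquet('s3://my-bucket/sales/2024/*.parquet');

-- Read an Apache Iceberg table. The argument shown here (a single
-- location URL) is an assumption; the required parameters depend on
-- the catalog type.
SELECT *
FROM read_iceberg('s3://my-bucket/iceberg/sales/')
LIMIT 10;
```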
Note that data stored on Google Cloud Storage can currently only be accessed through the S3 interoperability layer exposed by GCS. In order to access such data from Firebolt Core, you will need to navigate to the “Access keys for your user account” section of the Interoperability tab in your Cloud Storage settings and generate an access key & secret for your account there. These will then need to be specified as the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY parameters of the respective SQL command or function (e.g. CREATE LOCATION, CREATE EXTERNAL TABLE, or read_iceberg(..)).
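A hedged sketch of this flow is shown below: reading an Iceberg table from GCS through the interoperability layer. The gs:// URL scheme and the exact parameter spelling are assumptions to check against the read_iceberg(..) reference.

```sql
-- Read an Iceberg table stored on GCS via the S3 interoperability
-- layer. The HMAC access key & secret come from the "Interoperability"
-- tab of the Cloud Storage settings; the URL scheme and parameter
-- names here are assumptions, not confirmed syntax.
SELECT *
FROM read_iceberg(
    'gs://my-gcs-bucket/iceberg/events/',
    AWS_ACCESS_KEY_ID => 'GOOG1E-EXAMPLE-KEY',
    AWS_SECRET_ACCESS_KEY => 'EXAMPLE-SECRET'
);
```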
Managing Metadata & Data
In addition to processing external data, Firebolt Core can also manage relational data itself. Most DDL and all DML commands are supported in Firebolt Core for this purpose. It is important to note, however, that Firebolt Core provides no compute-storage isolation. In other words, data managed by one Firebolt Core cluster cannot be shared with any other Firebolt Core cluster. Furthermore, tables are sharded across all nodes, which means that a Firebolt Core cluster containing such tables cannot be resized to a different number of nodes (see also Differences between Firebolt Core and managed Firebolt).
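For instance, a minimal lifecycle for a managed table might look like the following sketch; the table and column names are invented for illustration, and the PRIMARY INDEX clause follows the dialect documented in the SQL reference.

```sql
-- Create a managed table. Its rows are sharded across all nodes of
-- this cluster and cannot be shared with other Firebolt Core clusters.
CREATE TABLE page_views (
    visit_id   BIGINT,
    url        TEXT,
    visited_at TIMESTAMP
) PRIMARY INDEX visit_id;

-- Standard DML commands work as documented in the SQL reference.
INSERT INTO page_views VALUES (1, 'https://example.com/', '2025-01-01 12:00:00');
UPDATE page_views SET url = 'https://example.com/home' WHERE visit_id = 1;
DELETE FROM page_views WHERE visited_at < '2024-01-01';
```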
Please refer to the Deployment and Operational Guide for further details about setting up persistent storage for the data managed by Firebolt Core nodes.
Exporting Data
Data can be exported from Firebolt Core through the following means.
- COPY TO writes raw data files to Amazon S3 or Google Cloud Storage (see the sketch after this list).
- Alternatively, you can also process query results within your client application (see Connect). If the only goal is to persist query results to raw data files (e.g. in an ETL pipeline), doing this in the client will generally be slower than using COPY TO due to the added cost of serializing and transferring data to the client.
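A minimal COPY TO sketch follows, assuming Parquet output to a writable S3 prefix; the output path is a placeholder and the option name is an assumption to verify against the COPY TO reference.

```sql
-- Export a query result as Parquet files under an S3 prefix.
COPY (
    SELECT url, count(*) AS views
    FROM page_views
    GROUP BY url
) TO 's3://my-bucket/exports/page_view_counts/'
WITH TYPE = PARQUET;
```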
Analogously to reading data, writing data to Google Cloud Storage currently goes through the S3 interoperability layer exposed by GCS and requires a suitable access key & secret (see above for details).
Examples
The Firebolt Core GitHub repository contains examples of the different ways to ingest and export data in Firebolt Core.