Importing External Data
External data can be imported into Firebolt Core from different object storage providers. In the default configuration, you can import data from Amazon S3 and Google Cloud Storage. In addition, Firebolt Core can work with any S3-compatible object store such as MinIO or Cloudflare R2. To enable this, use the default_s3_endpoint_override config property in the Firebolt Core Configuration File.
- Raw data files in many different formats can be imported directly from object storage.
  - The easiest way to access such data is to import it into a SQL table using COPY FROM. COPY FROM supports many convenience features such as schema discovery and metadata filtering, and can easily adapt to different data loading workflows.
  - Alternatively, you can create an external table encompassing all relevant data files. This has the advantage that no data is persistently stored, and thus duplicated, on the Firebolt Core cluster itself, but fewer convenience features are available for external tables than for COPY FROM.
  - Data files can also be read directly with the read_parquet(..) or read_csv(..) table-valued functions.
- Apache Iceberg tables can be read through the read_iceberg(..) table-valued function.
- We currently support a subset of Iceberg catalogs: file-based catalogs, the Databricks Unity catalog, the AWS Glue catalog, and any other catalog that implements the Iceberg REST Catalog API.
- The data files of the Iceberg table may be stored in any of the supported object stores.
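As an illustration, loading Parquet files into a managed table and inspecting them ad hoc might look as follows. The bucket, paths, and table name are hypothetical, and the exact option syntax may differ slightly; consult the COPY FROM and read_parquet(..) references for the authoritative forms.

```sql
-- Import raw Parquet files from object storage into a SQL table.
-- 's3://my-bucket/events/' and the table name 'events' are placeholders.
COPY INTO events
FROM 's3://my-bucket/events/'
WITH TYPE = PARQUET;

-- Inspect a single file directly, without creating any table.
SELECT *
FROM read_parquet('s3://my-bucket/events/2024/part-0001.parquet')
LIMIT 10;
```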
Credentials for authenticating against the object store can be supplied through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY parameters of the respective SQL command or function (e.g. CREATE LOCATION, CREATE EXTERNAL TABLE, or read_iceberg(..)).
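For example, reading an Iceberg table from a file-based catalog on S3 might look like the sketch below. The table location is made up, and the named-parameter style shown here is an assumption; only the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY parameter names come from the description above, so check the read_iceberg(..) reference for the exact signature.

```sql
-- Read an Iceberg table whose metadata lives in a file-based catalog on S3.
-- The URL is a placeholder; credentials are passed via the
-- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY parameters.
SELECT *
FROM read_iceberg(
    URL => 's3://my-bucket/warehouse/db/orders',
    AWS_ACCESS_KEY_ID => '<access_key_id>',
    AWS_SECRET_ACCESS_KEY => '<secret_access_key>'
)
LIMIT 10;
```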
Managing Metadata & Data
In addition to processing external data, Firebolt Core can also manage relational data itself. Most DDL and all DML commands are supported in Firebolt Core for this purpose. It is important to note, however, that Firebolt Core provides no compute-storage separation: data managed by one Firebolt Core cluster cannot be shared with any other Firebolt Core cluster. Furthermore, tables are sharded across all nodes, which means that a Firebolt Core cluster containing such tables cannot be resized to a different number of nodes (see also Differences between Firebolt Core and managed Firebolt). Please refer to the Deployment and Operational Guide for further details about setting up persistent storage for the managed data on Firebolt Core nodes.

Exporting Data
Data can be exported from Firebolt Core through the following means.

- COPY TO writes raw data files to Amazon S3 or Google Cloud Storage.
- Alternatively, you can process query results within your client application (see Connect). If the only goal is to persist query results to raw data files (e.g. in an ETL pipeline), doing this in the client will generally be slower than using COPY TO due to the added cost of serializing and transferring data to the client.
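An export with COPY TO might be sketched as follows. The bucket path, table name, and filter are placeholders, and the option syntax is an approximation; see the COPY TO reference for the exact grammar and the credential parameters.

```sql
-- Export a query result as Parquet files to object storage.
-- 's3://my-bucket/exports/events/' and the 'events' table are placeholders.
COPY (
    SELECT *
    FROM events
    WHERE event_date >= '2024-01-01'
)
TO 's3://my-bucket/exports/events/'
WITH TYPE = PARQUET;
```

Running the export inside the engine this way avoids serializing the result set to the client and writing it back out, which is why it generally outperforms a client-side export.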