Importing External Data
External data can be imported into Firebolt Core from several different sources and in several different formats; each of the access paths below is illustrated in the sketch following the list.

- Raw data files stored on Amazon S3 or Google Cloud Storage.
  - The easiest way to access such data is to import it into a SQL table using COPY FROM. COPY FROM supports many convenience features such as schema discovery or metadata filtering, and can easily adapt to different data loading workflows.
  - Alternatively, you can also create an external table encompassing all relevant data files. This has the advantage that no data is persistently stored (and thus duplicated) on the Firebolt Core cluster itself, but fewer convenience features are available for external tables than for COPY FROM.
  - Data files can also be read directly with the read_parquet(..) or read_csv(..) table-valued functions.
- Apache Iceberg tables can be read through the read_iceberg(..) table-valued function.
  - We currently support a subset of Iceberg catalogs, including file-based catalogs, REST catalogs, and the Databricks Unity catalog.
  - We currently support data files stored on Amazon S3 or Google Cloud Storage.
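As a quick orientation, the sketch below shows the raw-file access paths and the Iceberg function side by side. The bucket URL, table names, and column names are placeholders, and option spellings such as AUTO_CREATE, PATTERN, and the read_iceberg(..) parameter name are assumptions to verify against the respective references:

```sql
-- 1) Import files into a managed table with COPY FROM; AUTO_CREATE is
--    assumed here to trigger schema discovery from the data files.
COPY INTO events
FROM 's3://my-bucket/events/'
WITH PATTERN = '*.parquet' AUTO_CREATE = TRUE;

-- 2) Alternatively, expose the same files as an external table; nothing
--    is persisted on the Firebolt Core cluster itself.
CREATE EXTERNAL TABLE events_external (
    event_id   BIGINT,
    event_time TIMESTAMP
)
URL = 's3://my-bucket/events/'
OBJECT_PATTERN = '*.parquet'
TYPE = (PARQUET);

-- 3) Read individual files directly via a table-valued function.
SELECT count(*) FROM read_parquet('s3://my-bucket/events/part-0.parquet');

-- 4) Read an Apache Iceberg table; the LOCATION parameter is assumed to
--    name an object created earlier with CREATE LOCATION.
SELECT * FROM read_iceberg(LOCATION => 'my_iceberg_location') LIMIT 10;
```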
If the data resides in a private bucket, the necessary credentials can be passed through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY parameters of the respective SQL command or function (e.g. CREATE LOCATION, CREATE EXTERNAL TABLE, or read_iceberg(..)).
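For example, importing from a private bucket might look roughly like this (placeholder key values; the exact shape of the CREDENTIALS clause varies by command, so treat this as a sketch):

```sql
COPY INTO events
FROM 's3://my-private-bucket/events/'
WITH CREDENTIALS = (
    AWS_ACCESS_KEY_ID = '<access_key_id>'
    AWS_SECRET_ACCESS_KEY = '<secret_access_key>'
);
```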
Managing Metadata & Data
In addition to processing external data, Firebolt Core can also manage relational data itself. Most DDL and all DML commands are supported in Firebolt Core for this purpose. It is important to note, however, that Firebolt Core provides no compute-storage isolation: data managed by one Firebolt Core cluster cannot be shared with any other Firebolt Core cluster. Furthermore, tables are sharded across all nodes, which means that a Firebolt Core cluster containing such tables cannot be resized to a different number of nodes (see also Differences between Firebolt Core and managed Firebolt). Please refer to the Deployment and Operational Guide pages for further details about setting up persistent storage for the managed data on Firebolt Core nodes.
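As a minimal sketch of the managed-table workflow (the table and its columns are illustrative):

```sql
-- Create a managed table; its data is sharded across all nodes
-- of the Firebolt Core cluster.
CREATE TABLE page_views (
    user_id   BIGINT,
    url       TEXT,
    viewed_at TIMESTAMP
);

-- All DML commands are supported on managed tables.
INSERT INTO page_views VALUES (1, 'https://example.com/', '2025-01-01 12:00:00');
UPDATE page_views SET url = 'https://example.org/' WHERE user_id = 1;
DELETE FROM page_views WHERE user_id = 1;
```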
Exporting Data
Data can be exported from Firebolt Core through the following means.

- COPY TO writes raw data files to Amazon S3 or Google Cloud Storage (see the sketch after this list).
- Alternatively, you can also process query results within your client application (see Connect). If the only goal is to persist query results to raw data files (e.g. in an ETL pipeline), doing this in the client will generally be slower than using COPY TO due to the added cost of serializing and transferring data to the client.
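For instance, an ETL step that materializes a query result to a bucket could look roughly as follows (the bucket URL is a placeholder and the TYPE option is an assumption; see the COPY TO reference for the supported options):

```sql
COPY (
    SELECT user_id, count(*) AS views
    FROM page_views
    GROUP BY user_id
)
TO 's3://my-bucket/exports/views/'
WITH TYPE = PARQUET;
```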