Understand the role of external tables when loading data into Firebolt, and learn how to configure them to connect to your data lake.
FROM
clause of an INSERT
statement.PARQUET
, CSV
, TSV
, AVRO
, JSON
(JSON Lines), and ORC
. We are quick to add support for more types, so make sure to let us know if you need it.
CREATE EXTERNAL TABLE
statement.
When you use an external table to ingest data, you can explicitly reference these columns to ingest the metadata. First, you define the columns in a CREATE FACT|DIMENSION TABLE
statement. Next, you specify the virtual column names to select in the INSERT INTO
statement, with the fact or dimension table as the target. You can then query the columns in the fact or dimension table for analysis, troubleshooting, and to implement logic. For more information, see the example below.
The metadata virtual columns listed below are available in external tables.
Metadata column name | Description | Data type |
---|---|---|
$source_file_name | The full path of the row data’s source file in Amazon S3, without the bucket. For example, with a source file of s3://my_bucket/xyz/year=2018/month=01/part-00001.parquet , the $source_file_name is xyz/year=2018/month=01/part-00001.parquet . | TEXT |
$source_file_timestamp | The UTC creation timestamp in second resolution of the row’s source file in Amazon S3. (S3 objects are immutable. In cases where files are overwritten with new data - this will be Last Modified time.) | TIMESTAMPTZ |
$source_file_size | Size in bytes of the row’s source file in Amazon S3. | BIGINT |
c_id
and c_name
. First, create a LOCATION object to securely store credentials:
$source_file_name
and $source_file_timestamp
, to contain metadata values that Firebolt creates automatically for the external table.
INSERT
query below ingests the data from my_external_table
into my_dim_table_with_metadata
. The SELECT
clause explicitly specifies the metadata virtual columns, which is a requirement.
SELECT
query over my_dim_table_with_metadata
shows that the source data file (minus the s3://my_bucket
portion of the file path) and file timestamp are included in the dimension table for each row.