READ_ICEBERG
Reference material for READ_ICEBERG function
A table-valued function (TVF) that reads data from Apache Iceberg tables. The function can use either a location object or individual TVF parameters to access the data, and can read from file-based catalogs as well as REST catalogs. READ_ICEBERG
returns a table with data from the specified Iceberg table.
Topics:
- Syntax
- Parameters
- Return Type
- Best Practices
- Examples
- About Metadata Versions in File-Based Catalogs
- Limitations
Syntax
Parameters
Using a LOCATION object
Parameter | Description | Supported input types |
---|---|---|
LOCATION | The name of a location object that contains the Iceberg parameters and credentials. Firebolt recommends using LOCATION to store credentials for authentication. | IDENTIFIER |
MAX_STALENESS | Specifies a maximum staleness for results returned by this function, e.g., INTERVAL '30 seconds' . The default value is 0 seconds, forcing Firebolt to fetch the latest version metadata from the catalog for every query. Values larger than zero instruct Firebolt to cache metadata and vended credentials in memory, and can typically reduce query latency by tens or hundreds of milliseconds. | INTERVAL |
For more on LOCATION
, see CREATE LOCATION and CREATE LOCATION (Iceberg).
Using individual TVF parameters
Common Parameters
Parameter | Description | Supported input types |
---|---|---|
URL | A url pointing to a table in an Iceberg file-based catalog, or a url pointing to an Iceberg REST catalog API endpoint. For file-based catalogs, the expected format is s3://{bucket_name}/{path}/{to}/{table} or s3://{bucket_name}/{path}/{to}/{table}/metadata/{version.metadata.json} . For REST catalogs, the expected format is https://{path}/{to}/{rest}/{host} . | TEXT |
MAX_STALENESS | Specifies a maximum staleness for results returned by this function, e.g., INTERVAL '30 seconds' . The default value is 0 seconds, forcing Firebolt to fetch the latest version metadata from the catalog for every query. Values larger than zero instruct Firebolt to cache metadata and vended credentials in memory, and can typically reduce query latency by tens or hundreds of milliseconds. | INTERVAL |
Parameters for file-based Iceberg tables (Iceberg tables hosted in S3)
Parameter | Description | Supported input types |
---|---|---|
AWS_ACCESS_KEY_ID | The AWS access key ID. Used only for file-based catalogs. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. Used only for file-based catalogs. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. Used only for file-based catalogs. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. Used only for file-based catalogs. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. Used only for file-based catalogs. | TEXT |
Parameters for Iceberg REST catalogs
Parameter | Description | Supported input types |
---|---|---|
WAREHOUSE | The name of the warehouse an Iceberg table resides in. Used only for REST catalogs. | TEXT |
NAMESPACE | The namespace an Iceberg table resides in. Used only for REST catalogs. | TEXT |
TABLE | The name of the Iceberg table to read. Used only for REST catalogs. | TEXT |
OAUTH_CLIENT_ID | An OAuth client ID for authenticating to the REST catalog. Used only for REST catalogs. | TEXT |
OAUTH_CLIENT_SECRET | An OAuth client secret for authenticating to the REST catalog. Used only for REST catalogs. | TEXT |
OAUTH_SCOPE | An OAuth scope for authenticating to the REST catalog. Used only for REST catalogs. | TEXT |
OAUTH_SERVER_URL | The URL to use when requesting an access token for the REST catalog. If not specified, {URL}/v1/oauth/tokens will be used. Used only for REST catalogs. | TEXT |
Return Type
The result is a table with data from the Iceberg files. Columns are read and parsed using their inferred data types.
Best practices
- Firebolt recommends using a
LOCATION
object to store credentials for authentication. See CREATE LOCATION and CREATE LOCATION (Iceberg). - Specifying a value for
MAX_STALENESS
can help improve performance in tight loops and infrequently-updated tables. - It is recommended to specify all parameters using the named-parameter syntax rather than relying on parameter positions. For example: use
URL => 'http://example.com'
rather than omitting theURL =>
parameter name specifier.
Examples
Reading using a LOCATION
The following code example reads the first 5 rows from an Iceberg table using a LOCATION
object that stores credentials for authentication:
LOCATION
objects are supported for file-based (S3-hosted) catalogs as well as REST catalogs.
For more examples of LOCATION
, see CREATE LOCATION and CREATE LOCATION (Iceberg).
Using TVF parameters, from a public location in S3
The following code example reads the first 5 rows from an Iceberg table in a file-based catalog stored on S3:
Returns
l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | l_shipmode | l_comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 156 | 4 | 1 | 17.00 | 17954.55 | 0.04 | 0.02 | N | O | 1996-03-13 | 1996-02-12 | 1996-03-22 | DELIVER IN PERSON | TRUCK | to beans x-ray carefull |
1 | 68 | 9 | 2 | 36.00 | 34850.16 | 0.09 | 0.06 | N | O | 1996-04-12 | 1996-02-28 | 1996-04-20 | TAKE BACK RETURN | according to the final foxes. qui | |
1 | 64 | 5 | 3 | 8.00 | 7712.48 | 0.10 | 0.02 | N | O | 1996-01-29 | 1996-03-05 | 1996-01-31 | TAKE BACK RETURN | REG AIR | ourts cajole above the furiou |
1 | 3 | 6 | 4 | 28.00 | 25284.00 | 0.09 | 0.06 | N | O | 1996-04-21 | 1996-03-30 | 1996-05-16 | NONE | AIR | s cajole busily above t |
1 | 25 | 8 | 5 | 24.00 | 22200.48 | 0.10 | 0.04 | N | O | 1996-03-30 | 1996-03-14 | 1996-04-01 | NONE | FOB | the regular, regular pa |
Using a direct path to a metadata.json file
A URL pointing to an Iceberg metadata.json file will also return the same result:
Returns
l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | l_shipmode | l_comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 156 | 4 | 1 | 17.00 | 17954.55 | 0.04 | 0.02 | N | O | 1996-03-13 | 1996-02-12 | 1996-03-22 | DELIVER IN PERSON | TRUCK | to beans x-ray carefull |
1 | 68 | 9 | 2 | 36.00 | 34850.16 | 0.09 | 0.06 | N | O | 1996-04-12 | 1996-02-28 | 1996-04-20 | TAKE BACK RETURN | according to the final foxes. qui | |
1 | 64 | 5 | 3 | 8.00 | 7712.48 | 0.10 | 0.02 | N | O | 1996-01-29 | 1996-03-05 | 1996-01-31 | TAKE BACK RETURN | REG AIR | ourts cajole above the furiou |
1 | 3 | 6 | 4 | 28.00 | 25284.00 | 0.09 | 0.06 | N | O | 1996-04-21 | 1996-03-30 | 1996-05-16 | NONE | AIR | s cajole busily above t |
1 | 25 | 8 | 5 | 24.00 | 22200.48 | 0.10 | 0.04 | N | O | 1996-03-30 | 1996-03-14 | 1996-04-01 | NONE | FOB | the regular, regular pa |
Authenticated read from S3
The following code examples use various valid combinations of AWS secrets to read from S3:
Using access key + secret:
Using access key, secret, and session token:
Using role:
Using role + external id:
Reading from REST
The following code example reads the first 5 rows from an Iceberg table in a REST catalog:
For more on Iceberg REST catalogs, see the Iceberg REST API spec.
Reading from REST, with a custom OAuth token URL
Some Iceberg REST catalogs do not support the /v1/oauth/tokens
API, and serve OAuth tokens from a different API. For these catalogs, specify OAUTH_SERVER_URL
.
Reading from Databricks Unity Catalog
One example of an Iceberg REST catalog that requires a custom OAUTH_SERVER_URL
is the Databricks Unity Catalog. The following code example reads the first 5 rows from a table in a Databricks Unity Catalog, showing how Databricks concepts map to READ_ICEBERG
parameters:
For configuring Unity Catalog in your Databricks workspace, see Databricks - Set up and manage Unity Catalog. Note that you will need to enable credential vending in your Unity Catalog, see Databricks - Unity Catalog credential vending for external system access. For general information about reading Databricks tables from Iceberg clients, see Databricks - Read Databricks tables from Iceberg clients.
Reading from Snowflake Open Catalog
For setting up a Snowflake Open Catalog in your account, see Snowflake - Snowflake Open Catalog overview. Note that you will need to enable credential vending for your Iceberg tables, see Snowflake - Use catalog-vended credentials for Apache Iceberg™ tables. For general information about reading Snowflake Open Catalog tables from Iceberg clients, see Snowflake - Checking your REST catalog configuration.
About Metadata Versions in File-Based Catalogs
When reading from a file-based catalog, Firebolt first looks for a version-hint.text
file to determine which metadata version to use. If one is not available, READ_ICEBERG
also accepts a path to a specific metadata.json
file.
Limitations
- Iceberg tables with Parquet data files in S3 are currently supported.
- Versions 1 and 2 of the Apache Iceberg specification are supported.
- Tables with following Iceberg features are currently not supported:
- Row-level deletes (position deletes or equality deletes)
- Schema evolution
- Partition evolution
- The following data types are currently not supported:
variant
geometry
geography
- Reading past snapshots with time travel is currently not supported.
- Nested complex types such as
struct
,list
, andmap
are currently read as nullable even if Iceberg defines the field to be non-nullable. This only applies to nested complex types (struct
/list
/map
nested inside anotherstruct
/list
/map
). - Returning partition values for Identity Transforms from partition metadata is currently not supported.
- By default, cross-region reads for S3 are disabled for
read_iceberg
, as they may incur additional costs.