READ_ICEBERG
returns a table with data from the specified Iceberg table.
Topics:
- Syntax
- Parameters
- Return Type
- Best Practices
- Examples
- About Metadata Versions in File-Based Catalogs
- Limitations
Syntax
Parameters
Using a LOCATION object
Parameter | Description | Supported input types |
---|---|---|
LOCATION | The name of a location object that contains the Iceberg parameters and credentials. Firebolt recommends using LOCATION to store credentials for authentication. LOCATION must be specified as a string literal (e.g., LOCATION => 'my_location' ). Unlike URL , it cannot be used as a positional parameter. | TEXT |
NAMESPACE | The namespace an Iceberg table resides in. Used only for REST catalogs. | TEXT |
TABLE | The name of the Iceberg table to read. Used only for REST catalogs. | TEXT |
MAX_STALENESS | Specifies a maximum staleness for results returned by this function, e.g., INTERVAL '30 seconds' . The default value is 0 seconds, forcing Firebolt to fetch the latest version metadata from the catalog for every query. Values larger than zero instruct Firebolt to cache metadata and vended credentials in memory, and can typically reduce query latency by tens or hundreds of milliseconds. | INTERVAL |
Using individual TVF parameters
Common Parameters
Parameter | Description | Supported input types |
---|---|---|
URL | A url pointing to a table in an Iceberg file-based catalog, or a url pointing to an Iceberg REST catalog API endpoint. For file-based catalogs, the expected format is s3://{bucket_name}/{path}/{to}/{table} or s3://{bucket_name}/{path}/{to}/{table}/metadata/{version.metadata.json} . For REST catalogs, the expected format is https://{path}/{to}/{rest}/{host} . | TEXT |
MAX_STALENESS | Specifies a maximum staleness for results returned by this function, e.g., INTERVAL '30 seconds' . The default value is 0 seconds, forcing Firebolt to fetch the latest version metadata from the catalog for every query. Values larger than zero instruct Firebolt to cache metadata and vended credentials in memory, and can typically reduce query latency by tens or hundreds of milliseconds. | INTERVAL |
Common Parameters for Iceberg REST catalogs
The following parameters are used for all REST catalogs, including AWS Glue catalogs accessed via REST endpoint.Parameter | Description | Supported input types |
---|---|---|
WAREHOUSE | The name of the warehouse an Iceberg table resides in. | TEXT |
NAMESPACE | The namespace an Iceberg table resides in. | TEXT |
TABLE | The name of the Iceberg table to read. | TEXT |
Amazon S3 Parameters
The following parameters are used for authentication to both file-based catalogs and AWS Glue catalogs.Parameter | Description | Supported input types |
---|---|---|
AWS_ACCESS_KEY_ID | The AWS access key ID. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. | TEXT |
Parameters for OAuth authentication
The following parameters apply to REST catalogs that use OAuth for authentication.Parameter | Description | Supported input types |
---|---|---|
OAUTH_CLIENT_ID | An OAuth client ID for authenticating to the REST catalog. . | TEXT |
OAUTH_CLIENT_SECRET | An OAuth client secret for authenticating to the REST catalog. | TEXT |
OAUTH_SCOPE | An OAuth scope for authenticating to the REST catalog. | TEXT |
OAUTH_SERVER_URL | The URL to use when requesting an access token for the REST catalog. If not specified, {URL}/v1/oauth/tokens will be used. | TEXT |
Return Type
The result is a table with data from the Iceberg files. Columns are read and parsed using their inferred data types.Best practices
- Use a
LOCATION
object to store credentials for authentication. This approach centralizes credential management and eliminates the need to specify individual credential parameters in each query. See CREATE LOCATION (Iceberg). - Specifying a value for
MAX_STALENESS
can help improve performance in tight loops and infrequently-updated tables. - It is recommended to specify all parameters using the named-parameter syntax rather than relying on parameter positions. For example: use
URL => 'http://example.com'
rather than omitting theURL =>
parameter name specifier.
Examples
Reading using a LOCATION
The following code example reads the first 5 rows from an Iceberg table using aLOCATION
object:
LOCATION
objects are supported for file-based (S3-hosted) catalogs as well as REST catalogs.
For more examples of LOCATION
, see CREATE LOCATION (Iceberg).
Using TVF parameters, from a public location in S3
The following code example reads the first 5 rows from an Iceberg table in a file-based catalog stored on S3:l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | l_shipmode | l_comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 156 | 4 | 1 | 17.00 | 17954.55 | 0.04 | 0.02 | N | O | 1996-03-13 | 1996-02-12 | 1996-03-22 | DELIVER IN PERSON | TRUCK | to beans x-ray carefull |
1 | 68 | 9 | 2 | 36.00 | 34850.16 | 0.09 | 0.06 | N | O | 1996-04-12 | 1996-02-28 | 1996-04-20 | TAKE BACK RETURN | according to the final foxes. qui | |
1 | 64 | 5 | 3 | 8.00 | 7712.48 | 0.10 | 0.02 | N | O | 1996-01-29 | 1996-03-05 | 1996-01-31 | TAKE BACK RETURN | REG AIR | ourts cajole above the furiou |
1 | 3 | 6 | 4 | 28.00 | 25284.00 | 0.09 | 0.06 | N | O | 1996-04-21 | 1996-03-30 | 1996-05-16 | NONE | AIR | s cajole busily above t |
1 | 25 | 8 | 5 | 24.00 | 22200.48 | 0.10 | 0.04 | N | O | 1996-03-30 | 1996-03-14 | 1996-04-01 | NONE | FOB | the regular, regular pa |
Using a direct path to a metadata.json file
A URL pointing to an Iceberg metadata.json file will also return the same result:l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | l_shipmode | l_comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 156 | 4 | 1 | 17.00 | 17954.55 | 0.04 | 0.02 | N | O | 1996-03-13 | 1996-02-12 | 1996-03-22 | DELIVER IN PERSON | TRUCK | to beans x-ray carefull |
1 | 68 | 9 | 2 | 36.00 | 34850.16 | 0.09 | 0.06 | N | O | 1996-04-12 | 1996-02-28 | 1996-04-20 | TAKE BACK RETURN | according to the final foxes. qui | |
1 | 64 | 5 | 3 | 8.00 | 7712.48 | 0.10 | 0.02 | N | O | 1996-01-29 | 1996-03-05 | 1996-01-31 | TAKE BACK RETURN | REG AIR | ourts cajole above the furiou |
1 | 3 | 6 | 4 | 28.00 | 25284.00 | 0.09 | 0.06 | N | O | 1996-04-21 | 1996-03-30 | 1996-05-16 | NONE | AIR | s cajole busily above t |
1 | 25 | 8 | 5 | 24.00 | 22200.48 | 0.10 | 0.04 | N | O | 1996-03-30 | 1996-03-14 | 1996-04-01 | NONE | FOB | the regular, regular pa |
Authenticated read from S3
The following code examples use various valid combinations of AWS secrets to read from S3: Using access key + secret:Reading from REST
The following code example reads the first 5 rows from an Iceberg table in a REST catalog:Reading from REST, with a custom OAuth token URL
Some Iceberg REST catalogs do not support the/v1/oauth/tokens
API, and serve OAuth tokens from a different API. For these catalogs, specify OAUTH_SERVER_URL
.
Reading from Databricks Unity Catalog
One example of an Iceberg REST catalog that requires a customOAUTH_SERVER_URL
is the Databricks Unity Catalog. The following code example reads the first 5 rows from a table in a Databricks Unity Catalog, showing how Databricks concepts map to READ_ICEBERG
parameters:
Reading from Snowflake Open Catalog
Reading from AWS Glue catalog
The following code example reads the first 5 rows from an Iceberg table in an AWS Glue catalog. Note that for AWS Glue catalogs, thewarehouse
parameter maps to catalog_id
and the namespace
parameter maps to database
:
About Metadata Versions in File-Based Catalogs
When reading from a file-based catalog, Firebolt first looks for aversion-hint.text
file to determine which metadata version to use. If one is not available, READ_ICEBERG
also accepts a path to a specific metadata.json
file.
Limitations
- Iceberg tables with Parquet data files in S3 are currently supported.
- Versions 1 and 2 of the Apache Iceberg specification are supported.
- Tables with following Iceberg features are currently not supported:
- Row-level deletes (position deletes or equality deletes)
- Schema evolution
- Partition evolution
- The following data types are currently not supported:
variant
geometry
geography
- Reading past snapshots with time travel is currently not supported.
- Nested complex types such as
struct
,list
, andmap
are currently read as nullable even if Iceberg defines the field to be non-nullable. This only applies to nested complex types (struct
/list
/map
nested inside anotherstruct
/list
/map
). - Returning partition values for Identity Transforms from partition metadata is currently not supported.
- By default, cross-region reads for S3 are disabled for
read_iceberg
, as they may incur additional costs. They may be enabled on a per-query basis using the cross_region_request_mode setting.