A table-valued function (TVF) that reads data from Apache Iceberg tables. The function can use either a location object or individual TVF parameters to access the data, and can read from file-based catalogs as well as REST catalogs. READ_ICEBERG returns a table with data from the specified Iceberg table.

Topics:

Syntax

-- Using a LOCATION object
READ_ICEBERG (
  LOCATION => <location_name>
  [, MAX_STALENESS => <max_staleness_allowed> ]
)

-- Using individual TVF parameters
READ_ICEBERG (
  URL => '<s3_or_rest_api_url>'
  -- Parameters for file-based (S3-hosted) Iceberg tables
  [, AWS_ACCESS_KEY_ID = '<aws_access_key_id>' ]
  [, AWS_SECRET_ACCESS_KEY = '<aws_secret_access_key>' ]
  [, AWS_SESSION_TOKEN = '<aws_session_token>' ]
  [, AWS_ROLE_ARN = '<aws_role_arn>' ]
  [, AWS_ROLE_EXTERNAL_ID = '<aws_role_external_id>' ]
  -- Parameters for Iceberg REST catalogs
  [, WAREHOUSE => '<warehouse_name>' ]
  [, NAMESPACE => '<namespace_name>' ]
  [, TABLE => '<table_name>' ]
  [, OAUTH_CLIENT_ID => '<oauth_client_id>' ]
  [, OAUTH_CLIENT_SECRET = '<oauth_client_secret>' ]
  [, OAUTH_SCOPE => '<oauth_scope>' ]
  [, OAUTH_SERVER_URL => '<oauth_server_url>' ]
  -- Other common parameters
  [, MAX_STALENESS => <max_staleness_allowed> ]
)

Parameters

Using a LOCATION object

ParameterDescriptionSupported input types
LOCATIONThe name of a location object that contains the Iceberg parameters and credentials. Firebolt recommends using LOCATION to store credentials for authentication.IDENTIFIER
MAX_STALENESSSpecifies a maximum staleness for results returned by this function, e.g., INTERVAL '30 seconds'. The default value is 0 seconds, forcing Firebolt to fetch the latest version metadata from the catalog for every query. Values larger than zero instruct Firebolt to cache metadata and vended credentials in memory, and can typically reduce query latency by tens or hundreds of milliseconds.INTERVAL

For more on LOCATION, see CREATE LOCATION and CREATE LOCATION (Iceberg).

Using individual TVF parameters

Common Parameters

ParameterDescriptionSupported input types
URLA url pointing to a table in an Iceberg file-based catalog, or a url pointing to an Iceberg REST catalog API endpoint. For file-based catalogs, the expected format is s3://{bucket_name}/{path}/{to}/{table} or s3://{bucket_name}/{path}/{to}/{table}/metadata/{version.metadata.json}. For REST catalogs, the expected format is https://{path}/{to}/{rest}/{host}.TEXT
MAX_STALENESSSpecifies a maximum staleness for results returned by this function, e.g., INTERVAL '30 seconds'. The default value is 0 seconds, forcing Firebolt to fetch the latest version metadata from the catalog for every query. Values larger than zero instruct Firebolt to cache metadata and vended credentials in memory, and can typically reduce query latency by tens or hundreds of milliseconds.INTERVAL

Parameters for file-based Iceberg tables (Iceberg tables hosted in S3)

ParameterDescriptionSupported input types
AWS_ACCESS_KEY_IDThe AWS access key ID. Used only for file-based catalogs.TEXT
AWS_SECRET_ACCESS_KEYThe AWS secret access key. Used only for file-based catalogs.TEXT
AWS_SESSION_TOKENThe AWS session token. Used only for file-based catalogs.TEXT
AWS_ROLE_ARNThe AWS role ARN. Used only for file-based catalogs.TEXT
AWS_ROLE_EXTERNAL_IDThe AWS role external ID. Used only for file-based catalogs.TEXT

Parameters for Iceberg REST catalogs

ParameterDescriptionSupported input types
WAREHOUSEThe name of the warehouse an Iceberg table resides in. Used only for REST catalogs.TEXT
NAMESPACEThe namespace an Iceberg table resides in. Used only for REST catalogs.TEXT
TABLEThe name of the Iceberg table to read. Used only for REST catalogs.TEXT
OAUTH_CLIENT_IDAn OAuth client ID for authenticating to the REST catalog. Used only for REST catalogs.TEXT
OAUTH_CLIENT_SECRETAn OAuth client secret for authenticating to the REST catalog. Used only for REST catalogs.TEXT
OAUTH_SCOPEAn OAuth scope for authenticating to the REST catalog. Used only for REST catalogs.TEXT
OAUTH_SERVER_URLThe URL to use when requesting an access token for the REST catalog. If not specified, {URL}/v1/oauth/tokens will be used. Used only for REST catalogs.TEXT

Return Type

The result is a table with data from the Iceberg files. Columns are read and parsed using their inferred data types.

Best practices

  • Firebolt recommends using a LOCATION object to store credentials for authentication. See CREATE LOCATION and CREATE LOCATION (Iceberg).
  • Specifying a value for MAX_STALENESS can help improve performance in tight loops and infrequently-updated tables.
  • It is recommended to specify all parameters using the named-parameter syntax rather than relying on parameter positions. For example: use URL => 'http://example.com' rather than omitting the URL => parameter name specifier.

Examples

Reading using a LOCATION

The following code example reads the first 5 rows from an Iceberg table using a LOCATION object that stores credentials for authentication:

CREATE LOCATION my_location
WITH 
  SOURCE = 'ICEBERG'
  CATALOG = 'FILE_BASED'
  CATALOG_OPTIONS = (
    URL = 's3://my-bucket/path/to/iceberg/table'
  )
  CREDENTIALS = ( AWS_ACCESS_KEY_ID = '1231' AWS_SECRET_ACCESS_KEY = '567' );

SELECT *
FROM READ_ICEBERG(
  LOCATION => my_location,
  MAX_STALENESS => INTERVAL '30 seconds'
)
LIMIT 5;

LOCATION objects are supported for file-based (S3-hosted) catalogs as well as REST catalogs. For more examples of LOCATION, see CREATE LOCATION and CREATE LOCATION (Iceberg).

Using TVF parameters, from a public location in S3

The following code example reads the first 5 rows from an Iceberg table in a file-based catalog stored on S3:

SELECT * 
FROM READ_ICEBERG(
  URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_iceberg/tpch/iceberg/lineitem',
  MAX_STALENESS => INTERVAL '30 seconds'
) 
LIMIT 5;

Returns

l_orderkeyl_partkeyl_suppkeyl_linenumberl_quantityl_extendedpricel_discountl_taxl_returnflagl_linestatusl_shipdatel_commitdatel_receiptdatel_shipinstructl_shipmodel_comment
11564117.0017954.550.040.02NO1996-03-131996-02-121996-03-22DELIVER IN PERSONTRUCKto beans x-ray carefull
1689236.0034850.160.090.06NO1996-04-121996-02-281996-04-20TAKE BACK RETURNMAILaccording to the final foxes. qui
164538.007712.480.100.02NO1996-01-291996-03-051996-01-31TAKE BACK RETURNREG AIRourts cajole above the furiou
136428.0025284.000.090.06NO1996-04-211996-03-301996-05-16NONEAIRs cajole busily above t
1258524.0022200.480.100.04NO1996-03-301996-03-141996-04-01NONEFOBthe regular, regular pa

Using a direct path to a metadata.json file

A URL pointing to an Iceberg metadata.json file will also return the same result:

SELECT * 
FROM READ_ICEBERG(
  URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_iceberg/tpch/iceberg/lineitem/metadata/00001-2b7ef7c8-a1d8-4239-abf5-41df838d20ba.metadata.json',
  MAX_STALENESS => INTERVAL '30 seconds'
) 
LIMIT 5;

Returns

l_orderkeyl_partkeyl_suppkeyl_linenumberl_quantityl_extendedpricel_discountl_taxl_returnflagl_linestatusl_shipdatel_commitdatel_receiptdatel_shipinstructl_shipmodel_comment
11564117.0017954.550.040.02NO1996-03-131996-02-121996-03-22DELIVER IN PERSONTRUCKto beans x-ray carefull
1689236.0034850.160.090.06NO1996-04-121996-02-281996-04-20TAKE BACK RETURNMAILaccording to the final foxes. qui
164538.007712.480.100.02NO1996-01-291996-03-051996-01-31TAKE BACK RETURNREG AIRourts cajole above the furiou
136428.0025284.000.090.06NO1996-04-211996-03-301996-05-16NONEAIRs cajole busily above t
1258524.0022200.480.100.04NO1996-03-301996-03-141996-04-01NONEFOBthe regular, regular pa

Authenticated read from S3

The following code examples use various valid combinations of AWS secrets to read from S3:

Using access key + secret:

SELECT * 
FROM READ_ICEBERG(
  URL => 's3://my-bucket/path/to/iceberg/table',
  AWS_ACCESS_KEY_ID => '1231',
  AWS_SECRET_ACCESS_KEY => '567',
  MAX_STALENESS => INTERVAL '30 seconds'
) 
LIMIT 5;

Using access key, secret, and session token:

SELECT * 
FROM READ_ICEBERG(
  URL => 's3://my-bucket/path/to/iceberg/table',
  AWS_ACCESS_KEY_ID => '1231',
  AWS_SECRET_ACCESS_KEY => '567',
  SESSION_TOKEN => 'session-token',
  MAX_STALENESS => INTERVAL '30 seconds'
) 
LIMIT 5;

Using role:

SELECT * 
FROM READ_ICEBERG(
  URL => 's3://my-bucket/path/to/iceberg/table',
  AWS_ROLE_ARN => 'arn:aws:iam::123456789012:role/S3Access',
  MAX_STALENESS => INTERVAL '30 seconds'
) 
LIMIT 5;

Using role + external id:

SELECT * 
FROM READ_ICEBERG(
  URL => 's3://my-bucket/path/to/iceberg/table',
  AWS_ROLE_ARN => 'arn:aws:iam::123456789012:role/S3Access',
  AWS_ROLE_EXTERNAL_ID => 'my-role-external-id',
  MAX_STALENESS => INTERVAL '30 seconds'
) 
LIMIT 5;

Reading from REST

The following code example reads the first 5 rows from an Iceberg table in a REST catalog:

SELECT *
FROM READ_ICEBERG(
  URL => 'https://my-iceberg-rest-catalog/api',
  WAREHOUSE  => 'my_warehouse',
  NAMESPACE => 'my_namespace',
  TABLE => 'my_table_name',
  OAUTH_CLIENT_ID => '00000000-0000-0000-0000-000000000000',
  OAUTH_CLIENT_SECRET => '1234',
  OAUTH_SCOPE => 'example_permission:all',
  MAX_STALENESS => INTERVAL '30 seconds'
)
LIMIT 5;

For more on Iceberg REST catalogs, see the Iceberg REST API spec.

Reading from REST, with a custom OAuth token URL

Some Iceberg REST catalogs do not support the /v1/oauth/tokens API, and serve OAuth tokens from a different API. For these catalogs, specify OAUTH_SERVER_URL.

SELECT *
FROM READ_ICEBERG(
  URL => 'https://my-iceberg-rest-catalog/api',
  WAREHOUSE  => 'my_warehouse',
  NAMESPACE => 'my_namespace',
  TABLE => 'my_table_name',
  OAUTH_CLIENT_ID => '00000000-0000-0000-0000-000000000000',
  OAUTH_CLIENT_SECRET => '1234',
  OAUTH_SCOPE => 'example_permission:all',
  OAUTH_SERVER_URL => 'https://my-iceberg-rest-catalog/example/token',
  MAX_STALENESS => INTERVAL '30 seconds'
)
LIMIT 5;

Reading from Databricks Unity Catalog

One example of an Iceberg REST catalog that requires a custom OAUTH_SERVER_URL is the Databricks Unity Catalog. The following code example reads the first 5 rows from a table in a Databricks Unity Catalog, showing how Databricks concepts map to READ_ICEBERG parameters:

SELECT *
FROM READ_ICEBERG(
  URL => 'https://000-0000000000000.cloud.databricks.com/api/2.1/unity-catalog/iceberg/v1'
  WAREHOUSE  => 'my_uc_catalog_name',
  NAMESPACE => 'my_uc_schema_name',
  TABLE => 'my_table_name',
  OAUTH_CLIENT_ID => '00000000-0000-0000-0000-000000000000',
  OAUTH_CLIENT_SECRET => '1234',
  OAUTH_SCOPE => 'all-apis',
  OAUTH_SERVER_URL => 'https://000-0000000000000.cloud.databricks.com/oidc/v1/token',
  MAX_STALENESS => INTERVAL '30 seconds'
)
LIMIT 5;

For configuring Unity Catalog in your Databricks workspace, see Databricks - Set up and manage Unity Catalog. Note that you will need to enable credential vending in your Unity Catalog, see Databricks - Unity Catalog credential vending for external system access. For general information about reading Databricks tables from Iceberg clients, see Databricks - Read Databricks tables from Iceberg clients.

Reading from Snowflake Open Catalog

SELECT *
FROM READ_ICEBERG(
  URL => 'https://abc123.us-west-2.aws.myapi.com/polaris/api/catalog',
  WAREHOUSE  => 'my_warehouse',
  NAMESPACE => 'my_catalog_name',
  TABLE => 'my_table_name',
  OAUTH_CLIENT_ID => '00000000-0000-0000-0000-000000000000',
  OAUTH_CLIENT_SECRET => '1234',
  OAUTH_SCOPE => 'PRINCIPAL_ROLE:ALL',
  MAX_STALENESS => INTERVAL '30 seconds'
)
LIMIT 5;

For setting up a Snowflake Open Catalog in your account, see Snowflake - Snowflake Open Catalog overview. Note that you will need to enable credential vending for your Iceberg tables, see Snowflake - Use catalog-vended credentials for Apache Iceberg™ tables. For general information about reading Snowflake Open Catalog tables from Iceberg clients, see Snowflake - Checking your REST catalog configuration.

About Metadata Versions in File-Based Catalogs

When reading from a file-based catalog, Firebolt first looks for a version-hint.text file to determine which metadata version to use. If one is not available, READ_ICEBERG also accepts a path to a specific metadata.json file.

Limitations

  • Iceberg tables with Parquet data files in S3 are currently supported.
  • Versions 1 and 2 of the Apache Iceberg specification are supported.
  • Tables with following Iceberg features are currently not supported:
    • Row-level deletes (position deletes or equality deletes)
    • Schema evolution
    • Partition evolution
  • The following data types are currently not supported:
    • variant
    • geometry
    • geography
  • Reading past snapshots with time travel is currently not supported.
  • Nested complex types such as struct, list, and map are currently read as nullable even if Iceberg defines the field to be non-nullable. This only applies to nested complex types (struct/list/map nested inside another struct/list/map).
  • Returning partition values for Identity Transforms from partition metadata is currently not supported.
  • By default, cross-region reads for S3 are disabled for read_iceberg, as they may incur additional costs.