LIST_ICEBERG_FILES - Firebolt Documentation

Since 4.32 A table-valued function (TVF) that lists the data files of an Apache Iceberg table along with the file-level metadata Firebolt reads from the table’s manifests. Like READ_ICEBERG, it accesses the table through either a location object or individual TVF parameters, and supports file-based catalogs as well as REST catalogs. Unlike READ_ICEBERG, it does not read the table’s rows; it returns one row per file, including the table’s delete files, with the per-file statistics, partition values, and deleted-row positions recorded in the Iceberg metadata. Use it to inspect file layout, partitioning, file sizes, per-column min/max bounds, and deletes without scanning the underlying data, for example when debugging pruning, planning a vacuum, or auditing snapshot contents.

Syntax

LIST_ICEBERG_FILES accepts the same LOCATION / URL access and authentication parameters as READ_ICEBERG, plus the file-selection parameters below. It does not accept MAX_STALENESS.

LIST_ICEBERG_FILES (
  -- Access and authentication: same as READ_ICEBERG (LOCATION, or URL plus the
  -- relevant WAREHOUSE / NAMESPACE / TABLE, AWS_*, and OAUTH_* / BEARER_TOKEN params).
  { LOCATION => '<location_name>' | URL => '<s3_or_rest_api_url>' }
  [, ... ]
  -- File selection
  [, SNAPSHOT_ID => <snapshot_id> ]
  [, SNAPSHOT_TIMESTAMP => <epoch_milliseconds> ]
  [, ADDED_FILES_SINCE_SNAPSHOT_ID => <snapshot_id> ]
)

Parameters

File selection parameters

Parameter	Description	Supported input types
`SNAPSHOT_ID`	Lists the files of a specific snapshot by its Iceberg snapshot ID. When omitted, the current snapshot is used.	`BIGINT`
`SNAPSHOT_TIMESTAMP`	Lists the files of the snapshot that was current at the given time, expressed as epoch milliseconds.	`BIGINT`
`ADDED_FILES_SINCE_SNAPSHOT_ID`	Lists only the files added since the given snapshot ID, rather than all files in the selected snapshot. Use this to inspect what an incremental change introduced.	`BIGINT`

Return Type

The result is a table with one row per file. Both data files and the table’s row-level delete files are listed as rows. The schema has three parts:

Fixed columns describing each file and its manifest entry.
Per-column statistics: a min_<column> and max_<column> pair for every column of the table, holding the lower and upper bound recorded for that column in the file’s manifest entry. Each bound has the same data type as its source column and is nullable.
Per-partition values: a partition_value_<partition_column>_<field_id> column for every partition field in the table’s partition specs, holding that file’s typed partition value.

Fixed columns:

Column	Type	Description
`manifest_path`	`TEXT`	Path to the manifest file that this entry was read from.
`manifest_sequence_number`	`BIGINT`	Sequence number of the manifest.
`manifest_content`	`TEXT`	Content type of the manifest (data or deletes).
`status`	`TEXT`	Manifest-entry status of the file (for example added, existing, or deleted) within the selected snapshot.
`content`	`TEXT`	Content type of the file: a data file, position-delete file, or equality-delete file.
`file_path`	`TEXT`	Full path of the file in object storage.
`file_format`	`TEXT`	On-disk format of the file (for example `PARQUET`).
`record_count`	`BIGINT`	Number of records the file contains.
`file_size`	`BIGINT`	Size of the file in bytes.
`partition_columns`	`ARRAY(TEXT)`	Names of the partition columns that apply to the file.
`partition_transforms`	`ARRAY(TEXT)`	Partition transforms applied to those columns (for example `day`, `bucket[16]`).
`partition_values`	`ARRAY(TEXT)`	The file’s partition values, as text, aligned with `partition_columns`.
`file_name`	`TEXT`	File name component of `file_path`.
`file_bucket`	`TEXT`	Bucket component of `file_path`.
`file_storage_api`	`TEXT`	Storage API used to access the file.
`file_etag`	`TEXT`	ETag of the file, when available.
`deleted_positions`	`BYTEA`	Serialized Roaring bitmap of the row positions deleted from this data file by the delete files that apply to the selected snapshot. Populated on data-file rows; empty when no rows are deleted, and empty on delete-file rows.
`partition_spec_id`	`INTEGER`	ID of the partition spec the file was written under. Nullable.

Examples

The following code example summarizes an Iceberg table’s storage by file content type and format, including data and delete files. It reports file and manifest counts, record and byte totals, average file size, and the number of distinct partition values:

SELECT
    content,
    file_format,
    count(DISTINCT manifest_path)          AS manifests,
    count(*)                               AS num_files,
    count(DISTINCT partition_values)       AS distinct_partition_values,
    sum(record_count)                      AS total_records,
    sum(file_size)                         AS total_bytes,
    round(sum(file_size) / 1000000.0, 2)   AS total_mb,
    round(avg(file_size) / 1000000.0, 2)   AS avg_file_mb,
    round(avg(record_count), 1)            AS avg_records_per_file
FROM LIST_ICEBERG_FILES(
  URL => 's3://firebolt-core-us-east-1/test_data/tpch/iceberg/tpch.db/lineitem'
)
GROUP BY content, file_format
ORDER BY content, file_format;

Limitations

The same catalog, file-format, and data-type support as READ_ICEBERG applies.
deleted_positions is a serialized Roaring bitmap. Decode it in application code rather than interpreting the raw bytes in SQL; use length(deleted_positions) > 0 to test whether a file has any deletes.

​Syntax

​Parameters

​File selection parameters

​Return Type

​Examples

​Limitations