Skip to main content
A table-valued function (TVF) that lists the data files of an Apache Iceberg table along with the file-level metadata Firebolt reads from the table’s manifests. Like READ_ICEBERG, it accesses the table through either a location object or individual TVF parameters, and supports file-based catalogs as well as REST catalogs. Unlike READ_ICEBERG, it does not read the table’s rows; it returns one row per file, including the table’s delete files, with the per-file statistics, partition values, and deleted-row positions recorded in the Iceberg metadata. Use it to inspect file layout, partitioning, file sizes, per-column min/max bounds, and deletes without scanning the underlying data, for example when debugging pruning, planning a vacuum, or auditing snapshot contents.

Syntax

LIST_ICEBERG_FILES accepts the same LOCATION / URL access and authentication parameters as READ_ICEBERG, plus the file-selection parameters below. It does not accept MAX_STALENESS.
LIST_ICEBERG_FILES (
  -- Access and authentication: same as READ_ICEBERG (LOCATION, or URL plus the
  -- relevant WAREHOUSE / NAMESPACE / TABLE, AWS_*, and OAUTH_* / BEARER_TOKEN params).
  { LOCATION => '<location_name>' | URL => '<s3_or_rest_api_url>' }
  [, ... ]
  -- File selection
  [, SNAPSHOT_ID => <snapshot_id> ]
  [, SNAPSHOT_TIMESTAMP => <epoch_milliseconds> ]
  [, ADDED_FILES_SINCE_SNAPSHOT_ID => <snapshot_id> ]
)

Parameters

File selection parameters

ParameterDescriptionSupported input types
SNAPSHOT_IDLists the files of a specific snapshot by its Iceberg snapshot ID. When omitted, the current snapshot is used.BIGINT
SNAPSHOT_TIMESTAMPLists the files of the snapshot that was current at the given time, expressed as epoch milliseconds.BIGINT
ADDED_FILES_SINCE_SNAPSHOT_IDLists only the files added since the given snapshot ID, rather than all files in the selected snapshot. Use this to inspect what an incremental change introduced.BIGINT

Return Type

The result is a table with one row per file. Both data files and the table’s row-level delete files are listed as rows. The schema has three parts:
  • Fixed columns describing each file and its manifest entry.
  • Per-column statistics: a min_<column> and max_<column> pair for every column of the table, holding the lower and upper bound recorded for that column in the file’s manifest entry. Each bound has the same data type as its source column and is nullable.
  • Per-partition values: a partition_value_<partition_column>_<field_id> column for every partition field in the table’s partition specs, holding that file’s typed partition value.
Fixed columns:
ColumnTypeDescription
manifest_pathTEXTPath to the manifest file that this entry was read from.
manifest_sequence_numberBIGINTSequence number of the manifest.
manifest_contentTEXTContent type of the manifest (data or deletes).
statusTEXTManifest-entry status of the file (for example added, existing, or deleted) within the selected snapshot.
contentTEXTContent type of the file: a data file, position-delete file, or equality-delete file.
file_pathTEXTFull path of the file in object storage.
file_formatTEXTOn-disk format of the file (for example PARQUET).
record_countBIGINTNumber of records the file contains.
file_sizeBIGINTSize of the file in bytes.
partition_columnsARRAY(TEXT)Names of the partition columns that apply to the file.
partition_transformsARRAY(TEXT)Partition transforms applied to those columns (for example day, bucket[16]).
partition_valuesARRAY(TEXT)The file’s partition values, as text, aligned with partition_columns.
file_nameTEXTFile name component of file_path.
file_bucketTEXTBucket component of file_path.
file_storage_apiTEXTStorage API used to access the file.
file_etagTEXTETag of the file, when available.
deleted_positionsBYTEASerialized Roaring bitmap of the row positions deleted from this data file by the delete files that apply to the selected snapshot. Populated on data-file rows; empty when no rows are deleted, and empty on delete-file rows.
partition_spec_idINTEGERID of the partition spec the file was written under. Nullable.

Examples

The following code example summarizes an Iceberg table’s storage by file content type and format, including data and delete files. It reports file and manifest counts, record and byte totals, average file size, and the number of distinct partition values:
SELECT
    content,
    file_format,
    count(DISTINCT manifest_path)          AS manifests,
    count(*)                               AS num_files,
    count(DISTINCT partition_values)       AS distinct_partition_values,
    sum(record_count)                      AS total_records,
    sum(file_size)                         AS total_bytes,
    round(sum(file_size) / 1000000.0, 2)   AS total_mb,
    round(avg(file_size) / 1000000.0, 2)   AS avg_file_mb,
    round(avg(record_count), 1)            AS avg_records_per_file
FROM LIST_ICEBERG_FILES(
  URL => 's3://firebolt-core-us-east-1/test_data/tpch/iceberg/tpch.db/lineitem'
)
GROUP BY content, file_format
ORDER BY content, file_format;

Limitations

  • The same catalog, file-format, and data-type support as READ_ICEBERG applies.
  • deleted_positions is a serialized Roaring bitmap. Decode it in application code rather than interpreting the raw bytes in SQL; use length(deleted_positions) > 0 to test whether a file has any deletes.