Skip to main content
A table-valued function (TVF) that returns the internal metadata of Parquet files stored in Amazon S3 without reading their data. The function can use either a location object (recommended) or direct credentials to access the files. For each file it reports the file’s row groups and, within each row group, every column chunk: its encoding, compression, byte sizes, value counts, and column statistics. Optionally it also reports per-page metadata. Use it to understand file layout and decide how to tune a workload: row-group sizing, compression and encoding choices, the column statistics available for pruning, and dictionary usage. It complements READ_PARQUET, which returns the data itself.

Syntax

PARQUET_METADATA accepts the same LOCATION / URL access and authentication parameters as READ_PARQUET, plus the parameter below. It does not accept the data-parsing parameters of READ_PARQUET (SCHEMA, REPLACE_NON_UTF8_BYTES, ESTIMATED_ROWS, PARSE_JSON_AS).
PARQUET_METADATA (
  -- Access and authentication: same as READ_PARQUET (LOCATION [, PATTERN], or URL
  -- plus the relevant AWS_* params).
  { LOCATION => '<location_name>' | URL => '<file_url>' }
  [, ... ]
  [, SHOW_PAGES_STATS => { TRUE | FALSE } ]
)

Parameters

ParameterDescriptionSupported input types
SHOW_PAGES_STATSWhen TRUE, adds per-page columns to the result, returning one row per page within each column chunk instead of one row per column chunk. Default: FALSE.BOOLEAN
For the access and authentication parameters (LOCATION, PATTERN, URL, and AWS_*), see READ_PARQUET parameters.

Return Type

The result is a table whose grain depends on SHOW_PAGES_STATS. With the default FALSE, there is one row per column chunk (one row group’s worth of a single column). With TRUE, there is one row per page, and the per-page columns below are appended. Base columns:
ColumnTypeDescription
file_nameTEXTPath of the source Parquet file.
row_group_idBIGINTIndex of the row group within the file.
row_group_num_rowsBIGINTNumber of rows in the row group.
row_group_num_columnsBIGINTNumber of columns in the row group.
row_group_bytesBIGINTTotal byte size of the row group.
column_idBIGINTIndex of the column within the row group.
file_offsetBIGINTByte offset of the column chunk within the file. Nullable.
num_valuesBIGINTNumber of values in the column chunk.
path_in_schemaARRAY(TEXT)Path of the column in the Parquet schema, as a sequence of nested field names.
typeTEXTPhysical Parquet type of the column.
stats_null_countBIGINTNumber of null values recorded in the column-chunk statistics. Nullable.
stats_distinct_countBIGINTNumber of distinct values recorded in the column-chunk statistics. Nullable.
stats_min_valueTEXTMinimum value recorded in the column-chunk statistics, as text. Nullable.
stats_max_valueTEXTMaximum value recorded in the column-chunk statistics, as text. Nullable.
compressionTEXTCompression codec of the column chunk.
encodingsARRAY(TEXT)Encodings used in the column chunk.
dictionary_page_offsetBIGINTByte offset of the dictionary page, when present. Nullable.
data_page_offsetBIGINTByte offset of the first data page.
total_compressed_sizeBIGINTCompressed size of the column chunk in bytes.
total_uncompressed_sizeBIGINTUncompressed size of the column chunk in bytes.
Additional columns when SHOW_PAGES_STATS => TRUE:
ColumnTypeDescription
page_typeTEXTType of the page (for example data page or dictionary page).
page_encodingTEXTEncoding of the page.
page_compressed_sizeBIGINTCompressed size of the page in bytes.
page_uncompressed_sizeBIGINTUncompressed size of the page in bytes.
page_num_valuesBIGINTNumber of values in the page.
page_num_nullsBIGINTNumber of null values in the page. Nullable.
page_num_rowsBIGINTNumber of rows in the page. Nullable.

Examples

The following code example reads the metadata of a Parquet file with SHOW_PAGES_STATS => TRUE, returning one row per page with the full base and page-level columns:
SELECT *
FROM PARQUET_METADATA(
  URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet',
  SHOW_PAGES_STATS => TRUE
);