READ_PARQUET, which returns the data itself.
Syntax
PARQUET_METADATA accepts the same LOCATION / URL access and authentication parameters as READ_PARQUET, plus the parameter below. It does not accept the data-parsing parameters of READ_PARQUET (SCHEMA, REPLACE_NON_UTF8_BYTES, ESTIMATED_ROWS, PARSE_JSON_AS).
Parameters
| Parameter | Description | Supported input types |
|---|---|---|
| SHOW_PAGES_STATS | When TRUE, adds per-page columns to the result, returning one row per page within each column chunk instead of one row per column chunk. Default: FALSE. | BOOLEAN |
For the shared access and authentication parameters (LOCATION, PATTERN, URL, and AWS_*), see READ_PARQUET parameters.
Return Type
The result is a table whose grain depends on SHOW_PAGES_STATS. With the default FALSE, there is one row per column chunk (one row group’s worth of a single column). With TRUE, there is one row per page, and the per-page columns below are appended.
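To make the two grains concrete, the following sketch calls the function both ways and counts the rows returned. It assumes the named-argument form used on this page (`SHOW_PAGES_STATS => TRUE`) and a `URL` parameter as in READ_PARQUET. The bucket path is a hypothetical placeholder, and any authentication parameters the object requires are omitted.

```sql
-- Hypothetical object path; replace with your own file.

-- Default grain: one row per column chunk.
SELECT COUNT(*) AS column_chunk_rows
FROM PARQUET_METADATA(URL => 's3://example-bucket/data/example.parquet');

-- Page grain: one row per page within each column chunk.
SELECT COUNT(*) AS page_rows
FROM PARQUET_METADATA(
    URL => 's3://example-bucket/data/example.parquet',
    SHOW_PAGES_STATS => TRUE
);
```

Since pages are nested inside column chunks, the second count would normally be at least as large as the first.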
Base columns:
| Column | Type | Description |
|---|---|---|
| file_name | TEXT | Path of the source Parquet file. |
| row_group_id | BIGINT | Index of the row group within the file. |
| row_group_num_rows | BIGINT | Number of rows in the row group. |
| row_group_num_columns | BIGINT | Number of columns in the row group. |
| row_group_bytes | BIGINT | Total byte size of the row group. |
| column_id | BIGINT | Index of the column within the row group. |
| file_offset | BIGINT | Byte offset of the column chunk within the file. Nullable. |
| num_values | BIGINT | Number of values in the column chunk. |
| path_in_schema | ARRAY(TEXT) | Path of the column in the Parquet schema, as a sequence of nested field names. |
| type | TEXT | Physical Parquet type of the column. |
| stats_null_count | BIGINT | Number of null values recorded in the column-chunk statistics. Nullable. |
| stats_distinct_count | BIGINT | Number of distinct values recorded in the column-chunk statistics. Nullable. |
| stats_min_value | TEXT | Minimum value recorded in the column-chunk statistics, as text. Nullable. |
| stats_max_value | TEXT | Maximum value recorded in the column-chunk statistics, as text. Nullable. |
| compression | TEXT | Compression codec of the column chunk. |
| encodings | ARRAY(TEXT) | Encodings used in the column chunk. |
| dictionary_page_offset | BIGINT | Byte offset of the dictionary page, when present. Nullable. |
| data_page_offset | BIGINT | Byte offset of the first data page. |
| total_compressed_size | BIGINT | Compressed size of the column chunk in bytes. |
| total_uncompressed_size | BIGINT | Uncompressed size of the column chunk in bytes. |
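As an illustration of how the base columns can be combined, the sketch below totals compressed and uncompressed bytes per column across all row groups of a file. It uses only column names from the table above; the object path is a hypothetical placeholder and the `URL` parameter is assumed to behave as it does in READ_PARQUET.

```sql
-- Hypothetical object path; replace with your own file.
-- One output row per column and codec, summed over all row groups.
SELECT
    file_name,
    column_id,
    compression,
    SUM(total_compressed_size)   AS compressed_bytes,
    SUM(total_uncompressed_size) AS uncompressed_bytes
FROM PARQUET_METADATA(URL => 's3://example-bucket/data/example.parquet')
GROUP BY file_name, column_id, compression
ORDER BY compressed_bytes DESC;
```

Comparing the two sums per column gives a rough view of which columns compress well and which dominate the file size.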
Additional per-page columns when SHOW_PAGES_STATS => TRUE:
| Column | Type | Description |
|---|---|---|
| page_type | TEXT | Type of the page (for example data page or dictionary page). |
| page_encoding | TEXT | Encoding of the page. |
| page_compressed_size | BIGINT | Compressed size of the page in bytes. |
| page_uncompressed_size | BIGINT | Uncompressed size of the page in bytes. |
| page_num_values | BIGINT | Number of values in the page. |
| page_num_nulls | BIGINT | Number of null values in the page. Nullable. |
| page_num_rows | BIGINT | Number of rows in the page. Nullable. |
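The per-page columns can be aggregated back up to the column-chunk level. The sketch below counts pages and sums their compressed size per column chunk and page type. It is a minimal example under the same assumptions as the rest of this page: a hypothetical object path and a `URL` parameter shared with READ_PARQUET.

```sql
-- Hypothetical object path; replace with your own file.
-- One output row per (row group, column, page type).
SELECT
    file_name,
    row_group_id,
    column_id,
    page_type,
    COUNT(*)                  AS pages,
    SUM(page_compressed_size) AS page_compressed_bytes
FROM PARQUET_METADATA(
    URL => 's3://example-bucket/data/example.parquet',
    SHOW_PAGES_STATS => TRUE
)
GROUP BY file_name, row_group_id, column_id, page_type
ORDER BY file_name, row_group_id, column_id;
```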
Examples
The following code example reads the metadata of a Parquet file with SHOW_PAGES_STATS => TRUE, returning one row per page with the full base and page-level columns: