READ_PARQUET
Returns a table with data from the specified Parquet file.
Syntax
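Based on the parameters described in the table below, a call takes roughly the following shape. This is a sketch and not the formal grammar: names in angle brackets are placeholders, square brackets mark optional arguments, and the grouping of the credential parameters and the placement of ESTIMATED_ROWS are assumptions drawn from the parameter descriptions.

```sql
-- Form 1 (sketch): read through a location object.
-- LOCATION must be a named parameter with a string literal value.
READ_PARQUET ( LOCATION => '<location_name>' )

-- Form 2 (sketch): read from a URL, passed positionally or as URL => '<url>'.
-- Credential parameters are optional; the two key parameters go together.
READ_PARQUET (
    URL => '<url>'
    [, AWS_ACCESS_KEY_ID => '<aws_access_key_id>' ]
    [, AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>' ]
    [, AWS_SESSION_TOKEN => '<aws_session_token>' ]
    [, AWS_ROLE_ARN => '<aws_role_arn>' ]
    [, AWS_ROLE_EXTERNAL_ID => '<aws_role_external_id>' ]
    [, ESTIMATED_ROWS => <estimated_rows> ]  -- assumed to be accepted alongside the URL form
)
```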
Parameters
Parameter | Description | Supported input types |
---|---|---|
LOCATION | The name of a location object that contains the Amazon S3 URL and credentials. Firebolt recommends using LOCATION to store credentials for authentication. LOCATION must be specified as a string literal (e.g., LOCATION => 'my_location'). Unlike URL, it cannot be used as a positional parameter. For a comprehensive guide, see LOCATION objects. | TEXT |
URL | The location of the Amazon S3 bucket containing your files. The expected format is s3://{bucket_name}/{full_file_path_glob_pattern}. | TEXT |
AWS_ACCESS_KEY_ID | The AWS access key ID. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. | TEXT |
ESTIMATED_ROWS | A hint to the query planner giving the estimated number of rows that READ_PARQUET returns. The planner uses it to improve join ordering. | INT, BIGINT |
- When using static credentials:
  - The URL can be passed as either the first positional parameter or a named parameter.
  - If you provide either AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY, you must provide both.
  - Providing an AWS session token is optional.
  - Credentials are not required for accessing public buckets.
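The static-credential rules above can be illustrated with a short sketch. The bucket path and the credential values are hypothetical placeholders, not real resources; the named-parameter form follows the LOCATION => 'my_location' style shown in the parameter table.

```sql
-- Sketch: read a private bucket with static credentials as named parameters.
-- The path and all credential values below are hypothetical placeholders.
SELECT *
FROM READ_PARQUET(
    URL => 's3://my-private-bucket/data/events.parquet',
    AWS_ACCESS_KEY_ID => '<aws_access_key_id>',          -- must be given together with the secret key
    AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>',
    AWS_SESSION_TOKEN => '<aws_session_token>'           -- optional; drop this line if you have no session token
)
LIMIT 5;
```

For anything beyond ad hoc exploration, storing these values in a LOCATION object keeps secrets out of query text.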
Return Type
The result is a table with data from the Parquet files. Columns are read and parsed using their inferred data types.
Best practice
Firebolt recommends using a LOCATION object to store credentials for authentication.
When using READ_PARQUET(), the URL parameter in the location should contain only Parquet files (see location table-valued functions).
Examples
Example
The following code example reads the first 5 rows from a Parquet file using a LOCATION object to store credentials for authentication:
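A minimal sketch of such a query, assuming a location object named my_location (the name used in the parameter table) has already been created and points at the sample Parquet data:

```sql
-- Sketch: read the first 5 rows through a location object.
-- 'my_location' is assumed to exist; LOCATION must be passed as a named parameter.
SELECT *
FROM READ_PARQUET(LOCATION => 'my_location')
LIMIT 5;
```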
GameID | PlayerID | Timestamp | SelectedCar | CurrentLevel | CurrentSpeed | CurrentPlayTime | CurrentScore | Event | ErrorCode |
---|---|---|---|---|---|---|---|---|---|
1 | 845 | 2022-10-27 13:36:33 | Solara | 1 | 0 | 0 | 0 | Brake | NoError |
1 | 845 | 2022-10-27 13:36:33 | Solara | 1 | 339 | 0.9872 | 2 | RightTurn | GraphicsFreeze |
1 | 845 | 2022-10-27 13:36:34 | Solara | 1 | 288 | 1.9744 | 20 | Tilt | NoError |
1 | 845 | 2022-10-27 13:36:35 | Solara | 1 | 260 | 2.9616 | 53 | Block | TextNotFound |
1 | 845 | 2022-10-27 13:36:36 | Solara | 1 | 196 | 3.9488 | 81 | FullSpeed | NoError |
Using URL
- The URL can be passed as either the first positional parameter or a named parameter. For example, the following two queries will both read the same file:
- Credentials are optional.
- The URL can represent a single file or a glob pattern. If a glob pattern is used, all files matching the pattern will be read. A special column $source_file_name can be used to identify the source file of each row in the result set:
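The points in the list above can be sketched as three queries. The first two pass the same URL positionally and by name; the third uses a glob pattern and selects $source_file_name next to the data columns. The file path is the sample file mentioned later on this page. Combining the bucket name with the playstats glob, and reading the bucket without credentials, are assumptions.

```sql
-- 1. URL as the first positional parameter (no credentials supplied).
SELECT *
FROM READ_PARQUET('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet')
LIMIT 5;

-- 2. The same file, with URL passed as a named parameter.
SELECT *
FROM READ_PARQUET(URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet')
LIMIT 5;

-- 3. A glob pattern: every matching file is read, and $source_file_name
--    reports which file each row came from.
SELECT *, $source_file_name
FROM READ_PARQUET('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/*.parquet')
LIMIT 5;
```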
The wildcard (*) can only be used at the end of the path. You can use it with any text before or after, such as *.parquet, date=2025*.parquet, or data_*.parquet.
The pattern will recursively match files in all subdirectories. For example: help_center_assets/firebolt_sample_dataset/playstats/*.parquet.
The column $source_file_name can be used in combination with REGEXP_EXTRACT to extract data from the source file path. The following code returns the TournamentID for each record it reads based on its file path:
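One way to write such a query is sketched below. It assumes that REGEXP_EXTRACT accepts an optional flags string followed by a capture-group index, so that the third and fourth arguments select the first group; check the REGEXP_EXTRACT reference before relying on that form. The column alias tournament_id is a hypothetical name chosen for illustration.

```sql
-- Sketch: pull the number that follows 'TournamentID=' out of each row's source path.
-- Assumes the four-argument form REGEXP_EXTRACT(expression, pattern, flags, index).
SELECT
    *,
    REGEXP_EXTRACT($source_file_name, 'TournamentID=([0-9]+)', '', 1) AS tournament_id
FROM READ_PARQUET('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/*.parquet')
LIMIT 5;
```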
For example, it returns 92 for records in the file s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet.