READ_PARQUET
Reference material for READ_PARQUET function
A table-valued function (TVF) that reads data from Parquet files stored in Amazon S3. The function can use either a location object (recommended) or direct credentials to access the data. READ_PARQUET returns a table with data from the specified Parquet file.
Syntax
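As a rough sketch based on the parameters listed below, and assuming named arguments are written as name => value, the two call forms look roughly like this. The angle-bracket placeholders are illustrative, and which credential arguments may be combined is not specified here.

```sql
-- Form 1: read through a LOCATION object (recommended)
READ_PARQUET ( LOCATION => <location_name> )

-- Form 2: read from a URL, optionally with static or role-based credentials
READ_PARQUET (
    URL => <url>
    [, AWS_ACCESS_KEY_ID => <aws_access_key_id> ]
    [, AWS_SECRET_ACCESS_KEY => <aws_secret_access_key> ]
    [, AWS_SESSION_TOKEN => <aws_session_token> ]
    [, AWS_ROLE_ARN => <aws_role_arn> ]
    [, AWS_ROLE_EXTERNAL_ID => <aws_role_external_id> ]
)
```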
Parameters
Parameter | Description | Supported input types |
---|---|---|
LOCATION | The name of a location object that contains the Amazon S3 URL and credentials. Firebolt recommends using LOCATION to store credentials for authentication. For a comprehensive guide, see LOCATION objects. For syntax details, see CREATE LOCATION. | IDENTIFIER |
URL | The location of the Amazon S3 bucket containing your files. The expected format is s3://{bucket_name}/{full_file_path_glob_pattern}. | TEXT |
AWS_ACCESS_KEY_ID | The AWS access key ID. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. | TEXT |
- When using static credentials:
  - The URL can be passed as either the first positional parameter or a named parameter.
  - If you provide either AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY, you must provide both.
  - Providing an AWS session token is optional.
  - Credentials are not required for accessing public buckets.
Return Type
The result is a table with data from the Parquet files. Columns are read and parsed using their inferred data types.
Best practice
Firebolt recommends using a LOCATION object to store credentials for authentication.
Examples
Example
The following code example reads the first 5 rows from a Parquet file using a LOCATION object to store credentials for authentication:
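A minimal sketch, assuming a LOCATION object named my_location (a hypothetical name) has already been created with CREATE LOCATION and points at a Parquet file:

```sql
-- my_location is a hypothetical LOCATION object holding the S3 URL and credentials
SELECT *
FROM READ_PARQUET(LOCATION => my_location)
LIMIT 5;
```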
Example
The following code example reads the first 5 rows from a Parquet file using static credentials for authentication:
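A minimal sketch of such a query follows. The bucket, path, and credential values are placeholders, not the file that produced the result shown under Returns, and the name => value argument form is assumed.

```sql
-- Placeholder URL and credentials; replace with your own values
SELECT *
FROM READ_PARQUET(
    URL => 's3://my-bucket/path/to/file.parquet',
    AWS_ACCESS_KEY_ID => '<aws_access_key_id>',
    AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>'
)
LIMIT 5;
```

For a public bucket, the two credential arguments can be left out, since credentials are not required there.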
Returns
GameID | PlayerID | Timestamp | SelectedCar | CurrentLevel | CurrentSpeed | CurrentPlayTime | CurrentScore | Event | ErrorCode |
---|---|---|---|---|---|---|---|---|---|
1 | 845 | 2022-10-27 13:36:33 | Solara | 1 | 0 | 0 | 0 | Brake | NoError |
1 | 845 | 2022-10-27 13:36:33 | Solara | 1 | 339 | 0.9872 | 2 | RightTurn | GraphicsFreeze |
1 | 845 | 2022-10-27 13:36:34 | Solara | 1 | 288 | 1.9744 | 20 | Tilt | NoError |
1 | 845 | 2022-10-27 13:36:35 | Solara | 1 | 260 | 2.9616 | 53 | Block | TextNotFound |
1 | 845 | 2022-10-27 13:36:36 | Solara | 1 | 196 | 3.9488 | 81 | FullSpeed | NoError |
Using URL
- The URL can be passed as either the first positional parameter or a named parameter. Both forms read the same file.
- Credentials are optional.
- The URL can represent a single file or a glob pattern. If a glob pattern is used, all files matching the pattern will be read. A special column, $source_file_name, can be used to identify the source file of each row in the result set.
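The points above can be sketched as follows. The bucket and paths are hypothetical, and the queries assume a public bucket so that no credentials are needed.

```sql
-- URL as the first positional parameter (hypothetical bucket and path)
SELECT * FROM READ_PARQUET('s3://my-bucket/data/file.parquet') LIMIT 5;

-- The same file, with URL passed as a named parameter
SELECT * FROM READ_PARQUET(URL => 's3://my-bucket/data/file.parquet') LIMIT 5;

-- Glob pattern: $source_file_name reports which file each row came from
SELECT *, $source_file_name
FROM READ_PARQUET('s3://my-bucket/data/*.parquet')
LIMIT 5;
```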
When using glob patterns, the wildcard (*) can only be used at the end of the path. You can use it with any text before or after, such as *.parquet, date=2025*.parquet, or data_*.parquet.
The pattern recursively matches files in all subdirectories. For example, a *.parquet pattern placed directly after the bucket name will read all Parquet files in the bucket, including those in subdirectories like help_center_assets/firebolt_sample_dataset/playstats/*.parquet.
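To illustrate recursive matching, the sketch below lists the distinct files picked up by a wildcard placed at the bucket root. The bucket name is hypothetical, and selecting $source_file_name with DISTINCT is an assumption about how the special column can be used.

```sql
-- Hypothetical bucket: a wildcard at the bucket root also matches files in nested folders
SELECT DISTINCT $source_file_name
FROM READ_PARQUET('s3://my-bucket/*.parquet');
```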