Skip to main content
A table-valued function (TVF) that reads data from Parquet files stored in Amazon S3. The function can use either a location object (recommended) or direct credentials to access the data. READ_PARQUET returns a table with data from the specified Parquet file.

Syntax

-- Using location object (recommended)
READ_PARQUET (
  LOCATION => location_name
  [, PATTERN => <pattern>]
  [, ESTIMATED_ROWS => <estimated_rows>]
)

-- Using static credentials
READ_PARQUET (
  URL => <file_url>
  [, AWS_ACCESS_KEY_ID => <aws_access_key_id>]
  [, AWS_SECRET_ACCESS_KEY => <aws_secret_access_key>]
  [, AWS_SESSION_TOKEN => <aws_session_token>]
  [, AWS_ROLE_ARN => <aws_role_arn>]
  [, AWS_ROLE_EXTERNAL_ID => <aws_role_external_id>]
  [, ESTIMATED_ROWS => <estimated_rows>]
)

Parameters

ParameterDescriptionSupported input types
LOCATIONThe name of a location object that contains the Amazon S3 URL and credentials. Firebolt recommends using LOCATION to store credentials for authentication. LOCATION must be specified as a string literal (e.g., LOCATION => 'my_location'). Unlike URL, it cannot be used as a positional parameter. For a comprehensive guide, see LOCATION objects.TEXT
PATTERNWhen using LOCATION, an optional glob pattern to filter files within the location’s URL path. The pattern is applied relative to the location’s base path. For example, PATTERN => 'week_1/*.parquet' will match all .parquet files in the week_1 subdirectory.TEXT
URLThe location of the Amazon S3 bucket containing your files. The expected format is s3://{bucket_name}/{full_file_path_glob_pattern}.TEXT
AWS_ACCESS_KEY_IDThe AWS access key ID.TEXT
AWS_SECRET_ACCESS_KEYThe AWS secret access key.TEXT
AWS_SESSION_TOKENThe AWS session token.TEXT
AWS_ROLE_ARNThe AWS role ARN.TEXT
AWS_ROLE_EXTERNAL_IDThe AWS role external ID.TEXT
ESTIMATED_ROWSHints the estimated number of rows returned by READ_PARQUET to query planning for improved join ordering.INT, BIGINT
  • When using static credentials:
    • The URL can be passed as either the first positional parameter or a named parameter
    • If you provide either AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY, you must provide both
    • Providing an AWS session token is optional
    • Credentials are not required for accessing public buckets

Return Type

The result is a table with data from the Parquet files. Columns are read and parsed using their inferred data types.

Best practice

Firebolt recommends using a LOCATION object to store credentials for authentication. When using READ_PARQUET(), the URL parameter in the location should contain only parquet files (see location table-valued functions).

Examples

Example The following code example reads the first 5 rows from a Parquet file using a LOCATION object to store credentials for authentication:
SELECT * 
FROM READ_PARQUET(
    LOCATION => 'my_parquet_location'
) 
LIMIT 5;
Example: Using location object with pattern This example shows how to use the PATTERN parameter with a location object to filter specific files:
CREATE LOCATION firebolt_sample_dataset WITH
  SOURCE = AMAZON_S3
  URL = 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/';


SELECT $source_file_name, "CurrentSpeed", "CurrentPlayTime"
FROM READ_PARQUET(
    LOCATION => 'firebolt_sample_dataset', PATTERN => 'playstats/TournamentID=92/*.parquet'
) limit 5;
Returns
$source_file_nameCurrentSpeedCurrentPlayTime
help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet108,060.488
help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet458,061.4752
help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet2808,062.4624
help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet2308,063.4496
help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet3338,064.4368
This reads only the Parquet files matching the pattern playstats/TournamentID=92/*.parquet within the location’s base path, showing specific columns and the source file for verification. Example The following code example reads the first 5 rows from a Parquet file using static credentials for authentication:
SELECT * 
FROM READ_PARQUET(
    URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet'
) 
LIMIT 5;
Returns
GameIDPlayerIDTimestampSelectedCarCurrentLevelCurrentSpeedCurrentPlayTimeCurrentScoreEventErrorCode
18452022-10-27 13:36:33Solara1000BrakeNoError
18452022-10-27 13:36:33Solara13390.98722RightTurnGraphicsFreeze
18452022-10-27 13:36:34Solara12881.974420TiltNoError
18452022-10-27 13:36:35Solara12602.961653BlockTextNotFound
18452022-10-27 13:36:36Solara11963.948881FullSpeedNoError

Using URL

  • The URL can be passed as either the first positional parameter or a named parameter. For example, the following two queries will both read the same file:
  • Credentials are optional.
SELECT * FROM READ_PARQUET('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet');
SELECT * FROM READ_PARQUET(URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet');
  • The url can represent a single file or a glob pattern. If a glob pattern is used, all files matching the pattern will be read. A special column $source_file_name can be used to identify the source file of each row in the result set:
SELECT *, $source_file_name FROM READ_PARQUET('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/*.parquet') LIMIT 5;
When using glob patterns, the wildcard (*) can only be used at the end of the path. You can use it with any text before or after, such as *.parquet, date=2025*.parquet, or data_*.parquet. The pattern will recursively match files in all subdirectories. For example:
SELECT count(*) FROM READ_PARQUET('s3://firebolt-publishing-public/*.parquet');
counts rows of all Parquet files in the bucket, including those in subdirectories like help_center_assets/firebolt_sample_dataset/playstats/*.parquet. The column $source_file_name can be used in combination with REGEXP_EXTRACT to extract data from the source file path. The following code returns the TournamentID for each record it reads based on its file path:
SELECT REGEXP_EXTRACT($source_file_name, 'TournamentID=(\d+)','',1) as TournamentID, *
FROM READ_PARQUET(URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/*.parquet')
LIMIT 5;
For example, it returns 92 for records in file s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet.
⌘I