> Reference material for the `READ_PARQUET` function

# READ_PARQUET

A table-valued function (TVF) that reads data from Parquet files stored in Amazon S3 and returns it as a table. The function can access the data through either a location object (recommended) or static credentials.

## Syntax

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
-- Using location object (recommended)
READ_PARQUET (
  LOCATION => <location_name>
  [, PATTERN => <pattern>]
  [, REPLACE_NON_UTF8_BYTES => { TRUE | FALSE }]
  [, ESTIMATED_ROWS => <estimated_rows>]
  [, PARSE_JSON_AS => { 'JSON' | 'TEXT' }]
)

-- Using static credentials
READ_PARQUET (
  URL => <file_url>
  [, AWS_ACCESS_KEY_ID => <aws_access_key_id>]
  [, AWS_SECRET_ACCESS_KEY => <aws_secret_access_key>]
  [, AWS_SESSION_TOKEN => <aws_session_token>]
  [, AWS_ROLE_ARN => <aws_role_arn>]
  [, AWS_ROLE_EXTERNAL_ID => <aws_role_external_id>]
  [, REPLACE_NON_UTF8_BYTES => { TRUE | FALSE }]
  [, ESTIMATED_ROWS => <estimated_rows>]
  [, PARSE_JSON_AS => { 'JSON' | 'TEXT' }]
)
```

## Parameters

| Parameter                | Description                                                                                                                                                                                                                                                                                                                                                                                 | Supported input types |
| :----------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :-------------------- |
| `LOCATION`               | The name of a location object that contains the Amazon S3 URL and credentials. Firebolt recommends using `LOCATION` to store credentials for authentication. `LOCATION` must be specified as a string literal (e.g., `LOCATION => 'my_location'`). Unlike `URL`, it cannot be used as a positional parameter. For a comprehensive guide, see [LOCATION objects](/guides/security/location). | `TEXT`                |
| `PATTERN`                | When using `LOCATION`, an optional glob pattern to filter files within the location's URL path. The pattern is applied relative to the location's base path. For example, `PATTERN => 'week_1/*.parquet'` will match all `.parquet` files in the `week_1` subdirectory.                                                                                                                     | `TEXT`                |
| `URL`                    | The location of the Amazon S3 bucket containing your files. The expected format is `s3://{bucket_name}/{full_file_path_glob_pattern}`.                                                                                                                                                                                                                                                      | `TEXT`                |
| `AWS_ACCESS_KEY_ID`      | The AWS access key ID.                                                                                                                                                                                                                                                                                                                                                                      | `TEXT`                |
| `AWS_SECRET_ACCESS_KEY`  | The AWS secret access key.                                                                                                                                                                                                                                                                                                                                                                  | `TEXT`                |
| `AWS_SESSION_TOKEN`      | The AWS session token.                                                                                                                                                                                                                                                                                                                                                                      | `TEXT`                |
| `AWS_ROLE_ARN`           | The AWS role ARN.                                                                                                                                                                                                                                                                                                                                                                           | `TEXT`                |
| `AWS_ROLE_EXTERNAL_ID`   | The AWS role external ID.                                                                                                                                                                                                                                                                                                                                                                   | `TEXT`                |
| `REPLACE_NON_UTF8_BYTES` | Whether to replace non-UTF8 bytes in string columns with the Unicode replacement character (�, U+FFFD). Firebolt's [TEXT data type](/reference-sql/data-types#text) requires all values to be valid UTF8, so when this option is `FALSE`, non-UTF8 bytes in Parquet string columns raise an error. Default: `FALSE`. | `BOOLEAN`             |
| `ESTIMATED_ROWS`         | An estimate of the number of rows that `READ_PARQUET` returns. The query planner uses this value as a hint to improve join ordering. | `INT`, `BIGINT`       |
| `PARSE_JSON_AS`          | Specifies the Firebolt data type which should be inferred for source columns of the native Parquet JSON type. Can be either `'TEXT'` or `'JSON'`, with the default being `'JSON'`.                                                                                                                                                                                                          | `TEXT`                |
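
As a brief sketch of how the optional parameters combine, the following query reads from a location, replaces invalid UTF8 bytes, reads native Parquet JSON columns as `TEXT`, and passes a row-count hint to the planner. The location name `my_parquet_location` and the estimate of 1,000,000 rows are illustrative assumptions, not values from a real deployment.

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
-- 'my_parquet_location' and the row estimate are placeholders.
SELECT *
FROM READ_PARQUET(
    LOCATION => 'my_parquet_location',
    REPLACE_NON_UTF8_BYTES => TRUE,  -- substitute U+FFFD instead of raising an error
    ESTIMATED_ROWS => 1000000,       -- hint for join ordering
    PARSE_JSON_AS => 'TEXT'          -- infer TEXT for native Parquet JSON columns
)
LIMIT 5;
```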

* When using static credentials:
  * The `URL` can be passed as either the first positional parameter or a named parameter
  * If you provide either `AWS_ACCESS_KEY_ID` or `AWS_SECRET_ACCESS_KEY`, you must provide both
  * Providing an AWS session token is optional
  * Credentials are not required for accessing public buckets
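
The rules above might look as follows in practice. This is a sketch only: the bucket `my-private-bucket`, the role ARN, and the angle-bracket values are hypothetical placeholders to be replaced with your own.

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
-- Access key pair: both values must be supplied together.
SELECT *
FROM READ_PARQUET(
    URL => 's3://my-private-bucket/data/*.parquet',  -- hypothetical bucket
    AWS_ACCESS_KEY_ID => '<aws_access_key_id>',
    AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>'
)
LIMIT 5;

-- Role-based access, using the role parameters from the syntax above.
SELECT *
FROM READ_PARQUET(
    URL => 's3://my-private-bucket/data/*.parquet',  -- hypothetical bucket
    AWS_ROLE_ARN => 'arn:aws:iam::123456789012:role/my-read-role',  -- hypothetical ARN
    AWS_ROLE_EXTERNAL_ID => '<aws_role_external_id>'
)
LIMIT 5;
```

Because these queries embed credentials in the statement text, a `LOCATION` object remains the recommended choice for anything beyond ad hoc exploration.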

## Return Type

The result is a table with data from the Parquet files. Columns are read and parsed using their inferred data types.

<Note>
  **When loading multiple files, Firebolt infers the schema from the most recently modified file.** The remaining files must have compatible data types. If types vary between files (e.g., a column contains integers in one file but doubles in another, or is numeric in one file but text in another), the inferred schema may not match all files and thus cause data type errors or query failures. In such cases, we recommend defining an explicit schema using either [external tables](/reference-sql/commands/data-definition/create-external-table) or [`COPY FROM`](/reference-sql/commands/data-management/copy-from) into existing tables.
</Note>
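
Before relying on schema inference across many files, it can help to confirm which files a pattern actually matches. The following minimal sketch uses the special `$source_file_name` column and the public sample dataset to list the matched files. It does not show which file was modified most recently, so it is a starting point for checking compatibility rather than a complete diagnosis.

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
-- List every file that the glob pattern matches.
SELECT DISTINCT $source_file_name
FROM READ_PARQUET(
    URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/*.parquet'
);
```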

## Best practice

Firebolt recommends using a `LOCATION` object to store credentials for authentication.

When using `READ_PARQUET()` with a location, the location's URL should point to a path that contains only Parquet files (see [location table-valued functions](https://docs.firebolt.io/guides/security/location#table-valued-functions-tvfs)).

## Examples

**Example: Using location object**

The following code example reads 5 rows from the Parquet files in a location, using a `LOCATION` object to store credentials for authentication:

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
SELECT * 
FROM READ_PARQUET(
    LOCATION => 'my_parquet_location'
) 
LIMIT 5;
```

**Example: Using location object with pattern**

This example shows how to use the `PATTERN` parameter with a location object to filter specific files:

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
CREATE LOCATION firebolt_sample_dataset WITH
  SOURCE = AMAZON_S3
  URL = 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/';


SELECT $source_file_name, "CurrentSpeed", "CurrentPlayTime"
FROM READ_PARQUET(
    LOCATION => 'firebolt_sample_dataset',
    PATTERN => 'playstats/TournamentID=92/*.parquet'
)
LIMIT 5;
```

**Returns**

| \$source\_file\_name                                                                                                | CurrentSpeed | CurrentPlayTime |
| :------------------------------------------------------------------------------------------------------------------ | :----------- | :-------------- |
| help\_center\_assets/firebolt\_sample\_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet | 10           | 8,060.488       |
| help\_center\_assets/firebolt\_sample\_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet | 45           | 8,061.4752      |
| help\_center\_assets/firebolt\_sample\_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet | 280          | 8,062.4624      |
| help\_center\_assets/firebolt\_sample\_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet | 230          | 8,063.4496      |
| help\_center\_assets/firebolt\_sample\_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet | 333          | 8,064.4368      |

This reads only the Parquet files matching the pattern `playstats/TournamentID=92/*.parquet` within the location's base path, showing specific columns and the source file for verification.

**Example: Reading from a public bucket by URL**

The following code example reads 5 rows from a Parquet file by passing its `URL` directly. Because the bucket is public, no credentials are required:

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
SELECT * 
FROM READ_PARQUET(
    URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet'
) 
LIMIT 5;
```

**Returns**

| GameID | PlayerID | Timestamp           | SelectedCar | CurrentLevel | CurrentSpeed | CurrentPlayTime | CurrentScore | Event     | ErrorCode      |
| :----- | :------- | :------------------ | :---------- | :----------- | :----------- | :-------------- | :----------- | :-------- | :------------- |
| 1      | 845      | 2022-10-27 13:36:33 | Solara      | 1            | 0            | 0               | 0            | Brake     | NoError        |
| 1      | 845      | 2022-10-27 13:36:33 | Solara      | 1            | 339          | 0.9872          | 2            | RightTurn | GraphicsFreeze |
| 1      | 845      | 2022-10-27 13:36:34 | Solara      | 1            | 288          | 1.9744          | 20           | Tilt      | NoError        |
| 1      | 845      | 2022-10-27 13:36:35 | Solara      | 1            | 260          | 2.9616          | 53           | Block     | TextNotFound   |
| 1      | 845      | 2022-10-27 13:36:36 | Solara      | 1            | 196          | 3.9488          | 81           | FullSpeed | NoError        |

### Using URL

* Credentials are optional; they are not required for public buckets.
* The `URL` can be passed as either the first positional parameter or a named parameter. For example, the following two queries both read the same file:

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
SELECT * FROM READ_PARQUET('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet');
SELECT * FROM READ_PARQUET(URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet');
```

* The `URL` can represent a single file or a [glob](https://en.wikipedia.org/wiki/Glob_\(programming\)) pattern. If a glob pattern is used, all files matching the pattern are read. The special column `$source_file_name` identifies the source file of each row in the result set:

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
SELECT *, $source_file_name FROM READ_PARQUET('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/*.parquet') LIMIT 5;
```

When using glob patterns, the wildcard (`*`) can only be used at the end of the path, that is, in the file-name portion. It can have literal text before or after it, such as `*.parquet`, `date=2025*.parquet`, or `data_*.parquet`.
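
As an illustration of a wildcard combined with literal text, the following sketch matches files whose names start with `cc2a` in the `TournamentID=92` directory of the public sample dataset. It assumes the file shown in the earlier examples is still present in the bucket; the `cc2a` prefix is chosen only to match that file name.

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
-- Wildcard with literal text before and after it in the file name.
SELECT $source_file_name, "GameID", "PlayerID"
FROM READ_PARQUET(
    's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a*.parquet'
)
LIMIT 5;
```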

The pattern will recursively match files in all subdirectories. For example:

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
SELECT count(*) FROM READ_PARQUET('s3://firebolt-publishing-public/*.parquet');
```

The query above counts the rows of all Parquet files in the bucket, including those in subdirectories such as `help_center_assets/firebolt_sample_dataset/playstats/`.

The column `$source_file_name` can be used in combination with [REGEXP\_EXTRACT](https://docs.firebolt.io/reference-sql/functions-reference/string/regexp-extract) to extract data from the source file path. The following code returns the `TournamentID` for each record it reads based on its file path:

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
SELECT REGEXP_EXTRACT($source_file_name, 'TournamentID=(\d+)', '', 1) AS TournamentID, *
FROM READ_PARQUET(URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/*.parquet')
LIMIT 5;
```

For example, it returns `92` for records in file `s3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/TournamentID=92/cc2a2a0b4e8b4fb39abf20a956e7cc3e-0.parquet`.
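
The same extraction can also drive an aggregation. The following sketch counts rows per tournament by grouping on the value extracted from the file path. It reuses the URL and regular expression from the query above; the alias `row_count` is an illustrative choice, and the extraction is repeated in `GROUP BY` to keep the grouping expression explicit.

```sql theme={"theme":{"light":"github-light","dark":"github-dark"}}
-- Count rows per TournamentID, derived from each row's source file path.
SELECT
    REGEXP_EXTRACT($source_file_name, 'TournamentID=(\d+)', '', 1) AS TournamentID,
    COUNT(*) AS row_count
FROM READ_PARQUET(
    URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/playstats/*.parquet'
)
GROUP BY REGEXP_EXTRACT($source_file_name, 'TournamentID=(\d+)', '', 1);
```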
