A table-valued function (TVF) that reads JSON files from object storage and returns their contents as a table, so you can query files in place — no external table or COPY FROM required. Access the data either through a location object (recommended, and the way to read from sources other than Amazon S3) or through a direct Amazon S3 URL with credentials.
READ_JSON has two modes:
- Schema inference (default) — the column names and types are inferred from the data. Nested objects become STRUCT columns and arrays become
ARRAY columns.
PARSE_AS_JSON => TRUE — no inference; every JSON document becomes one row with a single column named json of type JSON. Use this for schemaless, polymorphic, or deeply irregular data, and as the escape hatch whenever schema inference fails.
Syntax
-- Using location object (recommended)
READ_JSON (
LOCATION => 'location_name'
[, PATTERN => <pattern>]
[, SCHEMA => '<column_definitions>']
[, PARSE_AS_JSON => { TRUE | FALSE }]
[, COMPRESSION => <compression_type>]
)
-- Using static credentials
READ_JSON (
URL => <file_url>
[, SCHEMA => '<column_definitions>']
[, PARSE_AS_JSON => { TRUE | FALSE }]
[, COMPRESSION => <compression_type>]
[, AWS_ACCESS_KEY_ID => <aws_access_key_id>]
[, AWS_SECRET_ACCESS_KEY => <aws_secret_access_key>]
[, AWS_SESSION_TOKEN => <aws_session_token>]
[, AWS_ROLE_ARN => <aws_role_arn>]
[, AWS_ROLE_EXTERNAL_ID => <aws_role_external_id>]
)
Parameters
| Parameter | Description | Supported input types |
|---|
LOCATION | The name of a location object that holds the URL and credentials for the source (Amazon S3, Google Cloud Storage, and other supported sources). Firebolt recommends using LOCATION to store credentials for authentication. LOCATION must be a string literal (e.g. LOCATION => 'my_location'); unlike URL it cannot be passed positionally. For a comprehensive guide, see LOCATION objects. | TEXT |
PATTERN | When using LOCATION, an optional glob pattern to filter files within the location’s URL path. The pattern is applied relative to the location’s base path. Defaults to *. | TEXT |
URL | The Amazon S3 location of your files, as a constant string. The expected format is s3://{bucket_name}/{path}, where the path may include a glob pattern (e.g. s3://my_bucket/events/*.json). To read from other sources, use LOCATION instead. | TEXT |
SCHEMA | An explicit column schema, written as a comma-separated CREATE TABLE column list (no enclosing parentheses), for example SCHEMA => 'id BIGINT, payload STRUCT(a INT, b TEXT)'. When provided, schema inference is skipped and each JSON document is read into the declared columns; nested objects can be declared as STRUCT columns and JSON arrays as ARRAY columns. Mutually exclusive with PARSE_AS_JSON. | TEXT |
PARSE_AS_JSON | When TRUE, schema inference is skipped and every JSON document is returned as one row with a single column named json of type JSON. Default: FALSE. | BOOLEAN |
COMPRESSION | The compression of the files: 'gzip', 'snappy', 'lz4', 'zstd', or 'brotli'. By default, the compression is detected from the file extension (for example .json.gz). | TEXT |
AWS_ACCESS_KEY_ID | The AWS access key ID. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. | TEXT |
- When using static credentials, the
URL can be passed as either the first positional parameter or a named parameter, and credentials are not required for accessing public buckets.
Accepted file contents
Each file may contain:
- Newline-delimited JSON (NDJSON / JSON Lines) — one document per line. This is the recommended format.
- A single JSON document, including pretty-printed documents spanning multiple lines, which yields exactly one row.
- One top-level JSON array of objects (
[{...}, {...}]) — every array element becomes a row.
Every document must be a JSON object. Files containing bare scalars or arrays of scalars are rejected.
Schema inference
The schema is inferred from the most recently modified file matching the URL or pattern. That file is read in full, and every document in it contributes to the schema:
- The column set is the union of all keys across the documents. A key missing from a document yields
NULL in that row.
- Nested objects become
STRUCT columns (recursively); arrays become ARRAY columns, including arrays of structs.
- Other matching files do not influence the schema. When they are read, their unknown keys are ignored and missing keys yield
NULL. These files are decoded tolerantly, matching JSON external tables: some non-conformant constructs (for example unquoted boolean words or trailing commas in arrays) are accepted rather than rejected.
| JSON value | Inferred type |
|---|
true / false | BOOLEAN |
| integer | BIGINT |
| fractional number | DOUBLE PRECISION |
| string | TEXT |
| string in ISO-8601 timestamp format (all values of the key) | TIMESTAMP |
| array | ARRAY of the unified element type |
| object | STRUCT |
only null observed | TEXT |
All inferred columns are nullable. Integers and fractional numbers in the same key unify to DOUBLE PRECISION. A JSON null always becomes SQL NULL.
When inference fails, use PARSE_AS_JSON => TRUE. Schema inference fails with an actionable error when a key has irreconcilable types across documents (for example a number in one document and a string in another), when a document contains duplicate keys, or when the schema would exceed 500 keys (nested keys included — a guard against map-like documents with unique keys exploding into thousands of columns). In all of these cases, PARSE_AS_JSON => TRUE reads the same files without restrictions, returning each document as a single JSON value.
Return Type
In the default mode, the result is a table whose columns follow the inferred schema. With PARSE_AS_JSON => TRUE, the result is a table with a single nullable column json of type JSON, one row per document; every document is validated as well-formed JSON.
Like the other read functions, READ_JSON exposes the $source_file_name and $source_file_timestamp pseudo columns identifying the file each row came from.
Examples
Example
Read events with an inferred schema, including nested structs, and access struct fields:
SELECT id, payload.user_name, tags
FROM READ_JSON('s3://my_bucket/events/*.json.gz')
WHERE payload.status = 'completed';
Example
Read each document as a single JSON value and extract fields with JSON functions:
SELECT JSON_VALUE(json."order_id")::BIGINT AS order_id,
json."shipping"."address" AS address
FROM READ_JSON('s3://my_bucket/orders.jsonl', PARSE_AS_JSON => TRUE);
Example
Use a location object and a glob pattern:
SELECT count(*) FROM READ_JSON(LOCATION => 'my_events', PATTERN => '2026/*.jsonl');