READ_JSON

A table-valued function (TVF) that reads JSON files from object storage and returns their contents as a table, so you can query files in place — no external table or COPY FROM required. Access the data either through a location object (recommended, and the way to read from sources other than Amazon S3) or through a direct Amazon S3 URL with credentials. READ_JSON has two modes:

Schema inference (default) — the column names and types are inferred from the data. Nested objects become STRUCT columns and arrays become ARRAY columns.
PARSE_AS_JSON => TRUE — no inference; every JSON document becomes one row with a single column named json of type JSON. Use this for schemaless, polymorphic, or deeply irregular data, and as the escape hatch whenever schema inference fails.

Syntax

-- Using location object (recommended)
READ_JSON (
  LOCATION => 'location_name'
  [, PATTERN => <pattern>]
  [, SCHEMA => '<column_definitions>']
  [, PARSE_AS_JSON => { TRUE | FALSE }]
  [, COMPRESSION => <compression_type>]
)

-- Using static credentials
READ_JSON (
  URL => <file_url>
  [, SCHEMA => '<column_definitions>']
  [, PARSE_AS_JSON => { TRUE | FALSE }]
  [, COMPRESSION => <compression_type>]
  [, AWS_ACCESS_KEY_ID => <aws_access_key_id>]
  [, AWS_SECRET_ACCESS_KEY => <aws_secret_access_key>]
  [, AWS_SESSION_TOKEN => <aws_session_token>]
  [, AWS_ROLE_ARN => <aws_role_arn>]
  [, AWS_ROLE_EXTERNAL_ID => <aws_role_external_id>]
)

When you pass AWS_ROLE_ARN, set the optional AWS_ROLE_EXTERNAL_ID to add a customer-controlled condition to your role’s trust policy.

For role-based AWS access you can additionally set an external ID. An external ID is a value you choose and control that AWS checks when Firebolt assumes your role, adding a second condition on top of your account’s unique IAM principal. Configuring one is a recommended best practice. See IAM roles.

Parameters

Parameter	Description	Supported input types
`LOCATION`	The name of a location object that holds the URL and credentials for the source (Amazon S3, Google Cloud Storage, and other supported sources). Firebolt recommends using `LOCATION` to store credentials for authentication. `LOCATION` must be a string literal (e.g. `LOCATION => 'my_location'`); unlike `URL` it cannot be passed positionally. For a comprehensive guide, see LOCATION objects.	`TEXT`
`PATTERN`	When using `LOCATION`, an optional glob pattern to filter files within the location’s URL path. The pattern is applied relative to the location’s base path. Defaults to `*`.	`TEXT`
`URL`	The Amazon S3 location of your files, as a constant string. The expected format is `s3://{bucket_name}/{path}`, where the path may include a glob pattern (e.g. `s3://my_bucket/events/*.json`). To read from other sources, use `LOCATION` instead.	`TEXT`
`SCHEMA`	An explicit column schema, written as a comma-separated `CREATE TABLE` column list (no enclosing parentheses), for example `SCHEMA => 'id BIGINT, payload STRUCT(a INT, b TEXT)'`. When provided, schema inference is skipped and each JSON document is read into the declared columns; nested objects can be declared as `STRUCT` columns and JSON arrays as `ARRAY` columns. Mutually exclusive with `PARSE_AS_JSON`.	`TEXT`
`PARSE_AS_JSON`	When `TRUE`, schema inference is skipped and every JSON document is returned as one row with a single column named `json` of type `JSON`. Default: `FALSE`.	`BOOLEAN`
`COMPRESSION`	The compression of the files: `'gzip'`, `'snappy'`, `'lz4'`, `'zstd'`, or `'brotli'`. By default, the compression is detected from the file extension (for example `.json.gz`).	`TEXT`
`AWS_ACCESS_KEY_ID`	The AWS access key ID.	`TEXT`
`AWS_SECRET_ACCESS_KEY`	The AWS secret access key.	`TEXT`
`AWS_SESSION_TOKEN`	The AWS session token.	`TEXT`
`AWS_ROLE_ARN`	The AWS role ARN.	`TEXT`
`AWS_ROLE_EXTERNAL_ID`	The AWS role external ID.	`TEXT`

When using static credentials, the URL can be passed as either the first positional parameter or a named parameter, and credentials are not required for accessing public buckets.

Accepted file contents

Each file may contain:

Newline-delimited JSON (NDJSON / JSON Lines) — one document per line. This is the recommended format.
A single JSON document, including pretty-printed documents spanning multiple lines, which yields exactly one row.
One top-level JSON array of objects ([{...}, {...}]) — every array element becomes a row.

Every document must be a JSON object. Files containing bare scalars or arrays of scalars are rejected.

Schema inference

The schema is inferred from the most recently modified file matching the URL or pattern. That file is read in full, and every document in it contributes to the schema:

The column set is the union of all keys across the documents. A key missing from a document yields NULL in that row.
Nested objects become STRUCT columns (recursively); arrays become ARRAY columns, including arrays of structs.
Other matching files do not influence the schema. When they are read, their unknown keys are ignored and missing keys yield NULL. These files are decoded tolerantly, matching JSON external tables: some non-conformant constructs (for example unquoted boolean words or trailing commas in arrays) are accepted rather than rejected.

JSON value	Inferred type
`true` / `false`	`BOOLEAN`
integer	`BIGINT`
fractional number	`DOUBLE PRECISION`
string	`TEXT`
string in ISO-8601 timestamp format (all values of the key)	`TIMESTAMP`
array	`ARRAY` of the unified element type
object	`STRUCT`
only `null` observed	`TEXT`

All inferred columns are nullable. Integers and fractional numbers in the same key unify to DOUBLE PRECISION. A JSON null always becomes SQL NULL.

When inference fails, use PARSE_AS_JSON => TRUE. Schema inference fails with an actionable error when a key has irreconcilable types across documents (for example a number in one document and a string in another), when a document contains duplicate keys, or when the schema would exceed 500 keys (nested keys included — a guard against map-like documents with unique keys exploding into thousands of columns). In all of these cases, PARSE_AS_JSON => TRUE reads the same files without restrictions, returning each document as a single JSON value.

Return Type

In the default mode, the result is a table whose columns follow the inferred schema. With PARSE_AS_JSON => TRUE, the result is a table with a single nullable column json of type JSON, one row per document; every document is validated as well-formed JSON. Like the other read functions, READ_JSON exposes the $source_file_name and $source_file_timestamp pseudo columns identifying the file each row came from.

Examples

Example Read events with an inferred schema, including nested structs, and access struct fields:

SELECT id, payload.user_name, tags
FROM READ_JSON('s3://my_bucket/events/*.json.gz')
WHERE payload.status = 'completed';

Example Read each document as a single JSON value and extract fields with JSON functions:

SELECT JSON_VALUE(json."order_id")::BIGINT AS order_id,
       json."shipping"."address" AS address
FROM READ_JSON('s3://my_bucket/orders.jsonl', PARSE_AS_JSON => TRUE);

Example Use a location object and a glob pattern:

SELECT count(*) FROM READ_JSON(LOCATION => 'my_events', PATTERN => '2026/*.jsonl');

Load semi-structured JSON data — how READ_JSON relates to JSON external tables and loading JSON into managed tables.

Overview

Performance and Observability

Security

Self-Managed

Managed service

Guides

SQL reference

Release notes

API reference

Legal

Syntax

Parameters

Accepted file contents

Schema inference

Return Type

Examples

​Syntax

​Parameters

​Accepted file contents

​Schema inference

​Return Type

​Examples

​Related

Syntax

Parameters

Accepted file contents

Schema inference

Return Type

Examples

Related