Skip to main content
A table-valued function (TVF) that reads JSON files from object storage and returns their contents as a table, so you can query files in place — no external table or COPY FROM required. Access the data either through a location object (recommended, and the way to read from sources other than Amazon S3) or through a direct Amazon S3 URL with credentials. READ_JSON has two modes:
  1. Schema inference (default) — the column names and types are inferred from the data. Nested objects become STRUCT columns and arrays become ARRAY columns.
  2. PARSE_AS_JSON => TRUE — no inference; every JSON document becomes one row with a single column named json of type JSON. Use this for schemaless, polymorphic, or deeply irregular data, and as the escape hatch whenever schema inference fails.

Syntax

-- Using location object (recommended)
READ_JSON (
  LOCATION => 'location_name'
  [, PATTERN => <pattern>]
  [, SCHEMA => '<column_definitions>']
  [, PARSE_AS_JSON => { TRUE | FALSE }]
  [, COMPRESSION => <compression_type>]
)

-- Using static credentials
READ_JSON (
  URL => <file_url>
  [, SCHEMA => '<column_definitions>']
  [, PARSE_AS_JSON => { TRUE | FALSE }]
  [, COMPRESSION => <compression_type>]
  [, AWS_ACCESS_KEY_ID => <aws_access_key_id>]
  [, AWS_SECRET_ACCESS_KEY => <aws_secret_access_key>]
  [, AWS_SESSION_TOKEN => <aws_session_token>]
  [, AWS_ROLE_ARN => <aws_role_arn>]
  [, AWS_ROLE_EXTERNAL_ID => <aws_role_external_id>]
)

Parameters

ParameterDescriptionSupported input types
LOCATIONThe name of a location object that holds the URL and credentials for the source (Amazon S3, Google Cloud Storage, and other supported sources). Firebolt recommends using LOCATION to store credentials for authentication. LOCATION must be a string literal (e.g. LOCATION => 'my_location'); unlike URL it cannot be passed positionally. For a comprehensive guide, see LOCATION objects.TEXT
PATTERNWhen using LOCATION, an optional glob pattern to filter files within the location’s URL path. The pattern is applied relative to the location’s base path. Defaults to *.TEXT
URLThe Amazon S3 location of your files, as a constant string. The expected format is s3://{bucket_name}/{path}, where the path may include a glob pattern (e.g. s3://my_bucket/events/*.json). To read from other sources, use LOCATION instead.TEXT
SCHEMAAn explicit column schema, written as a comma-separated CREATE TABLE column list (no enclosing parentheses), for example SCHEMA => 'id BIGINT, payload STRUCT(a INT, b TEXT)'. When provided, schema inference is skipped and each JSON document is read into the declared columns; nested objects can be declared as STRUCT columns and JSON arrays as ARRAY columns. Mutually exclusive with PARSE_AS_JSON.TEXT
PARSE_AS_JSONWhen TRUE, schema inference is skipped and every JSON document is returned as one row with a single column named json of type JSON. Default: FALSE.BOOLEAN
COMPRESSIONThe compression of the files: 'gzip', 'snappy', 'lz4', 'zstd', or 'brotli'. By default, the compression is detected from the file extension (for example .json.gz).TEXT
AWS_ACCESS_KEY_IDThe AWS access key ID.TEXT
AWS_SECRET_ACCESS_KEYThe AWS secret access key.TEXT
AWS_SESSION_TOKENThe AWS session token.TEXT
AWS_ROLE_ARNThe AWS role ARN.TEXT
AWS_ROLE_EXTERNAL_IDThe AWS role external ID.TEXT
  • When using static credentials, the URL can be passed as either the first positional parameter or a named parameter, and credentials are not required for accessing public buckets.

Accepted file contents

Each file may contain:
  • Newline-delimited JSON (NDJSON / JSON Lines) — one document per line. This is the recommended format.
  • A single JSON document, including pretty-printed documents spanning multiple lines, which yields exactly one row.
  • One top-level JSON array of objects ([{...}, {...}]) — every array element becomes a row.
Every document must be a JSON object. Files containing bare scalars or arrays of scalars are rejected.

Schema inference

The schema is inferred from the most recently modified file matching the URL or pattern. That file is read in full, and every document in it contributes to the schema:
  • The column set is the union of all keys across the documents. A key missing from a document yields NULL in that row.
  • Nested objects become STRUCT columns (recursively); arrays become ARRAY columns, including arrays of structs.
  • Other matching files do not influence the schema. When they are read, their unknown keys are ignored and missing keys yield NULL. These files are decoded tolerantly, matching JSON external tables: some non-conformant constructs (for example unquoted boolean words or trailing commas in arrays) are accepted rather than rejected.
JSON valueInferred type
true / falseBOOLEAN
integerBIGINT
fractional numberDOUBLE PRECISION
stringTEXT
string in ISO-8601 timestamp format (all values of the key)TIMESTAMP
arrayARRAY of the unified element type
objectSTRUCT
only null observedTEXT
All inferred columns are nullable. Integers and fractional numbers in the same key unify to DOUBLE PRECISION. A JSON null always becomes SQL NULL.
When inference fails, use PARSE_AS_JSON => TRUE. Schema inference fails with an actionable error when a key has irreconcilable types across documents (for example a number in one document and a string in another), when a document contains duplicate keys, or when the schema would exceed 500 keys (nested keys included — a guard against map-like documents with unique keys exploding into thousands of columns). In all of these cases, PARSE_AS_JSON => TRUE reads the same files without restrictions, returning each document as a single JSON value.

Return Type

In the default mode, the result is a table whose columns follow the inferred schema. With PARSE_AS_JSON => TRUE, the result is a table with a single nullable column json of type JSON, one row per document; every document is validated as well-formed JSON. Like the other read functions, READ_JSON exposes the $source_file_name and $source_file_timestamp pseudo columns identifying the file each row came from.

Examples

Example Read events with an inferred schema, including nested structs, and access struct fields:
SELECT id, payload.user_name, tags
FROM READ_JSON('s3://my_bucket/events/*.json.gz')
WHERE payload.status = 'completed';
Example Read each document as a single JSON value and extract fields with JSON functions:
SELECT JSON_VALUE(json."order_id")::BIGINT AS order_id,
       json."shipping"."address" AS address
FROM READ_JSON('s3://my_bucket/orders.jsonl', PARSE_AS_JSON => TRUE);
Example Use a location object and a glob pattern:
SELECT count(*) FROM READ_JSON(LOCATION => 'my_events', PATTERN => '2026/*.jsonl');