> ## Documentation Index
> Fetch the complete documentation index at: https://docs.firebolt.io/llms.txt
> Use this file to discover all available pages before exploring further.

> Learn how to map semi-structured data from a JSON document to a Firebolt table.

# Load semi-structured JSON data

Semi-structured data does not follow a strict table format but contains structured tags or key-value pairs. JSON is an example of semi-structured data. Firebolt supports the following ways to work with JSON based on how your data changes and how you query it:

* [Query JSON files in place with READ\_JSON](#query-json-files-in-place-with-read_json) if you want to explore or process files without defining any schema up front.
* [Load JSON into a fixed schema](#load-json-into-a-fixed-schema) if your JSON data has a stable set of fields with shallow nesting.
* [Transform the input during load](#transform-the-input-during-load) if your table must always contain certain fields.
* [Store JSON as text](#store-json-as-text) if you need only specific fields on demand or if the table structure changes frequently.
* [Load JSON with an external table](#load-json-with-an-external-table) if you ingest repeatedly from object storage and want the source declared as a reusable table.

READ\_JSON needs no setup and is shown first. The table-based methods that follow share the sample dataset in [Sample JSON dataset](#sample-json-dataset).

## Query JSON files in place with READ\_JSON

The [READ\_JSON](/reference-sql/functions-reference/table-valued/read_json) table-valued function reads JSON files directly from object storage, with no table definition required. By default it infers the schema from the data, turning nested objects into `STRUCT` columns and arrays into `ARRAY` columns:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
SELECT id, user_agent.agent, tags
FROM READ_JSON('s3://my_bucket/visits/*.json');
```

For irregular or polymorphic documents, or whenever inference fails, pass `PARSE_AS_JSON => TRUE` to receive each document as one row in a single [JSON](/reference-sql/data-types)-typed column named `json`, and extract what you need with JSON functions:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
SELECT JSON_VALUE(json."id")::BIGINT AS id, json."user_agent"."agent" AS agent
FROM READ_JSON('s3://my_bucket/visits/*.json', PARSE_AS_JSON => TRUE);
```

`READ_JSON` accepts newline-delimited JSON, single (including pretty-printed) documents, and top-level JSON arrays, with optional compression. To persist the data, combine it with `INSERT INTO ... SELECT`. The table-based approaches below remain the right choice when you ingest repeatedly into a stable schema and want the data contract to be explicit.

## Sample JSON dataset

The table-based methods that follow use this sample dataset. Each line is a single JSON object representing a session record for a website.

```json theme={"theme":{"light":"css-variables","dark":"css-variables"}}
[
  {
    "id": 1,
    "StartTime": "2020-01-06 17:00:00",
    "Duration": 450,
    "tags": ["summer-sale", "sports"],
    "user_agent": {
      "agent": "Mozilla/5.0",
      "platform": "Windows NT 6.1",
      "resolution": "1024x4069"
    }
  },
  {
    "id": 2,
    "StartTime": "2020-01-05 12:00:00",
    "Duration": 959,
    "tags": ["gadgets", "audio"],
    "user_agent": {
      "agent": "Safari",
      "platform": "iOS 14"
    }
  }
]
```

The following code example creates a staging table that stores the raw JSON data, allowing you to run the subsequent examples:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
-- Create a staging table for raw JSON data with one JSON object per row
DROP TABLE IF EXISTS doc_visits_source;
CREATE TABLE doc_visits_source (
  raw_json TEXT
);

-- Insert raw JSON data as individual rows
INSERT INTO doc_visits_source (raw_json)
VALUES
('{"id": 1, "StartTime": "2020-01-06 17:00:00", "Duration": 450, "tags": ["summer-sale", "sports"], "user_agent": {"agent": "Mozilla/5.0", "platform": "Windows NT 6.1", "resolution": "1024x4069"}}'),
('{"id": 2, "StartTime": "2020-01-05 12:00:00", "Duration": 959, "tags": ["gadgets", "audio"], "user_agent": {"agent": "Safari", "platform": "iOS 14"}}');
```

## Load JSON into a fixed schema

If your JSON data has a stable set of fields with shallow nesting, you can load it into a table with a fixed schema to simplify queries. Missing keys are assigned default values. This method allows you to query columns directly without additional parsing, making queries faster and easier to write. Extra keys that are not explicitly mapped are excluded from structured tables, making this approach less flexible for changing data. If stored separately in a `TEXT` column, they remain accessible for later extraction.

The following code example uses the previously created `doc_visits_source` table to define columns that map directly to known keys:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
-- Create the target table 'visits_fixed' with a fixed schema
DROP TABLE IF EXISTS visits_fixed;
CREATE TABLE visits_fixed (
  id INT,
  start_time TIMESTAMP,
  duration INT,
  tags ARRAY(TEXT)
)
PRIMARY INDEX start_time;

-- Insert data into 'visits_fixed' by extracting values from the raw JSON
INSERT INTO visits_fixed
SELECT
  JSON_POINTER_EXTRACT(raw_json, '/id')::INT AS id,
  TO_TIMESTAMP(JSON_VALUE(JSON_POINTER_EXTRACT(raw_json, '/StartTime')), 'YYYY-MM-DD HH24:MI:SS') AS start_time,
  JSON_POINTER_EXTRACT(raw_json, '/Duration')::INT AS duration,
  JSON_POINTER_EXTRACT(raw_json, '/tags')::ARRAY(TEXT) AS tags
FROM doc_visits_source;
```

The following table shows the expected results:

| id | start\_time    | duration | tags                       |
| -- | -------------- | -------- | -------------------------- |
| 1  | 1/6/2020 17:00 | 450      | \["summer-sale", "sports"] |
| 2  | 1/5/2020 12:00 | 959      | \["gadgets", "audio"]      |

Important characteristics of the table:

* The mandatory scalar fields, `id`, `start_time`, and `duration`, are stored in separate columns, which makes it easier to filter, sort, or join by these fields.
* Each column maps directly to a known JSON key, allowing for simpler queries without the need for JSON functions.
* Default values ensure that the table loads even if some fields are missing or additional keys appear. Extra JSON fields such as `user_agent` with `agent`, `platform`, and `resolution` are ignored and not stored in the table.
* Array columns are used to store `tags`, which supports arbitrary numbers of values without schema changes.

## Transform the input during load

Parsing JSON data during ingestion eliminates the need for subsequent query-time parsing, simplifying and accelerating queries. However, transforming data during load also requires well-defined JSON paths that remain consistent. If the JSON paths change, the load might fail.

The following code example uses the previously created `doc_visits_source` table to parse JSON data as it loads and inserts extracted fields into a Firebolt table named `visits_transformed`. It shows how to use [JSON\_POINTER\_EXTRACT\_KEYS](/reference-sql/functions-reference/json/json-pointer-extract-keys) and [JSON\_POINTER\_EXTRACT\_VALUES](/reference-sql/functions-reference/json/json-pointer-extract-values) to store a dynamic key-value pair – `agent_props_keys` and `agent_props_vals` – from a nested object:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
DROP TABLE IF EXISTS visits_transformed;
CREATE TABLE visits_transformed (
  id INT,
  start_time TIMESTAMP,
  duration INT,
  tags ARRAY(TEXT),
  agent_props_keys ARRAY(TEXT),
  agent_props_vals ARRAY(TEXT)
)
PRIMARY INDEX start_time;

INSERT INTO visits_transformed
SELECT
  JSON_POINTER_EXTRACT(raw_json, '/id')::INT,
  TO_TIMESTAMP(JSON_VALUE(JSON_POINTER_EXTRACT(raw_json, '/StartTime')), 'YYYY-MM-DD HH24:MI:SS'),
  JSON_POINTER_EXTRACT(raw_json, '/Duration')::INT,
  JSON_POINTER_EXTRACT(raw_json, '/tags')::ARRAY(TEXT),
  JSON_POINTER_EXTRACT_KEYS(raw_json, '/user_agent')::ARRAY(TEXT),
  JSON_POINTER_EXTRACT_VALUES(raw_json, '/user_agent')::ARRAY(TEXT)
FROM doc_visits_source;
```

The following table shows the expected results:

| id | start\_time    | duration | tags                      | agent\_props\_keys                   | agent\_props\_vals                              |
| -- | -------------- | -------- | ------------------------- | ------------------------------------ | ----------------------------------------------- |
| 1  | 1/6/2020 17:00 | 450      | \["summer-sale","sports"] | \[“agent”, “platform”, “resolution”] | \[“Mozilla/5.0”, “Windows NT 6.1”, “1024x4069”] |
| 2  | 1/5/2020 12:00 | 959      | \["gadgets","audio"]      | \[“agent”, “platform”]               | \[“Safari”, “iOS 14”]                           |

Important characteristics of the previous table:

* The `user_agent` object is stored in two arrays: `agent_props_keys` and `agent_props_vals`. The [`JSON_POINTER_EXTRACT_KEYS`](/reference-sql/functions-reference/json/json-pointer-extract-keys) function extracts the keys from the `user_agent` object into the `agent_props_keys` array. The [`JSON_POINTER_EXTRACT_VALUES`](/reference-sql/functions-reference/json/json-pointer-extract-values) function extracts the corresponding values into the `agent_props_vals` array. Storing keys and values in parallel arrays offers flexibility when the `user_agent` map changes and avoids schema updates for new or removed fields.

A common error may occur if a field path does not exist in the JSON document. Firebolt returns an error because `NULL` values cannot be cast to `INT`. For example, the following query attempts to extract a non-existent field `/unknown_field` and cast it to `INT`, which results in an error:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
SELECT JSON_POINTER_EXTRACT(raw_json, '/unknown_field')::INT 
FROM doc_visits_source;
```

To avoid this error, use a default value or conditional expression as shown in the following code example:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
INSERT INTO visits_transformed
SELECT
  CASE 
    WHEN JSON_POINTER_EXTRACT(raw_json, '/unknown_field') IS NOT NULL 
    THEN JSON_POINTER_EXTRACT(raw_json, '/unknown_field')::INT 
    ELSE NULL
  END AS id
FROM doc_visits_source;
```

The following table shows the expected results:

| id | start\_time    | duration | tags                       |
| -- | -------------- | -------- | -------------------------- |
| 0  | NULL           | NULL     | NULL                       |
| 0  | NULL           | NULL     | NULL                       |
| 1  | 1/6/2020 17:00 | 450      | \["summer-sale", "sports"] |
| 2  | 1/5/2020 12:00 | 959      | \["gadgets", "audio"]      |

## Store JSON as text

You can store JSON as a single text column if the data structure changes frequently or if you only need certain fields in some queries. This approach simplifies ingestion since no parsing occurs during loading, but it requires parsing fields at query time, which can make queries more complex if you need to extract many fields regularly.

The following code example uses the previously created intermediary `doc_visits_source` table to create a permanent table that stores raw JSON, allowing you to parse only what you need on demand:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
DROP TABLE IF EXISTS visits_raw;
CREATE TABLE visits_raw (
  raw_json TEXT
);

-- Insert data into the 'visits_raw' table from the staging table
INSERT INTO visits_raw
SELECT raw_json
FROM doc_visits_source;
```

The following table shows the expected results:

| raw\_json                                                                                                                                                                                            |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `{"id": 1, "StartTime": "2020-01-06 17:00:00", "Duration": 450, "tags": ["summer-sale", "sports"], "user_agent": {"agent": "Mozilla/5.0", "platform": "Windows NT 6.1", "resolution": "1024x4069"}}` |
| `{"id": 2, "StartTime": "2020-01-05 12:00:00", "Duration": 959, "tags": ["gadgets", "audio"], "user_agent": {"agent": "Safari", "platform": "iOS 14"}}`                                              |

Important characteristics of the table:

* The `id`, `start_time`, `durations`, and `tags` columns follow the same purpose as in the [previous table example](#transform-the-input-during-load).
* Each row in the previous table contains a complete JSON object stored in a single `TEXT` column, rather than being parsed into separate fields. This approach is beneficial when the required fields are unknown at ingestion or the JSON structure changes frequently, allowing for flexible data storage without modifying the schema. Fields can be extracted dynamically at query time using Firebolt's JSON functions, though frequent parsing may increase query complexity and cost.
* Parsing occurs at query time, which can save upfront processing when data is loaded, but it might increase query complexity and cost if you need to parse many fields frequently.
* Subsequent queries need to extract fields manually with JSON functions as needed.

## Load JSON with an external table

When you ingest from the same object-storage location repeatedly, declare it once as a JSON [external table](/reference-sql/commands/data-definition/create-external-table) and reuse it. An external table reads the files in place; load into a Firebolt table with `INSERT INTO ... SELECT`. `TYPE = (JSON)` supports three column mappings.

Map each JSON key to a declared column (the default) when the fields are stable:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
CREATE EXTERNAL TABLE visits_external (
  id INT,
  StartTime TIMESTAMP,
  Duration INT,
  tags ARRAY(TEXT)
)
URL = 's3://your-bucket-name/path/to/json-files/'
OBJECT_PATTERN = '*.json'
TYPE = (JSON);
```

Ingest each document whole into a single `TEXT` column with `PARSE_AS_TEXT = TRUE`, then extract fields at query time with JSON functions. The table must declare exactly one data column, of type `TEXT`:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
CREATE EXTERNAL TABLE visits_external_text (
  raw_json TEXT
)
URL = 's3://your-bucket-name/path/to/json-files/'
OBJECT_PATTERN = '*.json'
TYPE = (JSON PARSE_AS_TEXT = TRUE);
```

Ingest each document whole into a single [`JSON`](/reference-sql/data-types)-typed column with `PARSE_AS_JSON = TRUE`. This mirrors `READ_JSON(..., PARSE_AS_JSON => TRUE)`: the value is a real JSON value you query with JSON functions rather than re-parsed text, and every document is validated as well-formed JSON on read. The table must declare exactly one data column, of type `JSON` (partition columns excepted), and `PARSE_AS_TEXT` and `PARSE_AS_JSON` are mutually exclusive:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
CREATE EXTERNAL TABLE visits_external_json (
  json JSON
)
URL = 's3://your-bucket-name/path/to/json-files/'
OBJECT_PATTERN = '*.json'
TYPE = (JSON PARSE_AS_JSON = TRUE);

SELECT JSON_VALUE(json."id")::BIGINT AS id, json."user_agent"."agent" AS agent
FROM visits_external_json;
```

## Lenient JSON parsing

When Firebolt reads JSON values into typed columns — both `READ_JSON` with an inferred or fixed schema and JSON [external tables](/reference-sql/commands/data-definition/create-external-table) — the field-level parser is intentionally more permissive than the [JSON standard](https://www.json.org). This is for backward compatibility with data that has historically loaded successfully; it is not a claim of strict conformance. The following non-standard constructs are **accepted**:

| Construct                          | Example                     | Standard JSON       |
| :--------------------------------- | :-------------------------- | :------------------ |
| Curly braces around array values   | `{"a": {1, 2, 3}}`          | `[1, 2, 3]`         |
| Unquoted boolean words, any case   | `{"b": yes}`, `{"b": TRUE}` | `true` / `false`    |
| Non-finite floats                  | `{"f": inf}`, `{"f": -inf}` | (not representable) |
| Leading `+` on integers            | `{"n": +5}`                 | `5`                 |
| Trailing comma in an array         | `{"a": [1, 2,]}`            | `[1, 2]`            |
| Quoted number for a numeric column | `{"n": "42"}`               | `42` (unquoted)     |

Some non-standard constructs are still **rejected**, including `nan` floats and unquoted strings (`{"s": bare_word}`).

Two paths remain strictly standards-conformant and reject all of the above:

* **`READ_JSON` schema inference** validates the sampled file with a conformant parser before deciding column types, so a malformed sampled file fails the query rather than being read leniently.
* **`PARSE_AS_JSON => TRUE`** validates every document as well-formed JSON on ingestion.

If you need strict validation of every input file, read with `PARSE_AS_JSON => TRUE`.