Learn how to map semi-structured data from a JSON document to a Firebolt table.
TEXT
column, they remain accessible for later extraction.
The following code example uses the previously created doc_visits_source
table to define columns that map directly to known keys:
id | start_time | duration | tags |
---|---|---|---|
1 | 1/6/2020 17:00 | 450 | [“summer-sale”, “sports”] |
2 | 1/5/2020 12:00 | 959 | [“gadgets”, “audio”] |
id
, start_time
, and duration
, are stored in separate columns, which makes it easier to filter, sort, or join by these fields.user_agent
with agent
, platform
, and resolution
are ignored and not stored in the table.tags
, which supports arbitrary numbers of values without schema changes.doc_visits_source
table to parse JSON data as it loads and inserts extracted fields into a Firebolt table named visits_transformed
. It shows how to use JSON_POINTER_EXTRACT_KEYS and JSON_POINTER_EXTRACT_VALUES to store a dynamic key-value pair – agent_props_keys
and agent_props_vals
– from a nested object:
id | start_time | duration | tags | agent_props_keys | agent_props_vals |
---|---|---|---|---|---|
1 | 1/6/2020 17:00 | 450 | [“summer-sale”,“sports”] | [“agent”, “platform”, “resolution”] | [“Mozilla/5.0”, “Windows NT 6.1”, “1024x4069”] |
2 | 1/5/2020 12:00 | 959 | [“gadgets”,“audio”] | [“agent”, “platform”] | [“Safari”, “iOS 14”] |
user_agent
object is stored in two arrays: agent_props_keys
and agent_props_vals
. The JSON_POINTER_EXTRACT_KEYS
function extracts the keys from the user_agent
object into the agent_props_keys
array. The JSON_POINTER_EXTRACT_VALUES
function extracts the corresponding values into the agent_props_vals
array. Storing keys and values in parallel arrays offers flexibility when the user_agent
map changes and avoids schema updates for new or removed fields.NULL
values cannot be cast to INT
. For example, the following query attempts to extract a non-existent field /unknown_field
and cast it to INT
, which results in an error:
id | start_time | duration | tags |
---|---|---|---|
0 | NULL | NULL | NULL |
0 | NULL | NULL | NULL |
1 | 1/6/2020 17:00 | 450 | [“summer-sale”, “sports”] |
2 | 1/5/2020 12:00 | 959 | [“gadgets”, “audio”] |
doc_visits_source
table to create a permanent table that stores raw JSON, allowing you to parse only what you need on demand:
raw_json |
---|
{"id": 1, "StartTime": "2020-01-06 17:00:00", "Duration": 450, "tags": ["summer-sale", "sports"], "user_agent": {"agent": "Mozilla/5.0", "platform": "Windows NT 6.1", "resolution": "1024x4069"}} |
{"id": 2, "StartTime": "2020-01-05 12:00:00", "Duration": 959, "tags": ["gadgets", "audio"], "user_agent": {"agent": "Safari", "platform": "iOS 14"}} |
id
, start_time
, durations
, and tags
columns follow the same purpose as in the previous table example.TEXT
column, rather than being parsed into separate fields. This approach is beneficial when the required fields are unknown at ingestion or the JSON structure changes frequently, allowing for flexible data storage without modifying the schema. Fields can be extracted dynamically at query time using Firebolt’s JSON functions, though frequent parsing may increase query complexity and cost.