Work with semi-structured data
Semi-structured data is any data that does not follow a strict tabular schema and often includes fields that are not standard SQL data types. This data typically has a nested structure and supports complex types such as arrays, maps, and structs.
Common formats of semi-structured data include:
- JSON: A widely used format for semi-structured data. For information on loading JSON data with Firebolt, see Load semi-structured JSON data.
- Parquet and ORC: Serialization formats that support nested structures and complex data types. For information on loading Parquet data with Firebolt, see Load semi-structured Parquet data.
Firebolt’s approach to semi-structured data
Firebolt transforms semi-structured data using arrays and structs, enabling efficient querying. These data types allow for flexible modeling of nested and hierarchical data.
Arrays
Firebolt supports arrays with unpredictable lengths in the source data. These arrays can have arbitrary nesting levels, provided the nesting level is consistent within a column and known during table creation. For more details, see Work with arrays.
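The nesting constraint can be illustrated outside of Firebolt. The following is a minimal Python sketch, not Firebolt code: it treats a column as a list of rows and checks that every scalar sits at the same nesting depth, while row lengths are free to vary. The helper names `leaf_depths` and `has_consistent_nesting` are hypothetical and exist only for this illustration; how Firebolt itself validates nesting is not described here.

```python
def leaf_depths(value, depth=0):
    """Return the set of nesting depths at which non-list values occur."""
    if isinstance(value, list):
        depths = set()
        for item in value:
            depths |= leaf_depths(item, depth + 1)
        return depths
    return {depth}


def has_consistent_nesting(column):
    """True if every scalar in every row of the column sits at the same depth."""
    depths = set()
    for row in column:
        depths |= leaf_depths(row)
    return len(depths) <= 1


# Rows differ in length (3, 1, and 0 elements), but every scalar is one level deep.
flat_column = [[1, 2, 3], [4], []]
# The second row mixes a scalar with a nested array, so scalar depths differ.
mixed_column = [[1, 2], [3, [4, 5]]]

print(has_consistent_nesting(flat_column))
print(has_consistent_nesting(mixed_column))
```

In this sketch the first column passes the check and the second does not. Empty arrays contribute no scalars, so they are compatible with any nesting level.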
Structs
The STRUCT data type allows you to group multiple attributes of varying data types into a single logical unit. This is especially useful for modeling nested or hierarchical data. For more information, see STRUCT data type.
Maps
Firebolt does not currently support maps (also known as dictionaries) natively. However, you can represent them using arrays and structs in either of two ways:
- A map can be represented using two coordinated arrays, one for keys and one for values. This approach is particularly useful for JSON-like data where objects have varying keys.
- Alternatively, a map can be represented using an array of the STRUCT(key T, value U) data type, where T is the data type of the keys and U is the data type of the values. Each key is stored together with its value, so instead of manipulating two arrays and coordinating indexes yourself, you let Firebolt do it and get simpler query syntax.
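To make the difference between the two representations concrete, here is a small sketch in plain Python rather than Firebolt SQL. It models the same map both ways and looks up a value by key. The `Entry` type is a hypothetical stand-in for STRUCT(key T, value U) with text keys and values, and the function names and sample data are assumptions made for this illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Stand-in for STRUCT(key T, value U); here T and U are both text."""
    key: str
    value: str


def lookup_parallel(keys, values, wanted):
    """Two coordinated arrays: find the key's index, then read the same index in values."""
    for index, key in enumerate(keys):
        if key == wanted:
            return values[index]
    return None


def lookup_entries(entries, wanted):
    """Array of key/value structs: each key travels with its value, so no index is shared."""
    for entry in entries:
        if entry.key == wanted:
            return entry.value
    return None


# The same map stored both ways (sample data is illustrative only).
keys = ["color", "size"]
values = ["red", "M"]
entries = [Entry(k, v) for k, v in zip(keys, values)]

print(lookup_parallel(keys, values, "size"))
print(lookup_entries(entries, "size"))
```

Both lookups return the same value. The first depends on the two arrays staying aligned position by position, while the second keeps each pair together, which is the simplification the struct-based approach offers.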