Work with semi-structured data

Semi-structured data is any data that does not follow a strict tabular schema and often includes fields that are not standard SQL data types. This data is typically nested and contains complex types such as arrays, maps, and structs.

Common formats of semi-structured data include:

  • JSON: A widely used format for semi-structured data. For information on loading JSON data with Firebolt, see Load semi-structured JSON data.
  • Parquet and ORC: Serialization formats that support nested structures and complex data types. For information on loading Parquet data with Firebolt, see Load semi-structured Parquet data.

Firebolt’s approach to semi-structured data

Firebolt transforms semi-structured data using arrays and structs, enabling efficient querying. These data types allow for flexible modeling of nested and hierarchical data.

Arrays

Firebolt supports arrays whose lengths vary from row to row and are not known in advance. Arrays can be nested to any depth, provided the nesting level is consistent within a column and known when the table is created. For more details, see Work with arrays.
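The consistency requirement can be checked on sample data before a table is defined. The following is a minimal Python sketch, not part of Firebolt; the helper names `leaf_depths` and `column_nesting_level` are hypothetical, and the sketch assumes each row's value has already been parsed into Python lists. Array lengths may differ freely between rows, but every scalar must sit at the same nesting depth.

```python
def leaf_depths(value, depth=0):
    """Return the set of nesting depths at which scalar values occur."""
    if value is None:
        # Nulls carry no depth information, so they are ignored.
        return set()
    if isinstance(value, list):
        depths = set()
        for item in value:
            depths |= leaf_depths(item, depth + 1)
        return depths  # an empty list contributes nothing
    return {depth}


def column_nesting_level(rows):
    """Return the single nesting level shared by all rows.

    Raises ValueError if scalars appear at more than one depth.
    Returns None if the sample contains no scalars at all.
    """
    depths = set()
    for row in rows:
        depths |= leaf_depths(row)
    if len(depths) > 1:
        raise ValueError(f"inconsistent nesting levels: {sorted(depths)}")
    return depths.pop() if depths else None


# Lengths vary per row, but the nesting level is consistently 2.
sample = [[[1], [2, 3]], [[4]], []]
print(column_nesting_level(sample))
```

A sample that mixes depths, such as `[[1], [[2]]]`, raises `ValueError`, which signals that the values would not fit a single array column without restructuring.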

Structs

The STRUCT data type allows you to group multiple attributes of varying data types into a single logical unit. This is especially useful for modeling nested or hierarchical data. For more information, see STRUCT data type.

Maps

Maps, also known as dictionaries, are not currently supported natively by Firebolt. However, they can be represented in two ways using arrays and structs.

  • A map can be represented using two coordinated arrays—one for keys and one for values. This approach is particularly useful for JSON-like data where objects have varying keys.

  • Alternatively, a map can be represented using an array of the STRUCT(key T, value U) data type, where T is the data type of the keys and U is the data type of the values. Each key is stored together with its value in a single element, so queries do not need to coordinate indexes across two arrays, and the query syntax is simpler.
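The two representations can be compared with a small Python sketch. This is a language-neutral illustration under stated assumptions, not Firebolt syntax: the function names are hypothetical, a Python list stands in for an array, and a dictionary with `key` and `value` fields stands in for a STRUCT(key T, value U) element.

```python
def to_parallel_arrays(obj):
    """Represent a map as two coordinated arrays: keys and values."""
    keys = list(obj.keys())
    values = [obj[k] for k in keys]
    return keys, values


def to_key_value_structs(obj):
    """Represent a map as one array of {key, value} records."""
    return [{"key": k, "value": v} for k, v in obj.items()]


def lookup_parallel(keys, values, wanted):
    """Find the key's position, then read the same position in values."""
    try:
        return values[keys.index(wanted)]
    except ValueError:
        return None  # key not present


def lookup_structs(entries, wanted):
    """Scan a single array; each element carries its own key and value."""
    for entry in entries:
        if entry["key"] == wanted:
            return entry["value"]
    return None  # key not present


# A JSON-like object whose keys may differ from record to record.
record = {"color": "red", "size": "XL"}

keys, values = to_parallel_arrays(record)
entries = to_key_value_structs(record)

print(lookup_parallel(keys, values, "size"))
print(lookup_structs(entries, "size"))
```

With two arrays, a lookup depends on the positions in both arrays staying aligned. With an array of key/value records, that alignment is built into each element, which is the simplification the second approach offers.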