Firebolt’s STRUCT
data type is currently in public preview. We are gathering feedback and further refining this feature.
STRUCT data type
This topic describes the Firebolt implementation of the STRUCT
data type.
Overview
The struct data type is a composite type that allows you to group multiple attributes of varying data types into a single logical unit. Common in modern databases, structs enable flexible and hierarchical data modeling by representing related data as a single object with named attributes. This is especially useful for semi-structured or nested data, allowing for clearer organization, better readability, and simplified queries when working with complex data relationships.
Defining struct types
A struct type is defined using the STRUCT(...)
syntax:
STRUCT (
s_key INT,
value TEXT
)
For example, to create a table with a struct column:
CREATE TABLE t (
id BIGINT,
kv STRUCT (
s_key INT,
value TEXT
)
ts TIMESTAMP
);
Nullability constraint
A nullability constraint can be specified to a struct column but not any struct attribute. This means struct attributes are always nullable, even if the enclosing struct is NOT NULL
.
For example, defining the kv
column to be NOT NULL
is OK. Defining the s_key
or value
attribute to be NOT NULL
is not supported.
CREATE TABLE t (
id BIGINT,
kv STRUCT (
s_key INT NOT NULL, -- not supported, returns an error
value TEXT NOT NULL -- not supported, returns an error
) NOT NULL -- OK
ts TIMESTAMP
);
In this example, column kv
cannot store NULL
values for the whole struct. But a struct value where both s_key
and value
attributes are NULL
is valid.
Although specifying the NOT NULL
constraint to a struct attribute yields an error, declaring a struct attribute as NULL
explicitly is accepted. It is redundant but may be used to improve readability.
Struct literals
Creating struct literals is not yet supported. For information about how to ingest data of struct types, see Ingesting struct values.
Partition on struct or struct attributes
Partitioning a table on struct columns or struct attributes is not yet supported, but will be released in the near future.
Ingesting struct values
Struct values can be ingested into a table by using the COPY FROM
statement. For example:
COPY INTO t FROM 's3://path/to/file.parquet'
WITH TYPE=parquet SUPPORT_STRUCTS=TRUE;
- When
SUPPORT_STRUCTS=TRUE
: Nested structures in the source data will be inferred asSTRUCT
data types. - When
SUPPORT_STRUCTS=FALSE
: Nested structures in the source data will be inferred as individual shredded columns.
The SUPPORT_STRUCTS
parameter only affects type inference during schema discovery. With predefined table schemas, the system maps incoming data to your schema regardless of this setting. For more details, see the COPY FROM
documentation.
The automatic schema discovery feature of the COPY FROM
statement supports the STRUCT
data type for Parquet files. Support of other file formats will be added in the near future.
Other utilities like READ_PARQUET
and external tables do not support the STRUCT
data type at the moment. We will extend the coverage in the following releases.
Querying structs
In Firebolt, each attribute of a struct is stored as an individual column. When processing struct values, the system never packs attributes of a struct into a binary format. Instead, it always works on a list of attribute columns directly. The query optimizer applies column pruning techniques on struct attributes, too. This means Firebolt only accesses struct attributes that are strictly needed for answering a query.
Accessing attributes
A struct attribute can be accessed by using the dot syntax:
SELECT kv.value FROM t WHERE kv.s_key > 10;
Example output:
value TEXT
'test_value_1'
'test_value_2'
When selecting a specific attribute of a struct, the resulting column will be named exactly after that attribute. In the previous code example, this column is just value
, without adding the struct name or any extra formatting to the column name.
In addition, the PostgreSQL syntax for accessing a field of a composite type is also supported:
-- PostgreSQL syntax
SELECT (kv).value FROM t WHERE (kv).s_key > 10;
There is a subtle difference in the two syntaxes in name resolution. When t.a
can be resolved to either a column a
of table t
or an attribute a
of a struct t
, the Firebolt syntax t.a
prefers the table column over the struct attribute. On the other hand, the PostgreSQL syntax (t).a
prefers the struct attribute over the table column.
When there is no such ambiguity, both syntaxes resolve to the same object and they can be used interchangeably.
Type conversion
At the moment, a STRUCT
value cannot be converted to or from any other non-struct data type.
A value of a struct type S1
can be converted into another struct type S2
only if both of the following hold:
S1
andS2
have exactly the same structure. Attributes can have different names inS1
andS2
, but they must form a bijection.- For each attribute
p
inS1
and its image attributeq
inS2
, there exists a type conversion fromp
’s data type toq
’s data type.
Grouping and ordering
Struct columns can be used in GROUP BY
and ORDER BY
clauses of a SELECT
statement:
SELECT MAX(ts) FROM t GROUP BY kv;
SELECT * FROM t ORDER BY kv;
When ordering by a struct column, the query optimizer decomposes it to order by all struct attributes. The order of the attributes is the order of definition. In the example above, ORDER BY kv
means ordering by "s_key"
attribute first, and then "value"
.
Understanding EXPLAIN
output with structs
Because struct values are processed through columnar decomposition, it’s important to consider how struct attributes appear as separate columns, how nullability is tracked with marker columns, and how nodes like Projection
and Sort
operate on decomposed fields. These details are especially relevant when interpreting the EXPLAIN
output for queries on STRUCT
columns.
Let us take a look at the EXPLAIN
output of the last example query above:
EXPLAIN
SELECT * FROM t ORDER BY kv;
[0] [Projection] t.id, struct(t.kv.$not_null$, t.kv.s_key, t.kv.value), t.ts
\_[1] [Sort] OrderBy: [t.kv.$not_null$ Ascending Last, t.kv.s_key Ascending Last, t.kv.value Ascending Last]
\_[2] [StoredTable] Name: "t", used 5/5 column(s) FACT
Firstly, note that the StoredTable
node [2] shows “5/5 column(s)”. Recall that struct values are decomposed into individual attribute columns. So table t
is actaully decomposed into the following schema:
TABLE t (
id BIGINT,
kv.$not_null$ BOOLEAN NOT NULL,
kv.s_key INT,
kv.value TEXT
ts TIMESTAMP
)
Other than the two attributes kv.s_key
and kv.value
, there is a third column kv.$not_null$
that denotes whether the whole struct value is NULL
. This column exists only if the struct is defined as nullable.
The Sort
node [1] directly orders tuples based on the decomposed struct attributes, including the $not_null$
marker column.
The top Projection
node [0]
constructs struct values for the kv
column, as required by the query. However, instead of packing these values into actual struct objects, Firebolt serializes the result directly using the list of attribute columns.
Limitations
At the moment, built-in functions and operators generally do not accept struct values, including comparison operators. Use struct attributes in queries wherever possible. More functions and operators supporting struct values will be added in the near future.
Type composition
Firebolt supports both ARRAY
and STRUCT
composite types. They compose freely, so you can model data as arrays of structs, structs of arrays, or more complicated shapes.
Nested struct
A struct attribute can have any type that is supported in Firebolt, including the STRUCT
type itself. This means nested STRUCT
types are allowed. For example:
CREATE TABLE t2 (
id BIGINT,
kv STRUCT (
s_key INT,
value STRUCT (
parent INT,
description TEXT
) NOT NULL -- not supported
) NOT NULL -- OK
ts TIMESTAMP
);
A nested struct is an attribute of its parent struct. Defining a nullability constraint on a struct attribute is not supported. As a result, nested structs are always nullable.
Array of struct
The STRUCT
type composes with the ARRAY
type as well. It is possible to define an array of struct:
CREATE TABLE t3 (
id BIGINT,
kvs ARRAY(
STRUCT (
s_key INT,
value TEXT
) NOT NULL -- not supported
) NOT NULL -- OK
ts TIMESTAMP
);
Firebolt does not support defining a nullability contraint for an array element when the array element is a STRUCT
or an array of STRUCT
. As a result, for an array of struct column, only the top-level array can be NOT NULL
. All nested element types are nullable by default.
At the moment, array functions and lambda functions generally do not support array of struct. The support will be extended in the near future.
Firebolt supports using array subscription ([]
syntax) to access an array element on an array of STRUCT
values.
When accessing an attribute of a struct in an array, the combined syntax arr[1].a
does not work at the moment. Use the PostgreSQL syntax instead: (arr[1]).a
. The issue will be fixed in the near future.
Unnesting arrays of structs
Unnesting an array of structs is just the same as unnesting an array of any other type:
SELECT id, UNNEST(kvs) AS kv FROM t3;
id BIGINT, kv struct(s_key int null, value text null)
1,{s_key:1,value:'test_value_1'}
1,{s_key:2,value:'test_value_2'}
The result of unnesting an array of structs column is a struct column. This behavior is consistent with unnesting an array of any other type.
Unnesting multiple arrays of structs together is supported, too. For more details on the array unnesting syntax, see UNNEST
.
Aggregating structs into an array of struct
Some aggregate functions support structs. In particular, ARRAY_AGG
can be useful when working on array of struct, together with UNNEST
.
WITH unnested_structs AS (
SELECT id, UNNEST(kvs) AS kv FROM t3
)
SELECT ARRAY_AGG(kv)
FROM unnested_structs
WHERE kv.s_key > 10
GROUP BY id;
The example above shows a way of selecting all structs with s_key > 10
from an array of struct column and preserves the array structure.
Using UNNEST
and ARRAY_AGG
together provides a fundamental way of working with arrays of struct even if there is no array or lambda function available for the operation in need.
Transforming arrays of structs
The example query above can be simplified by using the ARRAY_TRANSFORM
function.
SELECT ARRAY_FILTER(y -> y > 10, ARRAY_TRANSFORM(x -> x.s_key, kvs)) FROM t3;
ARRAY_TRANSFORM
can be used to extract an attribute from an array of structs while keeping the array structure. Once having an array of primitive types, all other array functions can be used to process the array further. In this example, ARRAY_FILTER
is used to select keys that are bigger than 10.