Firebolt’s STRUCT data type is currently in public preview. We are gathering feedback and further refining this feature.

STRUCT data type

This topic describes the Firebolt implementation of the STRUCT data type.

Overview

The struct data type is a composite type that allows you to group multiple attributes of varying data types into a single logical unit. Common in modern databases, structs enable flexible and hierarchical data modeling by representing related data as a single object with named attributes. This is especially useful for semi-structured or nested data, allowing for clearer organization, better readability, and simplified queries when working with complex data relationships.

Defining struct types

A struct type is defined using the STRUCT(...) syntax:

STRUCT (
    s_key INT,
    value TEXT
)

For example, to create a table with a struct column:

CREATE TABLE t (
    id  BIGINT,
    kv  STRUCT (
            s_key INT,
            value TEXT
        )
    ts  TIMESTAMP
);

Nullability constraint

A nullability constraint can be specified to a struct column but not any struct attribute. This means struct attributes are always nullable, even if the enclosing struct is NOT NULL.

For example, defining the kv column to be NOT NULL is OK. Defining the s_key or value attribute to be NOT NULL is not supported.

CREATE TABLE t (
    id  BIGINT,
    kv  STRUCT (
            s_key INT  NOT NULL,  -- not supported, returns an error
            value TEXT NOT NULL   -- not supported, returns an error
        ) NOT NULL                -- OK
    ts  TIMESTAMP
);

In this example, column kv cannot store NULL values for the whole struct. But a struct value where both s_key and value attributes are NULL is valid.

Although specifying the NOT NULL constraint to a struct attribute yields an error, declaring a struct attribute as NULL explicitly is accepted. It is redundant but may be used to improve readability.

Struct literals

Creating struct literals is not yet supported. For information about how to ingest data of struct types, see Ingesting struct values.

Partition on struct or struct attributes

Partitioning a table on struct columns or struct attributes is not yet supported, but will be released in the near future.

Ingesting struct values

Struct values can be ingested into a table by using the COPY FROM statement. For example:

COPY INTO t FROM 's3://path/to/file.parquet'
WITH TYPE=parquet SUPPORT_STRUCTS=TRUE;
  • When SUPPORT_STRUCTS=TRUE: Nested structures in the source data will be inferred as STRUCT data types.
  • When SUPPORT_STRUCTS=FALSE: Nested structures in the source data will be inferred as individual shredded columns.

The SUPPORT_STRUCTS parameter only affects type inference during schema discovery. With predefined table schemas, the system maps incoming data to your schema regardless of this setting. For more details, see the COPY FROM documentation.

The automatic schema discovery feature of the COPY FROM statement supports the STRUCT data type for Parquet files. Support of other file formats will be added in the near future.

Other utilities like READ_PARQUET and external tables do not support the STRUCT data type at the moment. We will extend the coverage in the following releases.

Querying structs

In Firebolt, each attribute of a struct is stored as an individual column. When processing struct values, the system never packs attributes of a struct into a binary format. Instead, it always works on a list of attribute columns directly. The query optimizer applies column pruning techniques on struct attributes, too. This means Firebolt only accesses struct attributes that are strictly needed for answering a query.

Accessing attributes

A struct attribute can be accessed by using the dot syntax:

SELECT kv.value FROM t WHERE kv.s_key > 10;

Example output:

value TEXT
'test_value_1'
'test_value_2'

When selecting a specific attribute of a struct, the resulting column will be named exactly after that attribute. In the previous code example, this column is just value, without adding the struct name or any extra formatting to the column name.

In addition, the PostgreSQL syntax for accessing a field of a composite type is also supported:

-- PostgreSQL syntax
SELECT (kv).value FROM t WHERE (kv).s_key > 10;

There is a subtle difference in the two syntaxes in name resolution. When t.a can be resolved to either a column a of table t or an attribute a of a struct t, the Firebolt syntax t.a prefers the table column over the struct attribute. On the other hand, the PostgreSQL syntax (t).a prefers the struct attribute over the table column.

When there is no such ambiguity, both syntaxes resolve to the same object and they can be used interchangeably.

Type conversion

At the moment, a STRUCT value cannot be converted to or from any other non-struct data type.

A value of a struct type S1 can be converted into another struct type S2 only if both of the following hold:

  • S1 and S2 have exactly the same structure. Attributes can have different names in S1 and S2, but they must form a bijection.
  • For each attribute p in S1 and its image attribute q in S2, there exists a type conversion from p’s data type to q’s data type.

Grouping and ordering

Struct columns can be used in GROUP BY and ORDER BY clauses of a SELECT statement:

SELECT MAX(ts) FROM t GROUP BY kv;

SELECT * FROM t ORDER BY kv;

When ordering by a struct column, the query optimizer decomposes it to order by all struct attributes. The order of the attributes is the order of definition. In the example above, ORDER BY kv means ordering by "s_key" attribute first, and then "value".

Understanding EXPLAIN output with structs

Because struct values are processed through columnar decomposition, it’s important to consider how struct attributes appear as separate columns, how nullability is tracked with marker columns, and how nodes like Projection and Sort operate on decomposed fields. These details are especially relevant when interpreting the EXPLAIN output for queries on STRUCT columns.

Let us take a look at the EXPLAIN output of the last example query above:

EXPLAIN
SELECT * FROM t ORDER BY kv;

[0] [Projection] t.id, struct(t.kv.$not_null$, t.kv.s_key, t.kv.value), t.ts
 \_[1] [Sort] OrderBy: [t.kv.$not_null$ Ascending Last, t.kv.s_key Ascending Last, t.kv.value Ascending Last]
    \_[2] [StoredTable] Name: "t", used 5/5 column(s) FACT

Firstly, note that the StoredTable node [2] shows “5/5 column(s)”. Recall that struct values are decomposed into individual attribute columns. So table t is actaully decomposed into the following schema:

TABLE t (
    id            BIGINT,
    kv.$not_null$ BOOLEAN NOT NULL,
    kv.s_key      INT,
    kv.value      TEXT
    ts            TIMESTAMP
)

Other than the two attributes kv.s_key and kv.value, there is a third column kv.$not_null$ that denotes whether the whole struct value is NULL. This column exists only if the struct is defined as nullable.

The Sort node [1] directly orders tuples based on the decomposed struct attributes, including the $not_null$ marker column.

The top Projection node [0] constructs struct values for the kv column, as required by the query. However, instead of packing these values into actual struct objects, Firebolt serializes the result directly using the list of attribute columns.

Limitations

At the moment, built-in functions and operators generally do not accept struct values, including comparison operators. Use struct attributes in queries wherever possible. More functions and operators supporting struct values will be added in the near future.

Type composition

Firebolt supports both ARRAY and STRUCT composite types. They compose freely, so you can model data as arrays of structs, structs of arrays, or more complicated shapes.

Nested struct

A struct attribute can have any type that is supported in Firebolt, including the STRUCT type itself. This means nested STRUCT types are allowed. For example:

CREATE TABLE t2 (
    id  BIGINT,
    kv  STRUCT (
            s_key INT,
            value STRUCT (
                      parent      INT,
                      description TEXT
                  ) NOT NULL            -- not supported
        ) NOT NULL                      -- OK
    ts  TIMESTAMP
);

A nested struct is an attribute of its parent struct. Defining a nullability constraint on a struct attribute is not supported. As a result, nested structs are always nullable.

Array of struct

The STRUCT type composes with the ARRAY type as well. It is possible to define an array of struct:

CREATE TABLE t3 (
    id  BIGINT,
    kvs ARRAY(
            STRUCT (
                s_key INT,
                value TEXT
            ) NOT NULL       -- not supported
        ) NOT NULL           -- OK
    ts  TIMESTAMP
);

Firebolt does not support defining a nullability contraint for an array element when the array element is a STRUCT or an array of STRUCT. As a result, for an array of struct column, only the top-level array can be NOT NULL. All nested element types are nullable by default.

At the moment, array functions and lambda functions generally do not support array of struct. The support will be extended in the near future.

Firebolt supports using array subscription ([] syntax) to access an array element on an array of STRUCT values.

When accessing an attribute of a struct in an array, the combined syntax arr[1].a does not work at the moment. Use the PostgreSQL syntax instead: (arr[1]).a. The issue will be fixed in the near future.

Unnesting arrays of structs

Unnesting an array of structs is just the same as unnesting an array of any other type:

SELECT id, UNNEST(kvs) AS kv FROM t3;

id BIGINT, kv struct(s_key int null, value text null)
1,{s_key:1,value:'test_value_1'}
1,{s_key:2,value:'test_value_2'}

The result of unnesting an array of structs column is a struct column. This behavior is consistent with unnesting an array of any other type.

Unnesting multiple arrays of structs together is supported, too. For more details on the array unnesting syntax, see UNNEST.

Aggregating structs into an array of struct

Some aggregate functions support structs. In particular, ARRAY_AGG can be useful when working on array of struct, together with UNNEST.

WITH unnested_structs AS (
    SELECT id, UNNEST(kvs) AS kv FROM t3
)
SELECT ARRAY_AGG(kv)
FROM unnested_structs
WHERE kv.s_key > 10
GROUP BY id;

The example above shows a way of selecting all structs with s_key > 10 from an array of struct column and preserves the array structure.

Using UNNEST and ARRAY_AGG together provides a fundamental way of working with arrays of struct even if there is no array or lambda function available for the operation in need.

Transforming arrays of structs

The example query above can be simplified by using the ARRAY_TRANSFORM function.

SELECT ARRAY_FILTER(y -> y > 10, ARRAY_TRANSFORM(x -> x.s_key, kvs)) FROM t3;

ARRAY_TRANSFORM can be used to extract an attribute from an array of structs while keeping the array structure. Once having an array of primitive types, all other array functions can be used to process the array further. In this example, ARRAY_FILTER is used to select keys that are bigger than 10.