Link Search Menu Expand Document

Working with arrays

This section covers querying and manipulating arrays in Firebolt.

Declaring ARRAY data types in table definitions

Array types are declared using ARRAY(<type>) where <type> can be any data type that Firebolt supports. This includes the ARRAY data type, so arrays can be arbitrarily nested. The innermost type (the scalar) can be nullable, but ARRAY-typed columns cannot be nullable.

Array literals are also supported. For example, the SELECT statement shown below is valid.

SELECT [1,2,3,4]

Basis for examples

All examples in this topic are based on the table below, named visits. The column id is of type INT. All other columns are of type ARRAY(TEXT).

+----+--------------------------+-------------------------------------+------------------------------------------------+
| id |           tags           |          agent_props_keys           |                agent_props_vals                |
+----+--------------------------+-------------------------------------+------------------------------------------------+
|  1 | ["summer-sale","sports"] | ["agent", "platform", "resolution"] | ["Mozilla/5.0", "Windows NT 6.1", "1024x4069"] |
|  2 | ["gadgets","audio"]      | ["agent", "platform"]               | ["Safari", "iOS 14"]                           |
+----+--------------------------+-------------------------------------+------------------------------------------------+

Simple array functions

There are several fundamental functions that you can use to work with arrays, including LENGTH, ARRAY_CONCAT, and FLATTEN. See the respective reference for a full description. Brief examples are shown below.

SELECT LENGTH(agent_prop_keys)
FROM visits

Returns: 3,2

SELECT ARRAY_CONCAT(agent_props_keys, agent_props_vals)
FROM visits

Returns: ["agent", "platform", "resolution", "Mozilla/5.0", "Windows NT 6.1", "1024x4069"]

SELECT FLATTEN([ [[1,2,3],[4,5]], [[2]] ])

Returns: [1,2,3,4,5,2]

Manipulating arrays with Lambda functions

Firebolt Lambda functions are a powerful tool that you can use on arrays to extract results. This subset of array functions uses Firebolt Lambda expressions to operate on each element of one or more arrays. Arrays and the operation to perform are specified as arguments to the Lambda expression. Lambda expressions have the general syntax shown below. For details of each expression and operation, see the reference topic for each function listed under Lambda functions.

Lambda expression general syntax

The general syntax pattern of Lambda expressions shown below is uniform across the array functions that use them. For syntax and examples of each expression used in the context of the Lambda function, see the function’s reference topic.

<ARRAY_FUNC>(<arr1_var>[, <arr2_var>][, ...<arrN_var>]) -> <operation>, <array1>[, <array2>][, ...<arrayN>])
Parameter Description
<ARRAY_FUNC> Any array function that accepts a Lambda expression as an argument. For a list, see Lambda functions.
<arr1_var>[, <arr2_var>][, ...<arrN_var>] | A list of one or more variables that you specify. The list is specified in the same order and must be the same length as the list of array expressions ([, ][, ...]`). At run-time, each variable contains an element of the corresponding array, upon which the specified `` is performed.  
The operation that is performed for each element of the array. This is typically a function or comparison.
[, ][, ...] A comma-separated list of expressions, each of which evaluates to an ARRAY data type.

Lambda function example–single array

Consider the following TRANSFORM array function that uses a single array variable and reference in the Lambda expression.

SELECT TRANSFORM(t -> UPPER(t), tags) as up_tags
FROM visits

The TRANSFORM uses the Lambda expression to apply UPPER to each element t in each array found in the ARRAY-typed column tags. This converts each element in each array in tags to upper-case, as shown in the result example below.

+----------------------------+
|          up_tags           |
+----------------------------+
| ["SUMMER_SALE", "SPORTS"] |
| ["GADGETS", "AUDIO"]       |
+----------------------------+

Lambda function example–multiple arrays

A common use case where you provide multiple array arguments is when one array represents the keys and the other represents the values in a map of key-value pairs.

To extract the value associated with a particular key, you can use the ARRAY_FIRST function, which returns the first element for which the Lambda expression evaluates to a result that is true (non-0). The value returned always corresponds to the first array argument in the series. However, the Lambda expression operation can evaluate its comparison using any array argument.

In the example below, we want to return the value in agent_props_vals where the corresponding position in the agent_props_keys array contains the value platform.

SELECT ARRAY_FIRST(v, k -> k = 'platform', agent_props_vals, agent_props_keys)
as platform
FROM visits

Returns:

+------------------+
|     platform     |
+------------------+
| "Windows NT 6.1" |
| "iOS 14"         |
+------------------+

UNNEST

You might want to transform a nested array structure to a standard tabular format so that you can expose views to BI tools that can’t handle Firebolt array syntax, or you might find the tabular format more natural to query using standard SQL idioms. UNNEST serves these purposes.

UNNEST is part of the FROM clause and resembles a JOIN. Given an ARRAY-typed column, UNNEST unfolds the elements of the array and duplicates all other columns found in the SELECT clause for each array element.

A single UNNEST acts similarly to JOIN. You can use a single UNNEST command to unnest several arrays if the arrays are the same length.

Multiple UNNEST statements in a single FROM clause result in a Cartesian product. Each element in the first array has a record in the result set corresponding to each element in the second array.

Example–single UNNEST with single ARRAY-typed column

The following example unnests the tags column from the visits table.

SELECT id, tags
  FROM visits
UNNEST(tags)

Returns:

+----+---------------+
| id |     tags      |
+----+---------------+
|  1 | "summer-sale" |
|  1 | "sports"      |
|  2 | "gadgets"     |
|  2 | "audio"       |
+----+---------------+

Example–single UNNEST using multiple ARRAY-typed columns

The following query specifies both the agent_props_keys and agent_props_vals columns to unnest.

SELECT
    id,
    a_keys,
    a_vals
FROM
    visits
    UNNEST(agent_props_keys as a_keys,
           agent_props_vals as a_vals)

Returns:

id a_keys a_vals
1 agent “Mozilla/5.0”
1 platform “Windows NT 6.1”
1 resolution “1024x4069”
2 agent “Safari”
2 platform “iOS 14”

Example–multiple UNNEST clauses resulting in a Cartesian product

The following query, while valid, creates a Cartesian product.

SELECT
    id,
    a_keys,
    a_vals
FROM
    visits
UNNEST(agent_props_keys as a_keys)
UNNEST(agent_props_vals as a_vals)

Returns:

+-----+------------+------------------+
| INT |   a_keys   |     a_values     |
+-----+------------+------------------+
|   1 | agent      | "Mozilla/5.0"    |
|   1 | agent      | "Windows NT 6.1" |
|   1 | agent      | "1024x4069"      |
|   1 | platform   | "Mozilla/5.0"    |
|   1 | platform   | "Windows NT 6.1" |
|   1 | platform   | "1024x4069"      |
|   1 | resolution | "Mozilla/5.0"    |
|   1 | resolution | "Windows NT 6.1" |
|   1 | resolution | "1024x4069"      |
|   2 | agent      | "Safari"         |
|   2 | agent      | "iOS 14"         |
|   2 | platform   | "Safari"         |
|   2 | platform   | "iOS 14"         |
+-----+------------+------------------+

Example–error on UNNEST of multiple arrays with different lengths

The following query is invalid and will result in an error as the tags and agent_props_keys arrays have different lengths for row 1.

SELECT
    id,
    tags,
    a_keys
FROM
    visits
    UNNEST(tags,
           agent_props_keys as a_keys)