Working with arrays
This section covers querying and manipulating arrays in Firebolt.
- Declaring ARRAY data types in table definitions
- Simple array functions
- Manipulating arrays with Lambda functions
- UNNEST
Declaring ARRAY data types in table definitions
Array types are declared using ARRAY(<type>)
where <type>
can be any data type that Firebolt supports. This includes the ARRAY
data type, so arrays can be arbitrarily nested. The innermost type (the scalar) can be nullable, but ARRAY
-typed columns cannot be nullable.
Array literals are also supported. For example, the SELECT
statement shown below is valid.
SELECT [1,2,3,4]
Basis for examples
All examples in this topic are based on the table below, named visits
. The column id
is of type INT
. All other columns are of type ARRAY(TEXT)
.
+----+--------------------------+-------------------------------------+------------------------------------------------+
| id | tags | agent_props_keys | agent_props_vals |
+----+--------------------------+-------------------------------------+------------------------------------------------+
| 1 | ["summer-sale","sports"] | ["agent", "platform", "resolution"] | ["Mozilla/5.0", "Windows NT 6.1", "1024x4069"] |
| 2 | ["gadgets","audio"] | ["agent", "platform"] | ["Safari", "iOS 14"] |
+----+--------------------------+-------------------------------------+------------------------------------------------+
Simple array functions
There are several fundamental functions that you can use to work with arrays, including LENGTH, ARRAY_CONCAT, and FLATTEN. See the respective reference for a full description. Brief examples are shown below.
SELECT LENGTH(agent_prop_keys)
FROM visits
Returns: 3,2
SELECT ARRAY_CONCAT(agent_props_keys, agent_props_vals)
FROM visits
Returns: ["agent", "platform", "resolution", "Mozilla/5.0", "Windows NT 6.1", "1024x4069"]
SELECT FLATTEN([ [[1,2,3],[4,5]], [[2]] ])
Returns: [1,2,3,4,5,2]
Manipulating arrays with Lambda functions
Firebolt Lambda functions are a powerful tool that you can use on arrays to extract results. This subset of array functions uses Firebolt Lambda expressions to operate on each element of one or more arrays. Arrays and the operation to perform are specified as arguments to the Lambda expression. Lambda expressions have the general syntax shown below. For details of each expression and operation, see the reference topic for each function listed under Lambda functions.
Lambda expression general syntax
The general syntax pattern of Lambda expressions shown below is uniform across the array functions that use them. For syntax and examples of each expression used in the context of the Lambda function, see the function’s reference topic.
<ARRAY_FUNC>(<arr1_var>[, <arr2_var>][, ...<arrN_var>]) -> <operation>, <array1>[, <array2>][, ...<arrayN>])
Parameter | Description |
---|---|
<ARRAY_FUNC> | Any array function that accepts a Lambda expression as an argument. For a list, see Lambda functions. |
<arr1_var>[, <arr2_var>][, ...<arrN_var>] | A list of one or more variables that you specify. The list is specified in the same order and must be the same length as the list of array expressions ( | |
The operation that is performed for each element of the array. This is typically a function or comparison. | |
A comma-separated list of expressions, each of which evaluates to an ARRAY data type. |
Lambda function example–single array
Consider the following TRANSFORM array function that uses a single array variable and reference in the Lambda expression.
SELECT TRANSFORM(t -> UPPER(t), tags) as up_tags
FROM visits
The TRANSFORM uses the Lambda expression to apply UPPER
to each element t
in each array found in the ARRAY
-typed column tags
. This converts each element in each array in tags
to upper-case, as shown in the result example below.
+----------------------------+
| up_tags |
+----------------------------+
| ["SUMMER_SALE", "SPORTS"] |
| ["GADGETS", "AUDIO"] |
+----------------------------+
Lambda function example–multiple arrays
A common use case where you provide multiple array arguments is when one array represents the keys and the other represents the values in a map of key-value pairs.
To extract the value associated with a particular key, you can use the ARRAY_FIRST function, which returns the first element for which the Lambda expression evaluates to a result that is true (non-0
). The value returned always corresponds to the first array argument in the series. However, the Lambda expression operation can evaluate its comparison using any array argument.
In the example below, we want to return the value in agent_props_vals
where the corresponding position in the agent_props_keys
array contains the value platform
.
SELECT ARRAY_FIRST(v, k -> k = 'platform', agent_props_vals, agent_props_keys)
as platform
FROM visits
Returns:
+------------------+
| platform |
+------------------+
| "Windows NT 6.1" |
| "iOS 14" |
+------------------+
UNNEST
You might want to transform a nested array structure to a standard tabular format so that you can expose views to BI tools that can’t handle Firebolt array syntax, or you might find the tabular format more natural to query using standard SQL idioms. UNNEST
serves these purposes.
UNNEST is part of the FROM clause and resembles a JOIN. Given an ARRAY
-typed column, UNNEST
unfolds the elements of the array and duplicates all other columns found in the SELECT
clause for each array element.
A single UNNEST
acts similarly to JOIN
. You can use a single UNNEST
command to unnest several arrays if the arrays are the same length.
Multiple UNNEST
statements in a single FROM
clause result in a Cartesian product. Each element in the first array has a record in the result set corresponding to each element in the second array.
Example–single UNNEST with single ARRAY-typed column
The following example unnests the tags
column from the visits
table.
SELECT id, tags
FROM visits
UNNEST(tags)
Returns:
+----+---------------+
| id | tags |
+----+---------------+
| 1 | "summer-sale" |
| 1 | "sports" |
| 2 | "gadgets" |
| 2 | "audio" |
+----+---------------+
Example–single UNNEST using multiple ARRAY-typed columns
The following query specifies both the agent_props_keys
and agent_props_vals
columns to unnest.
SELECT
id,
a_keys,
a_vals
FROM
visits
UNNEST(agent_props_keys as a_keys,
agent_props_vals as a_vals)
Returns:
id | a_keys | a_vals |
---|---|---|
1 | agent | “Mozilla/5.0” |
1 | platform | “Windows NT 6.1” |
1 | resolution | “1024x4069” |
2 | agent | “Safari” |
2 | platform | “iOS 14” |
Example–multiple UNNEST clauses resulting in a Cartesian product
The following query, while valid, creates a Cartesian product.
SELECT
id,
a_keys,
a_vals
FROM
visits
UNNEST(agent_props_keys as a_keys)
UNNEST(agent_props_vals as a_vals)
Returns:
+-----+------------+------------------+
| INT | a_keys | a_values |
+-----+------------+------------------+
| 1 | agent | "Mozilla/5.0" |
| 1 | agent | "Windows NT 6.1" |
| 1 | agent | "1024x4069" |
| 1 | platform | "Mozilla/5.0" |
| 1 | platform | "Windows NT 6.1" |
| 1 | platform | "1024x4069" |
| 1 | resolution | "Mozilla/5.0" |
| 1 | resolution | "Windows NT 6.1" |
| 1 | resolution | "1024x4069" |
| 2 | agent | "Safari" |
| 2 | agent | "iOS 14" |
| 2 | platform | "Safari" |
| 2 | platform | "iOS 14" |
+-----+------------+------------------+
Example–error on UNNEST of multiple arrays with different lengths
The following query is invalid and will result in an error as the tags
and agent_props_keys
arrays have different lengths for row 1.
SELECT
id,
tags,
a_keys
FROM
visits
UNNEST(tags,
agent_props_keys as a_keys)