Work with arrays
This section covers querying and manipulating arrays in Firebolt.
- Declaring ARRAY data types in table definitions
- Simple array functions
- Manipulating arrays with Lambda functions
- UNNEST
- ARRAY input and output syntax
Declaring ARRAY data types in table definitions
Array types are declared using ARRAY(<type>)
where <type>
can be any data type that Firebolt supports. This includes the ARRAY
data type, so arrays can be arbitrarily nested.
If you load an array from a CSV file, the arrays in the CSV file must be enclosed in double quotes (""
).
For example, if a CSV file contains a row containing value1 , value2 , "[array_value3 , array_value4]"
, you can create a table using the following code to read array_value3
and array_value4
into array_values
.
CREATE TABLE IF NOT EXISTS array_example
(
value1 STRING,
value2 STRING,
array_values ARRAY(TEXT)
)
URL = 's3://path_to_your_data/'
TYPE = (csv);
Array literals are also supported. For example, the SELECT
statement shown below is valid.
SELECT [1,2,3,4]
Basis for examples
All examples in this topic are based on the table below, named visits
. The column id
is of type INTEGER
. All other columns are of type ARRAY(TEXT)
.
Simple array functions
There are several fundamental functions that you can use to work with arrays, including ARRAY_LENGTH, ARRAY_CONCAT, and ARRAY_FLATTEN. See the respective reference for a full description. Brief examples are shown below.
LENGTH example
LENGTH
returns the number of elements in an array.
SELECT
id,
LENGTH(agent_props_keys) as key_array_length
FROM visits;
Returns:
+-----------------------+
| id | key_array_length |
+-----------------------+
| 1 | 3 |
| 2 | 2 |
| 3 | 3 |
+-----------------------+
ARRAY_CONCAT example
ARRAY_CONCAT
combines multiple arrays into a single array.
SELECT
id,
ARRAY_CONCAT(agent_props_keys, agent_props_vals) as concat_keys_and_vals
FROM visits;
Returns:
+----+------------------------------------------------------------------------------+
| id | concat_keys_and_vals |
+----+------------------------------------------------------------------------------+
| 1 | ["agent","platform","resolution","Mozilla/5.0","Windows NT 6.1","1024X4069"] |
| 2 | ["agent","platform","Safari","iOS 14"] |
| 3 | ["agent","platform","platform","Safari","iOS 14","Windows 11"] |
+----+------------------------------------------------------------------------------+
ARRAY_FLATTEN example
ARRAY_FLATTEN
converts an ARRAY of ARRAYs into a single flattened ARRAY. Note that this operation flattens only one level of the array hierarchy.
SELECT ARRAY_FLATTEN([ [[1,2,3],[4,5]], [[2]] ]) as flattened_array;
Returns:
+---------------------+
| flattened_array |
+---------------------+
| [[1,2,3],[4,5],[2]] |
+---------------------+
Manipulating arrays with Lambda functions
Firebolt Lambda functions are a powerful tool that you can use on arrays to extract results. Lambda functions iteratively perform an operation on each element of one or more arrays. Arrays and the operation to perform are specified as arguments to the Lambda function.
Lambda function general syntax
The general syntax pattern of a Lambda function is shown below. For detailed syntax and examples see the reference topics for Lambda functions.
<LAMBDA_FUNC>(<arr1_var>[, <arr2_var>][, ...<arrN_var>]) -> <operation>, <array1>[, <array2>][, ...<arrayN>])
Parameter | Description |
---|---|
<LAMBDA_FUNC> | Any array function that accepts a Lambda expression as an argument. For a list, see Lambda functions. |
<arr1_var>[, <arr2_var>][, ...<arrN_var>] | A list of one or more variables that you specify. The list is specified in the same order and must be the same length as the list of array expressions (<array1>[, <array2>][, ...<arrayN>] ). At runtime, each variable contains an element of the corresponding array. The specified <operation> is performed for each variable. |
The operation that is performed for each element of the array. This is typically a function or Boolean expression. | |
A comma-separated list of expressions, each of which evaluates to an ARRAY data type. |
Lambda function example–single array
Consider the following TRANSFORM array function that uses a single array variable and reference in the Lambda expression. This example applies the UPPER
function to each element t
in the ARRAY
-typed column tags
. This converts each element in each tags
array to upper-case.
SELECT
id,
TRANSFORM(t -> UPPER(t), tags) AS up_tags
FROM visits;
Returns:
+----+--------------------------+
| id | up_tags |
+----+--------------------------+
| 1 | ["SUMMER-SALE","SPORTS"] |
| 2 | ["GADGETS","AUDIO"] |
| 3 | ["SUMMER-SALE","AUDIO"] |
+----+--------------------------+
Lambda function example–multiple arrays
ARRAY_FIRST is an example of a function that takes multiple arrays as arguments in a map of key-value pairs. One array represents the keys and the other represents the values.
ARRAY_FIRST
uses a Boolean expression that you specify to find the key in the key array. If the Boolean expression resolves to true, the function returns the first value in the value array that corresponds to the key’s element position. If there are duplicate keys, only the first corresponding value is returned.
The example below returns the first value in the agent_props_vals
array where the corresponding position in the agent_props_keys
array contains the key platform
.
SELECT
id,
ARRAY_FIRST(v, k -> k = 'platform', agent_props_vals, agent_props_keys) AS platform
FROM visits;
Returns:
+----+----------------+
| id | platform |
+----+----------------+
| 1 | Windows NT 6.1 |
| 2 | iOS 14 |
| 3 | iOS 14 |
+----+----------------+
ARRAY_SORT sorts one array by another. One array represents the values and the other represents the sort order.
The example below sorts the first array by the positions defined in the second array
SELECT
ARRAY_SORT(x,y -> y, [ 'A','B','C'],[3,2,1]) AS res;
Returns:
+-----------------+
| res |
+-----------------+
| ["C", "B", "A"] |
+-----------------+
UNNEST
You might want to transform a nested array structure to a standard tabular format. UNNEST
serves this purpose.
UNNEST is a table-valued function (TVF) that transforms an input row containing an array into a set of rows. UNNEST
unfolds the elements of the array and duplicates all other columns found in the SELECT
clause for each array element. If the input array is empty, the corresponding row is eliminated.
You can use a single UNNEST
command to unnest several arrays if the arrays are the same length.
Multiple UNNEST
statements in a single FROM
clause result in a Cartesian product. Each element in the first array has a record in the result set corresponding to each element in the second array.
Example–single UNNEST with single ARRAY-typed column
The following example unnests the tags
column from the visits
table.
SELECT
id,
tag
FROM
visits,
UNNEST(tags) as r(tag);
Returns:
+----+---------------+
| id | tag |
+----+---------------+
| 1 | "summer-sale" |
| 1 | "sports" |
| 2 | "gadgets" |
| 2 | "audio" |
+----+---------------+
Example–single UNNEST using multiple ARRAY-typed columns
The following query specifies both the agent_props_keys
and agent_props_vals
columns to unnest.
SELECT
id,
a_key,
a_val
FROM
visits,
UNNEST(agent_props_keys, agent_props_vals) as r(a_key, a_val);
Returns:
+----+------------+------------------+
| id | a_key | a_val |
+----+------------+------------------+
| 1 | agent | “Mozilla/5.0” |
| 1 | platform | “Windows NT 6.1” |
| 1 | resolution | “1024x4069” |
| 2 | agent | “Safari” |
| 2 | platform | “iOS 14” |
+----+------------+------------------+
Example–multiple UNNEST clauses resulting in a Cartesian product
The following query, while valid, creates a Cartesian product.
SELECT
id,
a_key,
a_val
FROM
visits,
UNNEST(agent_props_keys as a_keys) as r1(a_key),
UNNEST(agent_props_vals as a_vals) as r2(a_val);
Returns:
+-----+------------+------------------+
| id | a_key | a_val |
+-----+------------+------------------+
| 1 | agent | "Mozilla/5.0" |
| 1 | agent | "Windows NT 6.1" |
| 1 | agent | "1024x4069" |
| 1 | platform | "Mozilla/5.0" |
| 1 | platform | "Windows NT 6.1" |
| 1 | platform | "1024x4069" |
| 1 | resolution | "Mozilla/5.0" |
| 1 | resolution | "Windows NT 6.1" |
| 1 | resolution | "1024x4069" |
| 2 | agent | "Safari" |
| 2 | agent | "iOS 14" |
| 2 | platform | "Safari" |
| 2 | platform | "iOS 14" |
+-----+------------+------------------+
Example–error on UNNEST of multiple arrays with different lengths
The following query is invalid and will result in an error as the tags
and agent_props_keys
arrays have different lengths for row 1.
SELECT
id,
tag,
a_key
FROM
visits,
UNNEST(tags, agent_props_keys) as r(tag, a_key);
ARRAY input and output syntax
ARRAY
values can be converted from and to TEXT
. This happens, for example, when an explicit CAST
is added to a query, or when ARRAY
values are (de-)serialized in a COPY
statement.
Converting ARRAY to TEXT
Broadly, the TEXT
representation of an ARRAY
value starts with an opening curly brace ({
). This is followed by the TEXT
representations of the individual array elements separated by commas (,
). It ends with a closing curly brace (}
). NULL
array elements are represented by the literal string NULL
. For example, the query
SELECT
CAST([1,2,3,4,NULL] AS TEXT)
returns the TEXT
value '{1,2,3,4,NULL}'
.
When converting ARRAY
values containing TEXT
elements to TEXT
, some additional rules apply. Specifically, array elements are enclosed by double quotes ("
) in the following cases:
- The array element is an empty string.
- The array element contains curly or square braces (
{
,[
,]
,}
), commas (,
), double quotes ("
), backslashes (\
), or white space. - The array element matches the word
NULL
(case-insensitively).
Additionally, double quotes and backslashes embedded in array elements will be backslash-escaped. For example, the query
SELECT
CAST(['1','2','3','4',NULL,'','{impostor,array}','["impostor","array","back\slash"]',' padded and spaced ', 'only spaced', 'null'] AS TEXT)
returns the TEXT
value '{1,2,3,4,NULL,"","{impostor,array}","[\"impostor\",\"array\",\"back\\slash\"]"," padded and spaced ","only spaced","null"}'
.
Converting TEXT to ARRAY
When converting the TEXT
representation of an array back to ARRAY
, the same quoting and escaping rules as above apply. Unquoted whitespace surrounding array elements is trimmed, but whitespace contained within array elements is preserved. The array elements themselves are converted according to the conversion rules for the requested array element type. For example, the query
SELECT
CAST('{1, 2, 3, 4, null, "", "{impostor,array}", "[\"impostor\",\"array\",\"back\\slash\"]", " padded and spaced ", "null", unquoted padded and spaced }' AS ARRAY(TEXT))
returns the ARRAY(TEXT)
value [1,2,3,4,NULL,'','{impostor,array}','["impostor","array","back\slash"]',' padded and spaced ','null','unquoted padded and spaced']
.
It is also possible to enclose arrays with square braces ([
and ]
) instead of curly braces ({
and }
) when converting from TEXT
to ARRAY
. For example, the query
SELECT
CAST('[1, 2, 3, 4, NULL]' AS ARRAY(INTEGER))
returns the ARRAY(INTEGER)
value [1,2,3,4,NULL]
.
Nested ARRAYs
Finally, the same prodedure applies when converting nested ARRAY
values from and to TEXT
. For example, the query
SELECT
CAST([NULL,[],[NULL],[1,2],[3,4]] AS TEXT)
returns the TEXT
value {NULL,{},{1,2},{3,4}}
, and the query
SELECT
CAST('{NULL,{},{1,2},{3,4}}' AS ARRAY(ARRAY(INTEGER)))
returns the ARRAY(ARRAY(INTEGER))
value [NULL,[],[NULL],[1,2],[3,4]]
.