Firebolt’s JSON data type is currently in public preview. We are gathering feedback and further refining this feature.
Overview
TheJSON data type stores semi-structured JSON documents in a single logical column. It is optimized in such a way that JSON data can be processed as efficiently as possible, especially when only some parts of the stored JSON documents are relevant for a specific query.
Working with JSON
Casting
Casting fromTEXT to JSON is the only supported CAST involving the JSON data type, interpreting the textual input as outlined below.
The to_json function can be used to convert arbitrary types to JSON, but unlike casting it will not parse textual input. Consider the following example illustrating this difference:
Extracting Individual Fields
Individual fields of aJSON value can be extracted by using any combination of the dot (.) and subscript ([]) operators. Consider the following examples:
CAST. Continuing the example above, consider the following queries:
Comparisons
Values of typeJSON cannot be compared to each other. As a result, none of the following operations are supported:
- Passing
JSONvalues to any function or operator that requires its arguments to be comparable. In particular, this includes any comparison functions or operators, and any aggregate functions likeMIN,MAX, etc. - Including
JSONcolumns in thePRIMARY INDEXorPARTITION BYclauses of aCREATE TABLEstatement. - Including
JSONcolumns in aJOIN,GROUP BY, orORDER BYkey. - Including
JSONcolumns in aSELECT DISTINCTorUNIONclause. - Including
JSONcolumns in theORDER BYorPARTITION BYclauses of a window function. - Computing the hash of a
JSONvalue.
JSON sub-values, for example ARRAYs or STRUCTs containing JSON.
Textual Input
Firebolt supports creatingJSON values from any UTF-8 encoded string that contains a valid JSON document that can be parsed by the SimdJSON library. When creating a JSON value in this way, the input string is parsed eagerly exactly once and converted to an internal binary representation of the respective JSON value. Subsequent operations on JSON values can be performed directly on this binary representation without having to re-parse the input string, making them considerably faster than JSON functions that operate on TEXT input data.
The internal binary representation does not store an exact copy of the original input string itself. As a result, JSON values do not preserve any white space that is present in the original input string. Similarly, JSON values do not preserve the exact format of numbers (e.g. 3e3 becomes 3000) or escape sequences in strings (e.g. "\u00e1" becomes "á"). Finally, they do not preserve the order of object keys and do not preserve duplicate object keys. If there are duplicate object keys in the input, only the last value is retained.
The parser used in the JSON data type implementation is fully compliant with the RFC 8259 JSON specification. Nevertheless, the JSON data type imposes some minor restrictions on what constitutes valid textual JSON input that are allowed under RFC 8259. Note that these restrictions also apply to all JSON functions operating directly on TEXT input data.
- Only space, horizontal tab, line feed and carriage return are allowed as whitespace characters.
- Numbers are fully validated according to the JSON specification. For example, the string
01will be rejected since leading zeros are not allowed. - Numbers not representable as 64-bit signed integers, 64-bit unsigned integers, or 64-bit floating point numbers are rejected.
- Strings must not contain unescaped control characters. For example, strings containing unescaped line breaks or tabs are rejected. Note that whitespace outside of strings does not have to be escaped.
- Textual JSON input must be encoded as UTF-8 and is fully validated during parsing. Documents starting with a byte order mark are rejected.
- Documents larger than 4294967295 bytes or documents whose binary representation is larger than 4294967295 bytes are rejected.
- Documents with a nesting depth larger than 1024 are rejected.
TEXT data type.
- Strings containing either escaped or unescaped NULL characters (
"\u0000") are rejected.
Import & Export
Firebolt supports importing or exporting JSON data in various different data formats, most of which support theJSON data type.
Avro
Avro files can be imported into Firebolt through the read_avro table-valued function, or external tables withTYPE = (AVRO). Since Avro does not natively provide a true semi-structured data type, there are the following primary ways to import Avro data into a JSON column:
- Explicitly defining the type of an external table column as
JSON. This is supported for Avro string columns containing valid UTF-8 encoded JSON strings, and parsing is performed as outlined above. - Converting a column with inferred SQL type
TEXTtoJSONthrough the CAST or TRY_CAST functions. This is supported for Avro string columns containing valid UTF-8 encoded JSON strings, and parsing is performed as outlined above. There is currently no support for automatically inferring the type of such string columns asJSON. - Converting any column to
JSONthrough the to_json function. Note that this will not parse any JSON strings in the input, unlike casting.
CSV
CSV files can be imported into Firebolt through COPY FROM, the read_csv table-valued function, or external tables withTYPE = (CSV). Since CSV does not natively provide a true semi-structured data type, there are the following primary ways to import CSV data into a JSON column:
- Explicitly defining the type of an external table column as
JSON, or usingCOPY FROMwith a pre-existing table containingJSONcolumns. This is supported for CSV columns containing valid UTF-8 encoded JSON strings, and parsing is performed as outlined above. - Converting a column with inferred SQL type
TEXTtoJSONthrough the CAST or TRY_CAST functions. This is supported for CSV columns containing valid UTF-8 encoded JSON strings, and parsing is performed as outlined above. There is currently no support for automatically inferring the type of such columns asJSON. - Converting any column to
JSONthrough the to_json function. Note that this will not parse any JSON strings in the input, unlike casting.
TYPE = CSV. Any JSON values in the source table will be written as UTF-8 encoded JSON strings.
JSON
JSON files can be imported into Firebolt through external tables withTYPE = (JSON). There are the following primary ways to import JSON data into a JSON column:
- Explicitly defining the type of an external table column as
JSON. In this case, the corresponding input JSON values are parsed as outlined above. - Converting a column with inferred SQL type
TEXTtoJSONthrough the CAST or TRY_CAST functions. This is supported for input JSON strings containing valid and properly escaped UTF-8 encoded JSON strings (e.g."{\"a\":\"b\"}"), and parsing is performed as outlined above. There is currently no support for automatically inferring the type of such columns asJSON. - Converting any column to
JSONthrough the to_json function. Note that this will not parse any JSON strings in the input, unlike casting. - Reading each line in the input as a single
JSONvalue. This can be achieved by creating an external table with a singleTEXTcolumn andTYPE = (JSON PARSE_AS_TEXT = TRUE). Subsequently this column can be converted toJSONthrough the CAST or TRY_CAST functions.
TYPE = JSON. Any JSON values in the source table will be written as native JSON values to the output.
A minor edge case exists when exporting or importing nullable JSON columns. Specifically, Firebolt distinguishes between SQL NULL values and scalar JSON null values stored in a JSON column. However, this distinction will be lost when exporting a JSON column to a JSON file, where both of these values will be represented by a JSON null value. Conversely, when importing a JSON file into a nullable JSON column, all scalar JSON null values will be interpreted as SQL NULL. Consider the following example:
Parquet
Parquet files can be imported into Firebolt through COPY FROM, the read_parquet table-valued function, or external tables withTYPE = (PARQUET). Unlike the other supported file formats described above, Parquet defines a dedicated JSON logical type which is automatically inferred as JSON in Firebolt by COPY FROM and read_parquet. This provides a seamless approach for importing JSON data into Firebolt from Parquet. In addition, the following alternatives are also supported when the underlying Parquet type of a column is not JSON:
- Explicitly defining the type of an external table column as
JSON, or usingCOPY FROMwith a pre-existing table containingJSONcolumns. This is supported for Parquet string columns containing valid UTF-8 encoded JSON strings, and parsing is performed as outlined above. - Converting a column with inferred SQL type
TEXTtoJSONthrough the CAST or TRY_CAST functions. This is supported for Parquet string columns containing valid UTF-8 encoded JSON strings, and parsing is performed as outlined above. There is currently no support for automatically inferring the type of such string columns asJSON. - Converting any column to
JSONthrough the to_json function. Note that this will not parse any JSON strings in the input, unlike casting.
TEXT columns, in which case the input JSON string will be imported unmodified.
Data can be exported from Firebolt in Parquet format through COPY TO with TYPE = PARQUET. Here, JSON columns are always written as Parquet columns with JSON logical type.
Driver Version Requirements
The following minimum driver versions are required to use theJSON data type: