Release notes archive for the Firebolt data warehouse.
MERGE SQL Statement
Added support for the MERGE SQL statement, which allows users to perform multiple data modifications (INSERT, UPDATE, DELETE) within a single transaction. MERGE is useful for common database tasks like removing duplicates, syncing data between tables (upsert), or cleaning out old records. Documentation is available here.
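A minimal upsert sketch (assuming tables target and source that share id and value columns):
MERGE INTO target t
USING source s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);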
Added COMMENT ON statement
Added support for the COMMENT ON statement, which allows users to add or update descriptions for objects such as engine, location, database, table, and column.
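For example (my_table is a placeholder):
COMMENT ON TABLE my_table IS 'Daily order snapshots';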
Added the REVERSE(TEXT) function
Added the REVERSE(TEXT) function, which returns the input string with all characters in reverse order. Documentation for the function is available here.
Added the SOUNDEX(TEXT) function
Added the SOUNDEX(TEXT) function, which returns a phonetic representation of an input string, allowing comparison of words that sound similar but are spelled differently. Documentation for the function is available here.
Added support for cross-region Amazon S3 access
Added the ability to ingest and export data to Amazon S3 buckets located in different regions from your Firebolt engines. This update enhances flexibility in data management for users with geographically distributed data. Cross-region access is disabled by default and can be enabled on a per-query basis using the cross_region_request_mode setting.
Added granule-level pruning information to EXPLAIN ANALYZE via PRIMARY_INDEX_PRUNED_GRANULES and PRIMARY_INDEX_TOTAL_GRANULES fields
Added pruning information at the granule level to EXPLAIN ANALYZE through the fields PRIMARY_INDEX_PRUNED_GRANULES and PRIMARY_INDEX_TOTAL_GRANULES. This provides insights into query optimization by displaying how effectively the primary index reduces data during query execution.
Added an optimizer_mode setting
Added a new setting that allows users to control the amount of work that the optimizer attempts to do. Documentation for the setting is available here.
Added an enable_storage_statistics setting
Added a new setting that allows users to control whether statistics information obtained from storage metadata is used for cost-based decisions made by the query optimizer. Documentation for the setting is available here.
Added the READ_ICEBERG() table-valued function
Added the READ_ICEBERG() table-valued function, which allows reading from external Iceberg tables, and added Iceberg support to LOCATION with SOURCE = ICEBERG. Documentation for the function is available here and documentation for the new type of location here.
Added the ICU_NORMALIZE function to standardize text formats across locales
Added the ICU_NORMALIZE function to process text based on a specific locale. This helps in standardizing text formats across different languages and regions, ensuring uniformity and compatibility in data outputs. Documentation for the function is available here.
Added the AGO(interval) function for subtracting intervals from the current timestamp
Added the AGO(interval) function, which subtracts the specified interval from the current timestamp. This addition provides users with a convenient way to calculate past dates and times, enhancing time-based data analysis. Documentation for the function is available here.
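For example, a minimal call:
SELECT AGO(INTERVAL '1 day');  -- the current timestamp minus one day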
Added a named parameter INFER_SCHEMA to the READ_CSV function
Added a named parameter INFER_SCHEMA to the READ_CSV function. When INFER_SCHEMA is true, the function determines column data types instead of using TEXT.
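A sketch of a call (bucket path is a placeholder; the => named-parameter form is assumed):
SELECT * FROM READ_CSV('s3://my-bucket/data.csv', INFER_SCHEMA => true);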
Extended Parquet data type support
Added support for reading the following data types from Parquet files:
- fixed_size_binary as BYTEA
- uuid as BYTEA
- time32 and time64 as TIMESTAMP with 1970-01-01 as the date component. Note that time64 with nanosecond precision is truncated to microsecond precision.
Introduced insert_sharding to enforce partition locality during ingestion into partitioned tables.
Introduced tablet_min_size_bytes and tablet_max_size_bytes to control min/max tablet sizes during ingestion.
Added the ability to override public settings per query
You can now override settings by appending WITH (<setting_1_name> = <setting_1_value>, ...) to queries. This lets you apply settings directly to specific queries without affecting the entire session. Documentation is available here.
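For example (my_table is a placeholder; enable_result_cache is one such public setting):
SELECT COUNT(*) FROM my_table WITH (enable_result_cache = false);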
Rearchitected the Parquet reader for more predictable memory usage with external tables and the READ_PARQUET function
The Parquet reader was rearchitected to provide more predictable memory usage when reading from external tables or using the READ_PARQUET table-valued function. This change improves performance for many Parquet workloads. Users benefit from enhanced memory efficiency and faster query processing. More updates, like applying these changes to the COPY FROM command, are planned for future releases. This improvement is being rolled out gradually over multiple weeks.
Fixed incorrect results for UNION ALL with overlapping aggregation or join keys followed by further aggregation or joining
Fixed a bug that could cause incorrect results on multi-node engines when doing a UNION ALL over subqueries that have overlapping but separate aggregation or join keys, and then later aggregating by or joining on these keys.
Added the ALTER DATABASE <existing name> RENAME TO <new name> command
Users can now rename databases using the ALTER DATABASE <existing name> RENAME TO <new name> command. This update provides greater flexibility in managing database names, allowing for easier organization and management.
Expanded the information_schema.engine_query_history table with NODE_ORDINAL, CLUSTER_ORDINAL, and NUMBER_OF_CLUSTERS columns to enhance query distribution insights and engine scaling behavior analysis
The information_schema.engine_query_history table now includes three new columns: NODE_ORDINAL, CLUSTER_ORDINAL, and NUMBER_OF_CLUSTERS. NODE_ORDINAL identifies the node that received the query, CLUSTER_ORDINAL indicates the engine cluster that handled the query, and NUMBER_OF_CLUSTERS shows the number of clusters active during the query. These additions help users understand query distribution across clusters and provide insights into engine scaling behavior, such as automatic upscaling and downscaling.
Added preliminary support for the STRUCT data type
Added preliminary support for the STRUCT data type. This feature is in public preview. We are gathering feedback and further refining this feature. For details, please refer to the Firebolt SQL reference.
Added evicted_bytes to information_schema.engine_metrics_history
Added a new metric that tracks the total amount of data (in bytes) that has been evicted from disk. This metric helps users monitor disk space management and understand how much data is being evicted from the cache. For details, please refer to the Information schema for engine metrics history.
Improved performance of REGEXP_LIKE_ANY by up to 10x when the pattern list (<pattern>) contains no regular expression metacharacters
By intelligently detecting these cases, we now leverage Hyperscan, a high-performance regex engine, in combination with the Volnitsky text search algorithm to deliver substantial speedups in text matching workloads.
Added the ARRAYS_OVERLAP function
Added the ARRAYS_OVERLAP function to determine if two or more input arrays share common elements. This enhancement enables users to identify overlapping array elements, improving data analysis capabilities.
Added the NGRAM function to generate overlapping n-grams from text
The function NGRAM(n, text) generates a sequence of overlapping n-grams from the given text. Each n-gram is a contiguous substring of n characters. It returns the result as a SQL array of text. This function is useful for text analysis and can help in search queries.
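Example (a minimal illustration; exact output formatting may differ):
SELECT NGRAM(3, 'hello');  -- ['hel', 'ell', 'llo']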
Added support for VALUES lists
Added support for VALUES lists to create an in-memory constant table with one or multiple rows for use in queries. For example:
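(values are illustrative)
VALUES (1, 'red'), (2, 'blue');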
VALUES can be used anywhere a SELECT is allowed.
Added support for $$ to quote multiline string literals containing newlines and single quotes
Added support for $$ to quote string literals that contain newlines and single quotes. This enhancement helps users create multiline strings without needing to escape special characters, making SQL scripts easier to read and maintain. For example:
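-- no escaping needed for the apostrophe or the line break
SELECT $$It's a
multiline string$$;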
Improved performance of the COPY FROM command and the READ_PARQUET() function
Improved the performance of the COPY FROM command and the READ_PARQUET() function. This improvement accelerates data loading and processing for users.
Fixed information_schema.tables.ddl output for tables using the TO_... family of functions in the partition expression
Fixed the information_schema.tables.ddl output for tables with a partition expression such as TO_YYYYMMDD. This update ensures that the expression no longer incorrectly results in the TO_YEARMONTHDAY function.
Added support for GROUPING SETS, ROLLUP, and CUBE clauses for GROUP BY operations
Added support for the GROUPING SETS, ROLLUP, and CUBE clauses for GROUP BY operations. These clauses enable more flexible and efficient multi-level aggregations in a single query, simplifying complex reporting and analytics workflows. A sketch follows the list below.
- GROUPING SETS: Specify multiple groupings in a single query.
- ROLLUP: Create subtotals that roll up from the most detailed level to a grand total.
- CUBE: Generate subtotals for all combinations of a set of columns.
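For example (a sketch assuming a sales table with region, product, and amount columns), ROLLUP yields per-region subtotals plus a grand total:
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY ROLLUP (region, product);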
These enhancements unlock more powerful data summarization directly within SQL.
Added the COVAR_POP, COVAR_SAMP, and CORR functions
Added the ARRAY_INTERSECT function
Added ARRAY_INTERSECT, which finds common elements across given arrays. This functionality simplifies operations that require comparing multiple arrays to identify shared items, enhancing data analysis capabilities.
Added the ability to attach a certificate’s public key to a service account using ALTER SERVICE ACCOUNT.
Added the ability to control an engine’s Auto VACUUM behavior
The CREATE ENGINE and ALTER ENGINE statements now support an AUTO_VACUUM parameter.
Added the ability to revoke a privilege for a specific object, even if the privilege is inherited via an ANY-privilege from its parent object
For example, if the SELECT ANY privilege is granted on a schema, it is now possible to revoke the SELECT privilege for a specific table within that schema.
Improved performance of the COPY TO function with SINGLE_FILE=true
Improved performance of the COPY TO function with SINGLE_FILE=true. This change optimizes performance and resource efficiency, especially for users dealing with large datasets.
Enabled RESULT and SUBRESULT cache to reuse query results between user interface and JDBC connector
The RESULT and SUBRESULT cache now reuses query results between those sent from the user interface and those sent from the JDBC connector. This improves performance by reducing redundant computations.
Rearchitected the Parquet reader for more predictable memory usage with external tables and the READ_PARQUET function
The Parquet reader was rearchitected to provide more predictable memory usage with external tables and the READ_PARQUET function and to improve performance for many Parquet workloads. More improvements, including bringing these changes to COPY FROM, will follow in upcoming releases.
Improved join pruning to support more join types and pruning through window operators
Join pruning now supports more join types and can prune through window operators with a PARTITION BY clause. This enhancement improves query performance by reducing unnecessary data processing.
Updated information_schema.object_privileges to explicitly enumerate privileges
information_schema.object_privileges has been updated so that all object privileges granted to the user are explicitly enumerated. For example, a usage any database on account privilege will be expanded to:
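individual rows per object (an illustrative expansion, assuming databases db1 and db2 in the account): one row granting usage on database db1 and one row granting usage on database db2, instead of the single account-level row.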
Introduced the LOCATION object
You can now create a LOCATION object in your Firebolt account. Use LOCATION to store credentials and authenticate to external systems without needing to provide static credentials each time you run a query or create a table. LOCATION works with role-based access control (RBAC) so you can manage permissions securely. You can view detailed information about your locations, including source type, URL, description, owner, and creation time, in information_schema.locations.
Added creation timestamps for tables, views, indexes, and locations
Added creation timestamps to the information_schema views for tables, views, indexes, and locations to help track objects for data management.
Added support for SQL pipe syntax
Added support for SQL pipe syntax using the |> operator. This syntax allows for a linear, step-by-step flow of query transformations, improving readability and simplifying query composition. It supports all standard SQL operations and can be combined with traditional SQL syntax.
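A sketch of the general shape (orders is a placeholder; see the documentation for the full set of pipe operators):
FROM orders
|> WHERE order_date >= '2024-01-01'
|> SELECT customer_id, amount;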
Added wildcard character functionality to READ_PARQUET and READ_CSV to simultaneously read multiple files
You can now use the wildcard characters * or ? to specify a file URL as a glob pattern in the READ_PARQUET and READ_CSV table-valued functions to read multiple files simultaneously. This enhancement simplifies managing large datasets by reducing the need to make multiple function calls.
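For example (bucket path is a placeholder):
SELECT * FROM READ_PARQUET('s3://my-bucket/logs/2024-*.parquet');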
Added functionality to transfer ownership of objects in the Firebolt Workspace
Fixed an issue where CREATE VIEW statements did not preserve the order of named function parameters
Added the IF function to enhance query readability and simplify conditional expressions
The IF function simplifies query writing as a more concise alternative to the CASE WHEN expression. You can use IF(<cond_expr>, <then_expr>, <else_expr>) as a shorter equivalent to CASE WHEN <cond_expr> THEN <then_expr> ELSE <else_expr> END.
Added INCREMENTAL index optimization with VACUUM
The VACUUM statement now supports an INDEXES = INCREMENTAL option, allowing incremental optimization of related indexes. This new mode uses fewer resources than a full reevaluation while still improving index layouts. Although incremental optimization may not achieve the optimal layout of a full reevaluation, it maintains a balance between performance and resource usage.
Added MAX_CONCURRENCY option to VACUUM statement
The VACUUM command now supports the MAX_CONCURRENCY option, enabling you to limit concurrent processes during optimization. This allows for control of the number of concurrent processes in a VACUUM operation, optimizing resource usage and improving performance in multi-threaded environments.
Added longitude wrapping for GEOGRAPHY data
Longitudes are now wrapped when creating GEOGRAPHY data from WKT, GeoJSON, WKB, or using the ST_GeogPoint function. For example, POINT(180.5 1) is now correctly interpreted as POINT(-179.5 1). This improvement simplifies geographic data handling.
Enhanced the EXPLAIN function to support all SQL statements except for DDL and DCL
The EXPLAIN feature now supports analysis of all SQL statements. However, it does not provide output details for DDL (Data Definition Language) and DCL (Data Control Language) statements.
Improved COPY FROM filtering performance
Filters on the source file metadata columns $SOURCE_FILE_NAME and $SOURCE_FILE_TIMESTAMP are now pushed down to the file listing during the COPY FROM process when using multiple URL and pattern locations. This enhancement improves performance by reducing unnecessary data processing and speeds up data loading operations.
Fixed out-of-range coordinates for LineString in WKT
Fixed an issue where out-of-range coordinates in LineString data were incorrectly accepted when parsing from WKT. For example, LINESTRING(0.5 1, 1 90.5) now correctly returns an error instead of being interpreted as LINESTRING(0.5 1, -179 89.5). This fix enhances data integrity and prevents erroneous geographic entries.
Added the MAX_CONCURRENCY option to the VACUUM statement for enhanced concurrency control
The VACUUM statement now supports the MAX_CONCURRENCY option, allowing users to limit the number of concurrent streams. This improves control over resource usage during VACUUM operations.
Introduced the INDEXES = ALL | NONE option for the VACUUM statement
The VACUUM statement now supports the INDEXES = ALL | NONE option, giving users control over whether indexes are optimized during VACUUM operations.
VACUUM now runs automatically
Firebolt now automatically determines when a VACUUM is required. This decision is based on factors such as the number of deleted rows and the need to consolidate storage for faster query performance and reduced disk space usage.
Added support for casting text literals to interval literals
You can now write casts such as '1 month'::INTERVAL, making it easier to define time intervals in queries.
Added default value support for GEOGRAPHY columns
For example, CREATE TABLE geo_table (geo_column GEOGRAPHY DEFAULT 'GEOMETRYCOLLECTION EMPTY') ensures consistency across database entries when no explicit value is provided.
Added MIN_CLUSTERS and MAX_CLUSTERS columns to INFORMATION_SCHEMA.ENGINES
The INFORMATION_SCHEMA.ENGINES view now includes MIN_CLUSTERS and MAX_CLUSTERS columns, providing visibility into cluster configuration for improved database management.
Added support for STATEMENT_TIMEOUT to manage query run time limits
Added support for STATEMENT_TIMEOUT, which specifies the number of milliseconds a statement is allowed to run. Any statement or query exceeding the specified time is canceled. A value of zero disables the timeout by default. Using STATEMENT_TIMEOUT helps prevent excessively long-running queries, improving system efficiency and resource use.
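A sketch (assuming the setting is applied with SET; the value is in milliseconds):
SET statement_timeout = 30000;  -- cancel statements that run longer than 30 seconds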
Added the PostgreSQL function DATE(<arg>) as an alternative to <arg>::DATE
Added the DATE(<arg>) function, offering an alternative to the <arg>::DATE syntax for improved readability and usability in SQL queries.
Added support for FROM first syntax
You can now place FROM before SELECT, allowing for more flexible query structures such as FROM t SELECT a, SUM(b) GROUP BY a or even FROM t without a SELECT clause.
Support for AWS PrivateLink is now in public preview
Added concurrency auto-scaling with MIN_CLUSTERS and MAX_CLUSTERS
Setting the MIN_CLUSTERS and MAX_CLUSTERS parameters on CREATE ENGINE and ALTER ENGINE commands turns on concurrency auto-scaling: the engine will dynamically resize between the specified MIN_CLUSTERS and MAX_CLUSTERS values to match demand.
Firebolt introduces three fully managed editions
Firebolt now offers Standard, Enterprise, and Dedicated editions, each designed for different capabilities, security, and scalability needs.
Enhanced query performance for GEOGRAPHY columns at the tablet level
Firebolt now optimizes GEOGRAPHY columns at the tablet level to enhance query performance; for existing tables, this optimization can be applied by running VACUUM. For additional details, see our blog post.
Added INDEX_GRANULARITY storage parameter to optimize table storage
The CREATE TABLE statement now supports the INDEX_GRANULARITY storage parameter, allowing users to configure internal tablet range sizes for better performance based on query patterns.
Enhanced EXPLAIN (STATISTICS) to include estimated row counts and column distinct counts
The EXPLAIN (STATISTICS) function now provides estimated row counts and column distinct counts, when available. This enhancement offers more detailed insights for analyzing query performance.
Added a Tableau connector for the current version of Firebolt
Tableau is a visual analytics platform that empowers users to explore, analyze, and present data through interactive visualizations. The current Firebolt connector in Tableau Exchange supports only an older version of Firebolt. You can now download the latest connector directly from Firebolt and integrate it with Tableau Desktop or Tableau Server. Follow the installation instructions in Integrate with Tableau to set up the updated connector.
Added a DBeaver connector for the current version of Firebolt
DBeaver is a free, open-source database administration tool that supports multiple database types and provides a graphical interface for managing databases, running queries, and analyzing data. You can now connect to DBeaver using the Firebolt JDBC driver. Follow the instructions in Integrate with DBeaver to set up a connection to DBeaver.
Added the Firebolt Resource Center to the Firebolt Workspace
The Firebolt Resource Center is now accessible from the Firebolt Workspace. Select the Firebolt icon in the bottom-right corner to access resources including the Get started guide, Knowledge Center, Documentation, Release notes, Announcements, and a unified search tool covering all Firebolt resources.
Improved LEFT JOIN conversion for better query performance
A nested LEFT JOIN can now be automatically replaced with a more efficient join when its results are not needed due to filtering in a later step. This optimization applies when a later filter removes rows in which the right-hand side of the LEFT JOIN contains NULL values, effectively discarding the extra rows introduced by the earlier LEFT JOIN.
In such cases, simplifying the join structure improves efficiency without altering query results. This conversion reduces unnecessary operations, lowering computational overhead and enhancing performance.
Improved performance by allowing multiple INSERT INTO <tbl> VALUES ... statements to be combined in a single request
Workloads that send multiple consecutive INSERT INTO <tbl> VALUES ... statements into the same table can now run much faster by sending all statements in a single request, separated by semicolons. These statements are now automatically merged and processed together on the server within a single transaction, which means that either all of them succeed or all of them fail. This improvement reduces network overhead and enhances performance for batch data insertion.
Use NULL instead of empty strings for passing unset TVF parameters
Table-valued functions (TVFs) such as LIST_OBJECTS, READ_PARQUET, and READ_CSV that accept string named parameters like aws_access_key_id and aws_role_arn will no longer treat empty strings ('') as unset arguments. The empty strings will instead be forwarded to the credential provider and may return errors. If you want to pass an explicitly unset parameter, use NULL instead.
Fixed a bug in distributed GROUP BY and JOIN planning
Resolved a bug in the optimization process for distributed GROUP BY and JOIN operators. This bug sometimes led to missed optimization opportunities and, in rare cases, incorrect results.
Fixed a bug in correlated EXISTS subqueries that caused duplicated outer tuples in query results
Fixed a bug with non-trivial correlated EXISTS subqueries, where a dependent subquery inside an EXISTS condition references a column from an outer query. An example of this kind of query follows:
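A hypothetical query of this shape (tables outer_t and inner_t are placeholders):
SELECT o.id,
       EXISTS (SELECT 1 FROM inner_t i WHERE i.outer_id = o.id) AS has_match
FROM outer_t o;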
The EXISTS condition only determines whether at least one match exists, without duplicating rows in the outer table. Now, each row in the outer table correctly appears once, with TRUE if a match exists and FALSE otherwise, improving the accuracy of query results.
Added E2E_DURATION_US to include total query time in Firebolt infrastructure for enhanced performance monitoring and optimization
Added a new column E2E_DURATION_US in the system tables INFORMATION_SCHEMA.ENGINE_RUNNING_QUERIES, INFORMATION_SCHEMA.ENGINE_QUERY_HISTORY, and INFORMATION_SCHEMA.ENGINE_USER_QUERY_HISTORY, which shows the total time a query has spent within the Firebolt infrastructure. In contrast, DURATION_US measures only the time spent using the engine, without considering retries or routing. The E2E_DURATION_US metric measures the total time a query takes from initiation to final result delivery, and includes all sub-components of latency such as routing, preparation, queuing, compilation, retries, and runtimes. For example, if a query starts a stopped engine, the engine’s startup time is included in the query’s end-to-end duration. This update provides a more accurate representation of total query latency for performance monitoring and optimization.
Unhid scanned_storage_bytes and scanned_cache_bytes from information schema views
Unhid the scanned_storage_bytes and scanned_cache_bytes columns from the information_schema.engine_query_history and information_schema.engine_user_query_history views. These columns were previously accessible when explicitly used in a SELECT clause, but will now appear by default when you use SELECT *.
Improved data loading performance for GEOGRAPHY objects of type POINT
Improved data loading performance for GEOGRAPHY objects of type POINT, enabling up to four times faster loading of geographical point data for more efficient data integration and analysis.
Improved file listing times for large external scans
In operations that read data from Amazon S3 buckets, such as external table scans or COPY FROM queries, Firebolt lists files in a URL to an Amazon S3 bucket. This process is constrained by the AWS API, which limits file listing to 1,000 files per request. Firebolt has increased the number of concurrent operations so that listing a large number of files is up to 3.5 times faster.
Added result cache support for cross and complex joins for improved performance
The query result cache now supports queries using cross joins or complex joins with OR conditions and inequalities. This change reduces redundant calculations, improving query performance.
USAGE permissions are now required to access INFORMATION_SCHEMA views
Accessing INFORMATION_SCHEMA views now requires USAGE permissions on the database. Queries to INFORMATION_SCHEMA will fail if these permissions are missing, ensuring consistent enforcement across permission-restricted queries. Ensure that your roles have the necessary permissions on the database to prevent access issues.
Improved EXPLAIN command accuracy for default values of DATE, TIMESTAMP, and TIMESTAMPTZ columns
The EXPLAIN command now displays default values for columns of type DATE, TIMESTAMP, and TIMESTAMPTZ. This update fixes a bug that previously caused default values to be shown incompletely, improving clarity and accuracy in query plan analysis.
Resolved filtering issue for views in information_schema.tables to enforce user permissions
Fixed a bug in information_schema.tables which previously listed views that users were not authorized to access. Even though querying these views would fail, users could still see that they existed. Now information_schema.tables only lists views that users are allowed to access.
Added GRANT ALL ON ACCOUNT and REVOKE ALL ON ACCOUNT statements for role-based privileges
The statements GRANT ALL ON ACCOUNT account_name TO role_name and REVOKE ALL ON ACCOUNT account_name FROM role_name are now supported. They grant or revoke all account-related privileges to or from the specified role role_name.
Support for nested arrays in Parquet files
Firebolt can now read nested array types from Parquet files, such as array(array(array(string))).
information_schema views
@ character support restored in usernames
@ is allowed in usernames again, which was previously restricted. The following statements are now valid and will not cause errors:
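For instance, statements along these lines (user names are hypothetical):
CREATE USER "bi@company.com";
ALTER USER alice RENAME TO "alice@company.com";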
Updated EXPLAIN VACUUM and EXPLAIN to improve error handling and result accuracy
EXPLAIN VACUUM has been updated:
- EXPLAIN VACUUM output no longer returns an empty result when the vacuumed object is an aggregating index.
- EXPLAIN has been updated to show an error if the specified relation does not exist.
Fixed incorrect results for IS NULL in outer joins
IS NULL predicates on non-nullable columns from the non-preserving side of an outer join were incorrectly reduced to FALSE during common table expression (CTE) optimization. When the optimizer attempted to fuse multiple CTEs, it mistakenly replaced t2.x IS NULL with FALSE, altering query semantics and producing incorrect results. This occurred because t2.x, though defined as non-nullable, became nullable when used in a left join. The fix ensures that IS NULL predicates are correctly preserved during optimization.
Added ST_S2CELLIDFROMPOINT to retrieve the S2 cell ID of a GEOGRAPHY Point
You can now use ST_S2CELLIDFROMPOINT to retrieve the S2 cell ID, which identifies the region on Earth that fully contains, or covers, a single Point GEOGRAPHY object. You can also specify a cell resolution level.
Added keyboard shortcuts to the Firebolt Develop Space
The Firebolt Develop Space user interface added the following Windows/Mac keyboard shortcuts:
Added an INFORMATION_SCHEMA.ROUTINES view for built-in functions and operators
Added the INFORMATION_SCHEMA.ROUTINES view to return information about all of Firebolt’s built-in functions and operators, including their database, schema, name, type, return data type, parameter data types, and whether they are deterministic.
Added support for the GEOGRAPHY data type in external tables using CSV and JSON formats
Firebolt can now read columns of type GEOGRAPHY from external tables in CSV or JSON format, which allows the querying of geospatial data including Points and Polygons.
Added a new MONITOR USAGE privilege
You can use the MONITOR USAGE privilege to view all queries running on an engine using the information_schema.engine_query_history or information_schema.engine_running_queries views.
Introduced support for network policy ADD/REMOVE commands
You can now add IP addresses to, or remove them from, a network policy’s ALLOW or BLOCK lists without overriding existing values. This update simplifies network policy management when handling large IP lists and reduces the risk of concurrent updates overwriting each other.
Improved performance of the ST_COVERS, ST_CONTAINS, and ST_INTERSECTS functions
Optimized the ST_COVERS, ST_CONTAINS, and ST_INTERSECTS functions to improve performance when processing LineStrings and Points with non-intersecting inputs, and Polygons with inputs that do not intersect their boundaries.
Improved performance of the REGEXP_LIKE_ANY function
The REGEXP_LIKE_ANY function now performs more efficiently when matching against multiple patterns by compiling a single combined RE2 regular expression object instead of evaluating each pattern separately.
The @ character is no longer allowed in user names. When renaming a user with old_name RENAME TO new_name, the new_name must now comply with the updated user name rules.
Fixed an issue where APACHE_DATASKETCHES_HLL_ESTIMATE failed for NULL inputs
Resolved an error in the APACHE_DATASKETCHES_HLL_ESTIMATE function that occurred if any of its input values were NULL. The function can now process NULL inputs.
Resolved issue that allowed account lockout on last login
Fixed an issue where the ALTER USER SET LOGIN/SERVICE_ACCOUNT=... statement could lock out the only active login in an account, rendering the account inaccessible. The operation now fails with an explicit error message in such cases.
Fixed incorrect ownership modification for information_schema
The statement ALTER SCHEMA information_schema SET OWNER owner_name; previously succeeded, which was incorrect, because information_schema cannot be modified. The operation now fails with an explicit error message.
Fixed an out-of-memory error during large CSV imports
Updated the ingestion pipeline for COPY FROM to ensure that large CSV files without a predefined schema can load into new tables without causing memory errors. This error did not affect external tables.
Prevent running queries when using a dropped database
When the current database does not exist, such as when it has been dropped, most queries fail as expected. We fixed a bug where some queries against specific information_schema views, such as engines, catalogs, and applicable_roles, would still succeed in such cases. These queries now fail consistently, like all other queries against a non-existent database. For example, running SELECT * FROM information_schema.engines when the database is dropped previously worked, but now fails.
Added the GEOGRAPHY data type and functions for geospatial data handling [public preview]
Added a new GEOGRAPHY data type and functions for working with geospatial data. Firebolt supports the three industry standard formats Well-Known Text (WKT), Well-Known Binary (WKB), and GeoJSON for geospatial data.
This public preview release includes functions to:
- Convert the GEOGRAPHY data type to the Well-Known Binary (WKB) format for geographic objects.
- Convert the GEOGRAPHY data type to the extended Well-Known Binary (EWKB) format using Spatial Reference Identifier (SRID) 4326, which corresponds to the WGS84 coordinate system.
- Convert the GEOGRAPHY data type to the GeoJSON format.
- Convert the GEOGRAPHY data type to the Well-Known Text (WKT) format.
- Determine whether one GEOGRAPHY object fully contains another.
- Determine whether one GEOGRAPHY object fully encompasses another.
- Compute the distance between two GEOGRAPHY objects, measured in meters.
- Construct a GEOGRAPHY object from a GeoJSON string.
- Construct a GEOGRAPHY object from a Well-Known Text (WKT) string.
- Construct a GEOGRAPHY object from a Well-Known Binary (WKB) byte string.
- Create a GEOGRAPHY object from specified longitude and latitude coordinates.
- Determine whether two GEOGRAPHY objects intersect each other.
- Extract the longitude of a GEOGRAPHY Point.
- Extract the latitude of a GEOGRAPHY Point.
Added the FIRST_VALUE window function
Added a new FIRST_VALUE window function that returns the first value evaluated in a specified window frame.
Added CREATE TABLE CLONE to clone an existing table in a database
You can create a clone of an existing table in a database using CREATE TABLE CLONE, which is extremely fast because it copies the table structure and references without duplicating the data. The clone functions independently of the original table. Any changes to the data or schema of either table will not affect the other.
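A sketch (assuming a source table orders; the exact syntax is in the CREATE TABLE CLONE documentation):
CREATE TABLE orders_clone CLONE orders;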
Added 3-part identifier support for specifying databases in queries
You can now reference a database other than the current one in queries by using 3-part identifiers, which specify the database, schema, and object. For example, even if you previously selected a database db by using USE DATABASE db, you can still query a different database by using a query such as SELECT * FROM other_db.public.t. The limitation still exists that every query only addresses a single database.
Added ALTER TABLE ADD COLUMN to add a column to an existing table
You can now use ALTER TABLE ADD COLUMN to add columns to Firebolt-managed tables.
This functionality is temporarily limited to tables that were created on Firebolt version 4.10 or higher.
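For example (assuming a table orders):
ALTER TABLE orders ADD COLUMN delivery_date DATE;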
Added support for the ALTER TABLE RENAME command
You can use ALTER TABLE RENAME to change the name of Firebolt-managed tables. This functionality is temporarily limited to tables created on Firebolt version 4.10 or higher.
Added support for external file access using AWS session tokens
You can now use <AWS_SESSION_TOKEN> with access keys to securely authenticate and access external files on AWS with the following features: read_parquet, read_csv, and list_objects.
Improved parsing when casting from TEXT to floating point data types:
- A plus sign (+) preceding a float is now handled correctly. Example: '+3.4'.
- Strings consisting only of an exponent character 'e' or 'E' are rejected. Example: 'E4'.
- Strings with an incomplete exponent are rejected. Example: '4e+'.
Added a rate limit on system engine resource usage
Queries may now fail with the error 429: Account system engine resources usage limit exceeded. This rate limit targets accounts with exceptionally high resource consumption. Accounts with typical resource usage should not be affected and require no further action.
Fixed a bug affecting aggregating indexes that use the COUNT(*) aggregate function before other aggregate functions. After this fix, such aggregating indexes can now be queried correctly without needing to be rebuilt.
Fixed a rare bug in subresult caching logic
Addressed a rare issue in the logic for caching and reusing subresults that could cause query failures with specific query patterns. This issue did not impact the correctness of query results.
Resolved issue preventing schema owners from granting “ANY” privileges
Fixed an issue where schema owners were unable to grant “ANY” privileges on their schema to other users.
Added an enable_result_cache setting for controlling query result caching during benchmarking
You can set enable_result_cache to FALSE to disable the use of Firebolt’s result cache, which is set to TRUE by default. Disabling result caching can be useful for benchmarking query performance. When enable_result_cache is disabled, resubmitting the same query will recompute the results rather than retrieving them from cache. For more information, see Result Cache.
Added LAG and LEAD support for negative offsets
The second parameter in both LAG and LEAD can now accept negative numbers. Given a negative number, a LAG will become a LEAD and vice versa. For example, LAG(x,-5,3) is the same as LEAD(x,5,3).
Improved performance of case-insensitive matching in REGEXP_LIKE
Simple regular expressions in REGEXP_LIKE with case-insensitive matching, using the i flag, now use the same optimized string search implementation as ILIKE, achieving up to three times faster runtimes in observed cases.
Fixed parsing of character classes beginning with ] in regular expressions
For a bracket expression such as []a], the expression is now correctly interpreted as a pattern that matches any single character from the list ]a, rather than treating [] as an empty character class followed by a].
Trailing backslash in regular expressions
Fixed a rare case where invalid regular expressions with a trailing backslash \ were accepted.
Added BIT_SHIFT_RIGHT and BIT_SHIFT_LEFT functions
The following bitwise shift functions are now supported:
- BIT_SHIFT_RIGHT shifts the bits of a number to the right by a specified number of positions, which effectively divides the number by 2 for each position shifted.
- BIT_SHIFT_LEFT shifts the bits of a number to the left by a specified number of positions, which effectively multiplies the number by 2 for each position shifted.
Added ACOS, ATAN, ASIN, COS, COT, TAN, DEGREES, and PI functions
The following trigonometric functions are now supported:
- ACOS calculates the arccosine of a value in radians.
- ATAN calculates the arctangent of a value in radians.
- ASIN calculates the arcsine of a value in radians.
- COS calculates the cosine of a value in radians.
- COT calculates the cotangent of a value in radians.
- TAN calculates the tangent of a value in radians.
- DEGREES converts a value in radians to degrees.
- PI returns π as a value of type DOUBLE PRECISION.
Added a timezone query-level setting with time_zone as an alias
Added the timezone query-level setting. The previous time_zone query setting still works, and is now an alias for timezone.
Introduced new PERCENTILE_CONT and MEDIAN aggregate functions
Added the following aggregate functions:
- PERCENTILE_CONT calculates a specified percentile of values in an ordered dataset.
- MEDIAN returns the median of a given column. It is equivalent to PERCENTILE_CONT(0.5): half the values in the column are smaller, and half are bigger than the returned value. If the number of values in the column is even, MEDIAN returns the arithmetic mean of the two middle values.
Added Snappy compression support to COPY TO with TYPE=PARQUET. Specify COMPRESSION=SNAPPY within COPY TO to enable this.
Added information_schema.engine_user_query_history view to log only user-initiated queries
Added a new view information_schema.engine_user_query_history, which shows all queries initiated by users. This view filters information from the information_schema.engine_query_history view, which logs all engine queries, including system-generated ones like UI updates and page-load requests.
Added support for information_schema.enabled_roles
Added a new view information_schema.enabled_roles, which lists the roles available in the account.
Added a system setting enable_subresult_cache for controlling subresult reuse
The new setting enable_subresult_cache allows users to enable or disable caching of query subresults for subsequent reuse. Caching remains enabled by default. This setting allows users to temporarily disable caching, for example for benchmarking purposes.
Added “FROM first” syntax allowing the FROM clause to precede the SELECT clause
You can now write the FROM clause before the SELECT clause, for example FROM t SELECT a, SUM(b) GROUP BY a. You can now also omit the SELECT clause, as in FROM t.
Introduced a new function GEN_RANDOM_UUID_TEXT to generate a universally unique identifier (UUID)
GEN_RANDOM_UUID_TEXT accepts no arguments and returns a version 4 UUID as defined by RFC 4122 as a TEXT value.
Introduced ~ and !~ operators as aliases for REGEXP_LIKE and NOT REGEXP_LIKE
Introduced the ~ operator as an alias for REGEXP_LIKE, and the !~ operator, which serves as an alias for NOT REGEXP_LIKE.
Introduced JSON functions JSON_POINTER_EXTRACT_KEYS, JSON_POINTER_EXTRACT_VALUES, JSON_POINTER_EXTRACT_TEXT
- JSON_POINTER_EXTRACT_KEYS extracts keys from a JSON object
- JSON_POINTER_EXTRACT_VALUES extracts values from a JSON object
- JSON_POINTER_EXTRACT_TEXT extracts the JSON string value as SQL TEXT
Introduced functions RADIANS, SIN, ATAN2
- RADIANS converts degrees into radians
- SIN computes the sine in radians
- ATAN2 calculates the arctangent with two arguments. ATAN2(y,x) is the angle between the positive x-axis and the line from the origin to the point (x,y), expressed in radians.
Introduced new aggregate functions that accept REAL and DOUBLE inputs and return standard deviations and variances:
- STDDEV_SAMP - Returns the sample standard deviation of all non-NULL numeric values produced by an expression, which measures how spread out values are in a sample.
- STDDEV_POP - Returns the population standard deviation of all non-NULL numeric values produced by an expression, which measures how spread out values are in an entire population.
- VAR_SAMP - Returns the sample variance of all non-NULL numeric values produced by an expression, which measures the average of the squared differences from the sample mean, indicating how spread out the values are within a sample.
- VAR_POP - Returns the population variance of all non-NULL numeric values produced by an expression. The population variance measures the average of the squared differences from the population mean, indicating how spread out the values are within the entire population.
Introduced array functions ARRAY_ALL_MATCH and ARRAY_ANY_MATCH
ARRAY_ALL_MATCH and ARRAY_ANY_MATCH accept an (optional) lambda function and an array and return TRUE if all elements (ARRAY_ALL_MATCH) or any element (ARRAY_ANY_MATCH) satisfy the lambda condition, and FALSE otherwise. When no lambda is passed, the array has to be of type BOOLEAN, and the identity lambda x -> x is used.
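For example (array literals as used elsewhere in these notes):
SELECT ARRAY_ANY_MATCH(x -> x > 2, ARRAY[1, 2, 3]);  -- TRUE: 3 satisfies the lambda
SELECT ARRAY_ALL_MATCH(ARRAY[true, false]);          -- FALSE: identity lambda over a BOOLEAN array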
Improved performance of the JSON_EXTRACT, JSON_EXTRACT_ARRAY, and JSON_VALUE functions
Improved the performance of the JSON_EXTRACT, JSON_EXTRACT_ARRAY, and JSON_VALUE functions.
Changed sorting of arrays containing NULL values to align with PostgreSQL behavior
The sorting method for array columns containing NULL values has been updated to ensure that ASC NULLS FIRST places NULL values before arrays, and DESC NULLS LAST places NULL values after arrays, which aligns with PostgreSQL behavior.
The following code example creates a temporary table tbl which contains three rows: a NULL array, an array with the value 1, and an array with a NULL element. Then, a SELECT statement sorts all rows in ascending order:
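A sketch of the described setup (shown with a regular table and explicit NULLS FIRST; the original example used a temporary table):
CREATE TABLE tbl (a INT[]);
INSERT INTO tbl VALUES (NULL), (ARRAY[1]), (ARRAY[NULL]);
SELECT a FROM tbl ORDER BY a ASC NULLS FIRST;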
The query previously returned {NULL}, {1}, NULL, but now returns NULL, {1}, {NULL}.
NULLS FIRST and NULLS LAST apply to the array itself, not to its elements. By default, ascending order (ASC) assumes NULLS LAST, while descending order (DESC) assumes NULLS FIRST when sorting arrays.
Allowed use of the SESSION_USER function without parentheses
The SESSION_USER function can now be used without parentheses, like this: SELECT SESSION_USER. As a result, any column named session_user now needs to be enclosed in double quotes as follows: SELECT 1 AS "session_user" or SELECT "session_user" FROM table.
Fixed inconsistent display of NaN values in JSON output
Some NaN values were previously displayed as -nan. This was corrected to consistently display NaN values as nan in the JSON output.
Resolved an issue with CHECKSUM and HASH_AGG failing when combining literals and table columns
Fixed an issue where the CHECKSUM and HASH_AGG functions failed when used with a combination of literals and table columns.
Fixed a rare inaccuracy that could cause incorrect results on multi-node engines when performing certain UNION ALL operations
Fixed an issue affecting UNION ALL operations on subqueries that are the result of aggregations or joins on overlapping but distinct keys, followed by an aggregation or join on the common keys of the subqueries’ aggregations or joins.
Fixed a rare inaccuracy that could cause incorrect results with CTEs using RANDOM() in specific join scenarios
Fixed an issue affecting common table expressions (CTEs) in which the RANDOM() function was used multiple times, and at least one of these uses was on the probe side of a join involving a primary index key of the underlying table.
Added COPY TO support for the SNAPPY compression type
COPY TO now supports SNAPPY as a compression option for Parquet files. This enhancement offers greater flexibility for managing file size and performance, particularly for workloads requiring faster compression. Each file is written in Parquet format, with the specified compression applied to the data pages in the column chunks.
Added COPY FROM support for filtering by source file metadata
COPY FROM now supports filtering source files by their metadata using a WHERE clause.
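A hypothetical invocation (table and bucket names are placeholders; see the COPY FROM documentation for exact syntax):
COPY INTO my_table
FROM 's3://my-bucket/data/'
WHERE $SOURCE_FILE_TIMESTAMP > '2024-01-01 00:00:00';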
Added support for vector distance calculations with new functions
Firebolt has added support for vector distance and similarity calculations with the following new functions: VECTOR_COSINE_DISTANCE, VECTOR_MANHATTAN_DISTANCE, VECTOR_EUCLIDEAN_DISTANCE, VECTOR_SQUARED_EUCLIDEAN_DISTANCE, VECTOR_COSINE_SIMILARITY, and VECTOR_INNER_PRODUCT.
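A minimal sketch using array literals:
SELECT VECTOR_COSINE_DISTANCE(ARRAY[0.1, 0.2, 0.3], ARRAY[0.2, 0.3, 0.4]);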
Added SHOW CATALOGS statement and aliased SHOW DATABASES to it while deprecating SHOW DATABASE X
A new statement SHOW CATALOGS now acts as an alias for SHOW DATABASES. The statement SHOW DATABASE X is no longer supported.
COPY FROM now unzips Parquet files with gzip extensions
Before version 4.6, the COPY FROM command did not apply file-level decompression to Parquet files with a .gzip or .gz extension. The command treated these files as standard Parquet files, assuming that any compression existed only within the internal Parquet format structure.
With the release of version 4.6, COPY FROM now processes Parquet files similarly to other formats. When a Parquet file has a .gz or .gzip extension, the command will first decompress the file before reading it as a Parquet format file. Hence, it will now fail while reading internally compressed Parquet files with gzip extensions. Users experiencing issues with loading files after this change should contact the support team at support@firebolt.io for assistance.
Renamed the operator aggregatefunction2 to aggregatefunction in the EXPLAIN output.
Fixed incorrect results in ARRAY_AGG expressions by excluding NULL values for false conditions in aggregating indexes
Aggregate expressions like ARRAY_AGG(CASE WHEN <cond> THEN <column> ELSE NULL END) previously returned incorrect results by excluding NULL values for rows when the condition was FALSE.
Added support for casting from TEXT to DATE with truncation of timestamp-related fields
Casting from TEXT to DATE now supports text values containing fields related to timestamps. These fields are accepted, but truncated during conversion to DATE.
The following code example casts the TEXT representation of the timestamp 2024-08-07 12:34:56.789 to the DATE data type. The conversion truncates the time portion, leaving only the date, as follows:
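SELECT '2024-08-07 12:34:56.789'::DATE;  -- time portion truncated; returns 2024-08-07 (a sketch of the described conversion)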
Added the CONVERT_FROM function
Added the CONVERT_FROM function that converts a BYTEA value with a given encoding to a TEXT value encoded in UTF-8.
Added the BITWISE aggregate functions
Added support for the following functions: BIT_OR (bitwise OR), BIT_XOR (bitwise exclusive OR), and BIT_AND (bitwise AND).
Added the REGEXP_LIKE_ANY function
Added the REGEXP_LIKE_ANY function that checks whether a given string matches any regular expression pattern from a specified list of patterns.
Changed the created and last_altered column types in information_schema.views from TIMESTAMP to TIMESTAMPTZ
The data types of the created and last_altered columns in information_schema.views have been changed from TIMESTAMP to TIMESTAMPTZ.
Fixed runtime constant handling in the sort operator
Fixed the handling of runtime constants in the sort operator. Now, the sort operator can be correctly combined with GENERATE_SERIES. For example, the query SELECT x, GENERATE_SERIES(1,7,3) FROM GENERATE_SERIES(1,3) t(x) now correctly displays values 1 to 3 in the first column, instead of just 1.
Subtracting one DATE from another returns the number of elapsed days; for example, DATE '2023-03-03' - DATE '1996-09-03' produces 9677.
Role-based permissions for COPY FROM and external tables
Added support for role-based permissions (ARNs) to the COPY FROM command and external table operations.
Added trust_policy_role column to information_schema.accounts view for S3 access
Added a new column trust_policy_role to the information_schema.accounts view. This column shows the role used by Firebolt to access customer S3 buckets.
Enabled selection of external tables’ pseudo columns without adding data columns
Users can now select an external table’s pseudo columns (source file name, timestamp, size, and etag) without adding any data columns. For example, select $source_file_timestamp from t_external returns the file timestamps for each row. The query select count($source_file_timestamp) from t_external returns the total number of rows in the external table, similar to count(*). The query select count(distinct $source_file_name) from t_external returns the number of distinct objects containing at least one row in the source S3 location.
Regarding count(*) performance, formats like CSV or JSON still require reading the data fully to determine an external file’s row count. However, Parquet files provide the row count as part of the file header, and this is now used instead of reading the full data.
Extended support for arbitrary join conditions, including multiple inequality predicates
We now support more join conditions. As long as there is one equality predicate comparing a left column to a right column of the join (not part of an OR expression), the remaining join condition can now be an arbitrary expression. The limitation on the number of inequality predicates was removed.
New functions URL_ENCODE and URL_DECODE
We added support for the URL_ENCODE and URL_DECODE functions.
New logarithm functions ln, log
We added support for calculating logarithms. The natural logarithm is available using ln(val double precision). The base 10 logarithm is available using log(val double precision). Logarithms with custom bases are available using log(base double precision, val double precision).
New function SQRT
Added support for the SQRT function to compute the square root.
New functions JSON_VALUE, JSON_VALUE_ARRAY, JSON_EXTRACT_ARRAY
Added support for the functions JSON_VALUE, JSON_VALUE_ARRAY, and JSON_EXTRACT_ARRAY.
New function SESSION_USER
Support has been added for the SESSION_USER function, which retrieves the current user name.
New columns in information_schema.engine_query_history
Added two new columns to information_schema.engine_query_history: query_text_normalized_hash and query_text_normalized.
The word geography is now reserved and must be double-quoted to be used as an identifier. For example, create table geography(geography int); will now fail, but create table "geography" ("geography" int); will succeed.
Deprecated the legacy HTTP ClickHouse headers
We no longer accept or return the legacy HTTP ClickHouse header format X-ClickHouse-*.
Fixed json_value zero-byte handling
The json_value function no longer returns null characters (0x00), as the TEXT datatype does not support them. For example, select json_value('"\u0000"'); now results in an error.
Change default values for NODES and TYPE during CREATE ENGINE
When performing a CREATE ENGINE, the default values for the NODES and TYPE parameters have changed. NODES defaults to 2 (previously 1) and TYPE defaults to M (previously S). To create an engine with the previous default values, run the following command:
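A sketch of the shape such a command might take (engine name is a placeholder; see the CREATE ENGINE documentation for exact property syntax):
CREATE ENGINE my_engine WITH TYPE = S NODES = 1;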
Fixed a duplicated directory structure in the S3 path produced by COPY TO when SINGLE_FILE=FALSE
Fixed an issue in COPY TO when SINGLE_FILE=FALSE. Previously, the specified directory structure in the location was repeated twice in the S3 path. For example, files were output to “s3://my-bucket/out/path/out/path/” instead of “s3://my-bucket/out/path/”.
Fixed the file extension in the S3 path when using the COPY TO statement with GZIP-Parquet format
Fixed an issue in COPY TO when TYPE=PARQUET and COMPRESSION=GZIP, which uses the Parquet file format with internal GZIP compression for the columns. Previously, the output files would have the extension “.parquet.gz”. Now, the extension is “.gz.parquet”.
Added support for scaling intervals by a number
Added support for expressions of the form date_time + INTERVAL * d, where date_time is an expression of type DATE, TIMESTAMP, or TIMESTAMPTZ, and d is an expression of type DOUBLE PRECISION. The interval is now scaled by d before being added to date_time. For example, writing INTERVAL '1 day' * 3 is equivalent to writing INTERVAL '3 days'.
Optimized selective inner and right joins on primary index and partition by columns to reduce rows scanned
Selective inner and right joins on primary index and partition by columns can now benefit from pruning. This reduces the number of rows scanned by filtering out rows that are not part of the join result early in the process. This optimization works best when joining on the first primary index column or a partition by column. The optimization is applied automatically when applicable, and no action is required. Queries that used this optimization will display “Prune:” labels on the table scan in the EXPLAIN (PHYSICAL) or EXPLAIN (ANALYZE) output.
Fixed a bug in the index_of function
Resolved an issue where the index_of function would fail when applied to the result of a cross join that produced a single row.
Changed the handling of non-deterministic functions such as CURRENT_DATE(), LOCALTIMESTAMP(), CURRENT_TIMESTAMP(), and NOW() in column DEFAULT expressions. Existing tables with column DEFAULT expressions are not affected.
Underflow detection while casting from TEXT to floating point data types
Firebolt now detects underflow, a condition where a numeric value becomes smaller than the minimum limit that a data type can represent, when casting from TEXT to floating point data types. For example, the query select '10e-70'::float4; now returns an error, while it previously returned 0.0.
Returning query execution errors in JSON format through the HTTP API
Firebolt’s HTTP API now returns query execution errors in JSON format, allowing for future enhancements like including metadata such as error codes, or the location of a failing expression within the SQL script.
Changed default of case_sensitive_column_mapping parameter in COPY FROM
The default value for the CASE_SENSITIVE_COLUMN_MAPPING parameter in COPY FROM is now FALSE, meaning that if a target table contains column names in uppercase and the source file to ingest has the same columns in lowercase, the ingestion will consider them the same column and ingest the data.
The extract function returns Numeric(38,9) for epoch, second, and millisecond extraction
The result data type of the extract function for epoch, second, and millisecond was changed to return the type Numeric(38,9) instead of a narrower Numeric type. For example, select extract(second from '2024-04-22 07:10:20'::timestamp); now returns Numeric(38,9) instead of Numeric(8,6).
Added the ntile window function
Firebolt now supports the ntile window function. Refer to our NTILE documentation for examples and usage.
Improved performance of simple LIMIT queries
Queries of the form “SELECT [project_list] FROM [table] LIMIT [limit]” on large tables are now significantly faster.
Updated table level RBAC
Table level RBAC is now supported by Firebolt. This means that RBAC checks also cover schemas, tables, views, and aggregating indexes. Refer to our RBAC docs for a detailed overview of this new feature. The new Firebolt version includes the following change:
- New privileges were added to the account_admin, system_admin, and public roles. The effect is transparent: any user assigned those roles will not be affected.
Removed legacy columns from INFORMATION_SCHEMA.ENGINES
We removed the following columns from INFORMATION_SCHEMA.ENGINES that were only for FB 1.0 compatibility: region, spec, scale, warmup, and attached_to. These columns were always empty. (These columns are hidden and do not appear in SELECT * queries, but they will still work if referenced explicitly.)
Removed the functions json_extract_raw, json_extract_array_raw, json_extract_values, and json_extract_keys. Updated the json_extract function: the third argument is now path_syntax, which is a JSON pointer expression. See JSON_EXTRACT for examples and usage.
Cluster ordinal update
Replaced engine_cluster with cluster_ordinal in information_schema.engine_metrics_history. The new column is an integer representing the cluster number.
Configurable cancellation behavior on connection drop
Introduced the cancel_query_on_connection_drop setting, allowing clients to control query cancellation on HTTP connection drop. Options include NONE, ALL, and TYPE_DEPENDENT. Refer to system settings for examples and usage.
JSON format as default for error output
The HTTP API now returns query execution errors in JSON format by default. This change allows for the inclusion of meta information such as error codes and the location of failing expressions in SQL scripts.
STOP ENGINE will drain currently running queries first
The STOP ENGINE command now supports graceful drain, meaning any currently running queries will be run to completion. Once all the queries are completed, the engine will be fully stopped and terminated. If you want to stop the engine immediately, you can issue a STOP ENGINE command with the TERMINATE option. For example, to immediately stop an engine, my_engine, you can use:
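A sketch (option placement is assumed; see the STOP ENGINE documentation):
STOP ENGINE my_engine WITH TERMINATE = TRUE;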
ALTER ENGINE will drain currently running queries before scaling
The ALTER ENGINE command now supports graceful drain, meaning that when you scale an engine (vertically or horizontally), any currently running queries will not be terminated. New queries after the scaling operation will be directed to a new cluster, while queries running on the old cluster will be run to completion.
Updated RBAC ownership management
We have introduced several updates to role and privilege management:
- The security_admin role will be removed temporarily and re-introduced in a later release.
- Information_object_privileges includes more privileges. Switching to a specific user database (e.g., by executing use database db) will only show privileges relevant for that database. Account-level privileges no longer show up when attached to a specific database.
- A grant is now given by default to the public role. This grant can be revoked.
Improved handling of WHERE column IN (...) filters on external table scans.
DML queries are no longer canceled automatically when the HTTP connection drops. You can monitor them in information_schema.engine_running_queries, or cancel them manually using the cancel query statement if desired. DQL queries (SELECT) are still canceled automatically on connection drop.
New Aggregate Functions: CHECKSUM and hash_agg
The CHECKSUM and hash_agg functions are now supported for aggregating indexes. Note that when the hash_agg function doesn’t receive rows, the result is 0.
Removed the MATCH function
The match function has been removed and replaced with regexp_like.
Producing an error for array function failure instead of NULL
Array function queries that accept two or more array arguments now produce an error if the array arguments differ in size. If you call an array function such as array_transform(..) or array_sort(..) with multiple array arguments, the arrays must have the same size. For example, an error is raised when array_length(arr1) != array_length(arr2). We now also perform this check for NULL literals. If you previously used array_transform(x, y -> x + y, NULL::INT[], Array[5, 6]), you got back NULL. Now, the query using that expression will raise an error.
Added ARRAY_FIRST function
The array_first function has been added. It returns the first element in the given array for which the given function returns true.
New name for any_match
A new name for any_match has been added: array_any_match. any_match will be kept as an alias.
Updated ARRAY_SUM return types
The array_sum function of bigint[] now returns a numeric value, and array_sum of real[] returns a real value.
Precedence of operators
Breaking change in operator precedence between comparison operators such as =, <, >, and the IS operator. The new behavior is compatible with Postgres.
Examples of queries that changed behavior:
- select 1 is null = 2 is null previously returned true, because it was interpreted as select (1 is null) = (2 is null). It now becomes an error of incompatible types in =.
- select false = false is not null was previously false, interpreted as select false = (false is not null), but is now true, interpreted as select (false = false) is not null.
Dropping the role
A role can no longer be dropped if privileges have been granted to it. An error message is displayed indicating that you need to manually drop the permissions associated with the role first.
Coalesce Short-Circuiting
COALESCE now supports short-circuiting in Firebolt. Queries such as SELECT COALESCE(a, 1 / 0) FROM t could fail before, even when there were no NULLs in t, because only CASE WHEN supported short-circuiting. Firebolt is now aligned with PostgreSQL and supports short-circuiting in COALESCE as well.
Create table under I_S schema
You can now execute CREATE TABLE/VIEW/AGGREGATING INDEX only under the public schema.
Improved error message for JSON PARSE_AS_TEXT format
The error message for external tables created with the JSON PARSE_AS_TEXT format has been revised. This format reads specifically into a single column of type TEXT or TEXT NOT NULL. (External table partition columns may be defined after the single TEXT column; those are okay.) The revised error message about the CREATE EXTERNAL TABLE statement is now shown on a user's first attempt to SELECT from the table. Support for reading format JSON PARSE_AS_TEXT=TRUE into a TEXT NOT NULL column has been added.
Implemented ALLOW_COLUMN_MISMATCH
Support for ALLOW_COLUMN_MISMATCH in COPY INTO has been added.
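A hedged sketch of how this might look (the table name, S3 path, and the option's placement in a WITH clause are illustrative assumptions; check the COPY INTO documentation for the exact syntax):
COPY INTO my_table FROM 's3://my-bucket/data/' WITH (ALLOW_COLUMN_MISMATCH = TRUE); -- tolerate source columns that don't exactly match the target table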
Corrected NULL behavior of STRING_TO_ARRAY
The behavior of string_to_array now matches its behavior in PostgreSQL. The change affects NULL delimiters, where the string is now split into individual characters, and empty string inputs, where the output is now an empty array.
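Illustrative queries showing the PostgreSQL-compliant results (input values are ours):
SELECT string_to_array('abc', NULL); -- {a,b,c}: a NULL delimiter splits into individual characters
SELECT string_to_array('', ','); -- {}: an empty string yields an empty array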
Changed city_hash behavior for nullable inputs
The behavior of city_hash has changed for nullable inputs.
ARRAY_AGG now preserves NULLs
The array_agg function has been changed to return PostgreSQL-compliant results:
- array_agg now preserves NULL values in its input, e.g. select array_agg(x) from unnest(array[1,NULL,2]) x returns {1,NULL,2}
- array_agg now returns NULL instead of an empty array if there are no input values
Removed lambda parameters from array aggregate functions
Array aggregate functions such as array_sum no longer support lambda parameters. To get the old behavior for conditional lambda functions, use transform instead. For example:
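-- hedged sketch: the lambda, values, and exact signatures are illustrative
-- before (no longer supported): SELECT array_sum(x -> x * 2, [1, 2, 3]);
SELECT array_sum(transform(x -> x * 2, [1, 2, 3])); -- apply the lambda via transform, then aggregate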
Disallowed implicit cast of DATE/TIMESTAMP to INT/BIGINT on external table scans
External table scans now produce an error when reading a Parquet/ORC DATE/TIMESTAMP column if the EXTERNAL TABLE expects the column to have type INT/BIGINT. DATE/TIMESTAMP cannot be cast to INT/BIGINT, and external table scans no longer allow this cast either. You need to explicitly transform the Parquet/ORC DATE/TIMESTAMP column with EXTRACT(EPOCH FROM col) to insert it into an INT/BIGINT column.
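A hedged migration sketch (table and column names are illustrative):
INSERT INTO target_table SELECT EXTRACT(EPOCH FROM ts_col) FROM my_external_table; -- explicit epoch-seconds conversion for the INT/BIGINT column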
Removed ELEMENT_AT function
The element_at function for arrays has been removed and replaced with the [] operator.
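For instance (the array literal is ours; parentheses around the literal are for clarity):
SELECT ([1, 2, 3])[1]; -- returns the first element, replacing element_at([1, 2, 3], 1)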
Change of return type from BIGINT to INTEGER
The index_of/array_position function now returns INTEGER instead of BIGINT.
Removed LIMIT DISTINCT syntax
The LIMIT_DISTINCT syntax is no longer supported by Firebolt.
Updated CAST function behavior
All cast logic has been moved to runtime in Firebolt. The castColumn function is now replaced by fbCastColumn, ensuring consistent casting behavior and resolving issues with the COPY FROM operation and other cast calls. Uses of implicit/explicit CAST may result in errors due to this fix.
Fixed a bug in array_position where searching for NULL in an array with non-null elements incorrectly returned a match in some cases.
Added TYPEOF function
The typeof function has been added, which returns the data type of a SQL expression as a string.
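For instance (the literal and the exact type-name string are illustrative):
SELECT typeof(1); -- e.g. 'integer'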
Aggregations can now spill to disk
Memory-intensive aggregations can now temporarily spill intermediate results to disk; the amount of spilled data is reported in the spilled_bytes column in information_schema.query_history. Spilling does not support aggregations where a single group exceeds the available memory (e.g., select count(distinct high_cardinality_column) from huge_table) and may not yet work reliably for all aggregate functions or engine specs. We will continue improving the feature in upcoming releases.
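A quick way to spot recent queries that spilled (column names as referenced above; the ordering and limit are illustrative):
SELECT query_text, spilled_bytes FROM information_schema.query_history ORDER BY spilled_bytes DESC LIMIT 5;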
No overflow detected in cast from FLOAT to DECIMAL
Fixed the results of casting from float32 to decimals with precision > 18. In addition to this correctness breaking change, certain queries involving overflow that worked before will now fail.
Example query:
SELECT 17014118346046923173168730371588410572::REAL::DECIMAL(37,0); -- now fails with an overflow error
ARRAY_COUNT on a NULL array
ARRAY_COUNT on a NULL array now returns 0 instead of NULL.
No overflow check in arithmetic operations
Arithmetic operators (i.e., multiplication, addition, subtraction, and division) now perform correct overflow checking. This means that queries that used to return wrong results in the past now throw runtime errors.
Example queries:
SELECT 4294967296 * 4294967296; -- now throws an error; before, it returned 0
SELECT 9223372036854775807 + 9223372036854775807; -- now throws an error; before, it returned -2
SELECT (a + b) * c; -- might throw runtime errors if the columns contain large values; this is highly data dependent
New escape sequences in string literals
Escape string literals now support Unicode, hexadecimal, and octal escape sequences. For example, SELECT E'\U0001F525b\x6F\154t'; returns 🔥bolt. If the setting standard_conforming_strings is not enabled for you, regular string literals (e.g., SELECT 'foo';) will also recognize the new escape sequences. However, we recommend exclusively using escape string literals for escape sequences. Please be aware that you will get different results if you previously used (escape) string literals containing the syntax we now use for Unicode and octal escape sequences.
Change return value of length and octet_length to INT
The length, octet_length, and array_length functions now return INTEGER instead of BIGINT.
Subqueries in the GROUP BY/HAVING/ORDER BY clauses change
Subqueries in GROUP BY/HAVING/ORDER BY can no longer reference columns from the selection list of the outer query via their aliases, as per PG compliance. select 1 + 1 as a order by (select a); used to work, but now fails with an unresolved name a error.
Bytea serialization to CSV fix
Changed Bytea to CSV export from escaped to non-escaped output.
Example:
COPY (select 'a'::bytea) TO 's3…'; the result is now written as \x61 rather than the previously escaped form.
Overflow errors in DECIMAL arithmetic
DECIMAL addition now detects overflow. Example queries:
SELECT 5000000000000000000000000000000000000.0::DECIMAL(38,1); -- returns 5000000000000000000000000000000000000.0
SELECT (5000000000000000000000000000000000000.0::DECIMAL(38,1) + 5000000000000000000000000000000000000.0::DECIMAL(38,1)); -- ERROR: overflow
Privileges are now shown for the current account
USAGE ANY ENGINE (and similar) privileges were shown for the * account. Now they are shown for the current account.
Fixed a bug where calling btrim, ltrim, rtrim, or trim with a literal string but non-literal trim characters could result in an error.
For EXPLAIN queries, we now allow only one of the following options at a time: ALL, LOGICAL, PHYSICAL, ANALYZE. EXPLAIN (ALL) now returns the plans in multiple rows instead of multiple columns.
Disabled Unix Time Functions
The following functions are no longer supported:
from_unixtime
to_unix_timestamp
to_unix_time
Renamed spilled metrics columns
The columns spilled_bytes_uncompressed and spilled_bytes_compressed of information_schema.query_history have been replaced by a single column spilled_bytes. It contains the amount of data that was temporarily spilled to disk while executing the query.
Aggregating index placement
Aggregating indexes are now placed in the same namespace as tables and views.
Syntax and planner support for LATERAL scoping
LATERAL is now a reserved keyword. It must be double-quoted when used as an object identifier.
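For instance (illustrative table definition):
CREATE TABLE "lateral" (id INT); -- double quotes are required now that LATERAL is reserved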
Fixed a bug where -0.0 and +0.0, as well as -nan and +nan, were not considered equal in distributed queries.
TRY_CAST from TEXT to NUMERIC now works as expected: if the value cannot be parsed as NUMERIC it produces null.
IS DISTINCT FROM and IS NOT DISTINCT FROM have been added.
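These comparisons treat NULL as a regular, comparable value. For example:
SELECT NULL IS DISTINCT FROM 1; -- true, whereas NULL = 1 yields NULL
SELECT NULL IS NOT DISTINCT FROM NULL; -- true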
Added the information_schema.object_privileges view.
Fixed a bug where ARRAY_FIRST and ARRAY_FIRST_INDEX returned an error if the given input was nullable.
You can now run EXPLAIN (ANALYZE) <select statement> and get detailed metrics about how much time is spent on each operator in the query plan, and how much data is processed. The query plan shown there is the physical query plan, which you can inspect using EXPLAIN (PHYSICAL) <select statement> without executing the query. It shows how query processing is distributed over the nodes of an engine.
The source_file_timestamp metadata column has been migrated from the data type TIMESTAMP (legacy timestamp type without time zone) to the type TIMESTAMPTZ (new timestamp type with time zone). Despite the increased resolution, the data is still in second precision, as AWS S3 provides timestamps only as Unix seconds. Use source_file_timestamp - NOW() instead of DATE_DIFF('second', source_file_timestamp, NOW()).
New function added
A new alias ARRAY_TO_STRING has been added for the function ARRAY_JOIN.
IN expressions with scalar arguments now return Postgres-compliant results if there are NULLs in the IN list.
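For example (standard PostgreSQL semantics):
SELECT 1 IN (2, NULL); -- NULL rather than false, since the NULL could have been the match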