System settings
Lists Firebolt system settings that you can configure using SQL.
You can use a SET statement in a SQL script to configure aspects of Firebolt’s system behavior. Each statement is a query in its own right and must be terminated with a semicolon (;). The SET statement cannot be included in other queries. This topic lists the available settings by function.
Setting via WITH
You can override settings by appending WITH (<setting_1_name> = <setting_1_value>, ...) to the query. This lets you apply settings directly to specific queries without affecting the entire session.
Example
Instead of:
You can write:
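A minimal sketch of both forms, assuming a hypothetical table named my_table. The first two statements use the session-level form, and the last statement is the per-query form based on the WITH syntax described above:

```sql
-- Session-level form: the setting stays in effect for later queries in the session.
SET timezone = 'America/New_York';
SELECT * FROM my_table;  -- my_table is a hypothetical table

-- Per-query form: the setting applies to this query only.
SELECT * FROM my_table WITH (timezone = 'America/New_York');
```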
Supported Commands
The WITH clause is supported for the following commands:
Supported Settings
The following settings can be configured using the WITH clause:
- timezone
- standard_conforming_strings
- max_result_rows
- statement_timeout
- cancel_query_on_connection_drop
- query_label
- enable_result_cache
- enable_subresult_cache
- insert_sharding
- tablet_min_size_bytes and tablet_max_size_bytes
Setting the time zone
Use this setting to specify the session time zone. Time zone names are from the Time Zone Database. You can see the list of tz database time zones here. For times in the future, the latest known rule for the given time zone is applied. Firebolt does not support time zone abbreviations, because they cannot account for daylight saving time transitions, and some abbreviations have meant different UTC offsets at different times. The default value of the timezone setting is UTC.
Syntax
Example
The following code example demonstrates how setting the timezone parameter affects the interpretation and conversion of TIMESTAMPTZ values:
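A minimal sketch, assuming the syntax SET timezone = '<time_zone>' and a TIMESTAMPTZ literal that carries its own time zone name. The comments describe the underlying time conversion, not the exact output format:

```sql
-- Assumed syntax: SET timezone = '<time_zone>';
SET timezone = 'UTC';
-- Berlin observed UTC+2 on this date, so the instant is 09:19:33.123456 in UTC.
SELECT TIMESTAMPTZ '1996-09-03 11:19:33.123456 Europe/Berlin';

SET timezone = 'America/New_York';
-- The same instant is 05:19:33.123456 in New York (UTC-4 on this date).
SELECT TIMESTAMPTZ '1996-09-03 11:19:33.123456 Europe/Berlin';
```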
Enable parsing for literal strings
If set to true, strings are parsed without escaping, treating backslashes literally. By default, this setting is enabled.
Syntax
Example
The following code example demonstrates how setting standard_conforming_strings affects the interpretation of escape sequences in string literals:
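A minimal sketch, assuming the syntax SET standard_conforming_strings = [true|false]. The exact set of escape sequences recognized when the setting is disabled is not stated here, so the comments describe only the expected direction of the change:

```sql
-- Default behavior: backslashes are ordinary characters.
SET standard_conforming_strings = true;
SELECT 'C:\temp\new';  -- the string keeps both backslashes as written

-- With the setting disabled, backslashes may start escape sequences
-- (here \t and \n would no longer be read as literal characters).
SET standard_conforming_strings = false;
SELECT 'C:\temp\new';
```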
Statement timeout
Specifies the number of milliseconds a SQL statement is allowed to run. Any SQL statement or query that exceeds the specified time is canceled. A value of zero, which is the default, disables the timeout.
Syntax
Example
The following SQL example sets the query timeout to three seconds:
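A minimal sketch, assuming the syntax SET statement_timeout = <milliseconds>:

```sql
-- 3000 milliseconds = 3 seconds; longer-running statements are canceled.
SET statement_timeout = 3000;

-- A value of zero disables the timeout again.
SET statement_timeout = 0;
```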
Limit the number of result rows
When set to a value greater than zero, this setting limits the number of rows returned by SELECT statements. The query is executed as if an additional LIMIT clause is added to the SQL query. A value of zero or less means that no limit is applied. By default, no limit to the number of result rows is applied.
Syntax
Example
The following queries all return the same result. For the first query, no explicit settings are set:
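A sketch of three such queries, assuming the syntax SET max_result_rows = <integer> and a hypothetical table my_table that holds more than 10,000 rows:

```sql
-- No setting: the explicit LIMIT caps the result at 10,000 rows.
SELECT * FROM my_table LIMIT 10000;

-- The setting acts like an additional LIMIT 10000.
SET max_result_rows = 10000;
SELECT * FROM my_table;

-- The setting is stricter than the explicit LIMIT, so at most 10,000 rows are returned.
SET max_result_rows = 10000;
SELECT * FROM my_table LIMIT 20000;
```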
Query cancellation mode on connection drop
Specifies how a query behaves when the HTTP connection to Firebolt is dropped, such as when the UI window is closed. You can choose between three modes:
- NONE: The query is not canceled when the connection drops.
- ALL: The query is canceled when the connection drops.
- TYPE_DEPENDENT: Only queries without side effects, such as SELECT, are canceled.
The default is TYPE_DEPENDENT.
Syntax
Example
The following code example demonstrates how to control query cancellation behavior when a connection drops using the none, all, and type_dependent modes for SET cancel_query_on_connection_drop:
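A minimal sketch, assuming the syntax SET cancel_query_on_connection_drop = <mode> with unquoted mode names:

```sql
-- Never cancel queries when the connection drops.
SET cancel_query_on_connection_drop = none;

-- Always cancel queries when the connection drops.
SET cancel_query_on_connection_drop = all;

-- Default: cancel only queries without side effects, such as SELECT.
SET cancel_query_on_connection_drop = type_dependent;
```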
Query labeling/tagging
Use this option to label your query with custom text. A label makes it easier to cancel the query and to retrieve its status from system tables.
Syntax
Example
The following code example assigns a query label to a query using SET query_label, allowing you to track it in information_schema.engine_running_queries and information_schema.engine_query_history. It then demonstrates how to retrieve the QUERY_ID for the labeled query and cancel it using CANCEL QUERY:
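A minimal sketch, assuming the syntax SET query_label = '<text>', that both views expose query_id and query_label columns, and that CANCEL QUERY accepts a WHERE query_id = ... filter. The table my_table and the label text are hypothetical:

```sql
-- Label the queries that follow in this session.
SET query_label = 'Hello Firebolt';
SELECT * FROM my_table;

-- Look up the labeled query while it is running, or afterwards in the history.
SELECT query_id, query_label
FROM information_schema.engine_running_queries
WHERE query_label = 'Hello Firebolt';

SELECT query_id, query_label
FROM information_schema.engine_query_history
WHERE query_label = 'Hello Firebolt';

-- Cancel the query, replacing <QUERY_ID> with the value retrieved above.
CANCEL QUERY WHERE query_id = '<QUERY_ID>';
```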
Multi-cluster engine warmup
Use this option to distribute queries across all clusters of an engine, simplifying the process of initializing cached data to a consistent state across all clusters after a START ENGINE or ALTER ENGINE operation.
Warmup queries complete after they have run on all clusters of the engine. The queries return an empty result if they succeed on all clusters. If the query fails on any cluster, it returns an error. If multiple errors occur, only one error is returned.
Syntax
Example
The following code example activates the warmup mode so that the query runs on production_table using all clusters of an engine, and returns an empty result upon success:
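A minimal sketch. The setting name warmup is an assumption, because the text above does not name the setting:

```sql
-- Assumption: the warmup mode is controlled by a setting named "warmup".
SET warmup = true;

-- Runs on every cluster of the engine and returns an empty result on success.
SELECT * FROM production_table;

-- Return to normal query routing.
SET warmup = false;
```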
Result cache
Set enable_result_cache to FALSE to disable the use of Firebolt’s result cache, which is set to TRUE by default. Disabling result caching can be useful for benchmarking query performance. When enable_result_cache is disabled, resubmitting the same query recomputes the results rather than retrieving them from the cache.
Syntax
Example
The following code example disables the result cache so that no previously cached results are used, and no new cache entries are written:
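A minimal sketch, assuming the syntax SET enable_result_cache = [true|false] and a hypothetical table my_table:

```sql
SET enable_result_cache = false;

-- Each run recomputes the result instead of reading it from the result cache.
SELECT COUNT(*) FROM my_table;
SELECT COUNT(*) FROM my_table;
```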
Subresult cache
Firebolt implements advanced cross-query optimization that allows SQL queries to reuse intermediate query execution states from previous requests. Subresult caching operates at a semantic level, which allows Firebolt to understand and optimize queries based on the meaning and context of the data rather than solely based on their syntax or structure. This capability allows Firebolt to optimize across different query patterns for improved efficiency.
Set enable_subresult_cache to FALSE to disable Firebolt’s subresult caching, which is set to TRUE by default.
Disabling subresult caching is generally not recommended, as it can negatively impact query performance, especially for complex workloads. For most benchmarking scenarios, disable the result cache instead, as described in the previous Result cache section. This approach affects only the final result caching while preserving the benefits of subresult optimizations.
Syntax
Example
The following code example disables the subresult cache so no previously cached subresult is used and no new cache entries are written by this query:
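A minimal sketch, assuming the syntax SET enable_subresult_cache = [true|false]. The tables orders and customers are hypothetical:

```sql
SET enable_subresult_cache = false;

-- No cached subresults are reused for this query, including any cached
-- hash table for the join, and no new cache entries are written.
SELECT c.customer_id, COUNT(*) AS order_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```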
Setting enable_subresult_cache to FALSE disables the use of all cached subresults. In particular, it deactivates two caching mechanisms that normally speed up query runtimes: the use of the MaybeCache operator, which includes the full result cache, and the hash-table cache used by the Join operator.
Insert sharding
When working with partitioned tables, Firebolt enforces separation of data between tablets: rows of different partitions cannot be stored together in the same tablet.
Consider a scenario where you ingest three years of historical data into a table partitioned by date. This can produce around 1,000 tablets, roughly one per day. For large datasets, a common practice is to scale out for ingestion. However, scaling out creates a challenge, because each date might be processed by multiple nodes. With 10 nodes, for example, ingestion can produce up to 10,000 tablets instead of 1,000. This not only slows down data persistence because of the additional storage requests, but can also degrade query performance.
To address this, Firebolt provides the following controls for ingestion into partitioned tables:
- insert_sharding='shard_on_read': Use when the partition expression is based on $source_file_name. This allows Firebolt to determine the target partition before reading data and to group files of the same partition on the same nodes. This is most effective when your source files are already organized by partition (for example, files named like data_20240101.csv, data_20240102.csv).
- insert_sharding='shuffle_on_write': Use when the partition expression is based on the data itself, that is, when partition values come from the data content rather than from file names. In this case, data must be read first to determine partitioning. Just before insertion and after any transformations, the data is reshuffled for partition locality.
Notes
- This setting overrides the default load-based sharding of input files. Be cautious, because a single partition with heavy data could overload a single shard.
- This setting is only available via the WITH SETTINGS syntax, not with SET.
Syntax
Example
The following examples demonstrate when to use each sharding option:
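A sketch of both options, assuming the per-query WITH (<setting> = <value>) form described in the Setting via WITH section. All table names are hypothetical:

```sql
-- Target table partitioned by an expression over $source_file_name:
-- files that belong to the same partition are grouped on the same node before reading.
INSERT INTO events_by_file_date
SELECT *, $source_file_name AS source_file_name
FROM events_external
WITH (insert_sharding = 'shard_on_read');

-- Target table partitioned by a column of the data itself:
-- rows are reshuffled by partition just before they are written.
INSERT INTO events_by_event_date
SELECT *
FROM events_external
WITH (insert_sharding = 'shuffle_on_write');
```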
Setting insert_sharding to shard_on_read changes how files are distributed across nodes so that each date is processed by exactly one node. This works only when the partition value can be determined from the source file name.
Target tablet size
During ingestion, Firebolt attempts to create optimally sized tablets to balance ingestion speed and future scan performance. When all ingested data has been read, Firebolt prefers creating relatively smaller tablets to prioritize data persistence, leaving further optimization to Auto Vacuum. However, if this behavior isn’t desirable, you can control it using the tablet_min_size_bytes and tablet_max_size_bytes settings:
- tablet_min_size_bytes: Controls the minimum size of tablets. When possible, data is compacted into tablets of at least this size. If the ingestion does not contain enough data, smaller tablets are still created. Default: 1.5 GiB. Minimum: 1 GiB.
- tablet_max_size_bytes: Controls the maximum size of tablets. Default: 4 GiB. Must be greater than or equal to tablet_min_size_bytes.
Note
Larger target tablet sizes may require more memory during ingestion.
Example
The following example sets both minimum and maximum tablet sizes to 4 GiB:
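A minimal sketch, assuming the settings take a size in bytes (4 GiB = 4 × 1024³ = 4294967296 bytes) and are passed with the per-query WITH form. The tables my_table and my_source are hypothetical:

```sql
INSERT INTO my_table
SELECT * FROM my_source
WITH (
    tablet_min_size_bytes = 4294967296,  -- 4 GiB
    tablet_max_size_bytes = 4294967296   -- 4 GiB, not smaller than the minimum
);
```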
Changing both tablet_min_size_bytes and tablet_max_size_bytes to 4 GiB ensures that larger tablets are created.