Lists Firebolt system settings that you can configure using SQL.
Use a SET statement in a SQL script to configure aspects of Firebolt’s system behavior. Each statement is a query in its own right and must be terminated with a semicolon (;). The SET statement cannot be included in other queries. This topic provides a list of available settings by function.
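A minimal illustration of the SET syntax, using the enable_result_cache setting described later in this topic:

```sql
-- Each SET statement is its own query and must end with a semicolon.
SET enable_result_cache = FALSE;
```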
You can also apply settings to a single query by adding WITH (<setting_1_name> = <setting_1_value>, ...) to the query. This lets you apply settings directly to specific queries without affecting the entire session.

The WITH clause is supported only for a subset of commands. The following example uses the WITH clause:
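A sketch of the per-query form, assuming a hypothetical table name and using the enable_result_cache setting described later in this topic; whether a particular setting supports the per-query form may vary:

```sql
-- The setting applies only to this query; the session default is unchanged.
SELECT COUNT(*) FROM my_table
WITH (enable_result_cache = FALSE);
```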
The default value of the timezone setting is UTC. The following example shows how the session time zone affects TIMESTAMPTZ values:
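A hedged sketch, assuming IANA time zone names are accepted and using an illustrative literal:

```sql
SET timezone = 'UTC';
SELECT TIMESTAMPTZ '2024-01-01 12:00:00+03';  -- rendered in UTC (09:00:00 UTC)

SET timezone = 'America/New_York';
SELECT TIMESTAMPTZ '2024-01-01 12:00:00+03';  -- the same instant, rendered in the new session time zone
```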
When standard_conforming_strings is set to true, strings are parsed without escaping, treating backslashes literally. By default, this setting is enabled. The following example shows how standard_conforming_strings affects the interpretation of escape sequences in string literals:
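A sketch using a newline escape as the illustration:

```sql
SET standard_conforming_strings = false;
SELECT 'a\nb';  -- backslash escapes are interpreted; \n becomes a newline character

SET standard_conforming_strings = true;  -- the default
SELECT 'a\nb';  -- the backslash is treated literally; the result contains '\' followed by 'n'
```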
This setting limits the number of result rows returned by SELECT statements. The query is executed as if an additional LIMIT clause were added to the SQL query. A value of zero or less means that no limit is applied. By default, no limit is applied to the number of result rows.
- NONE: The query will not be canceled on connection drop.
- ALL: The query will be canceled on connection drop.
- TYPE_DEPENDENT: Only queries without side effects, such as SELECT, will be canceled.

The default is TYPE_DEPENDENT.
The following example demonstrates the none, all, and type_dependent modes for SET cancel_query_on_connection_drop:
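A minimal sketch, assuming the mode values are written as unquoted keywords:

```sql
SET cancel_query_on_connection_drop = NONE;            -- never cancel on connection drop
SET cancel_query_on_connection_drop = ALL;             -- always cancel on connection drop
SET cancel_query_on_connection_drop = TYPE_DEPENDENT;  -- cancel only side-effect-free queries (default)
```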
The following example uses SET query_label to label a query, allowing you to track it in information_schema.engine_running_queries and information_schema.engine_query_history. It then demonstrates how to retrieve the QUERY_ID for the labeled query and cancel it using CANCEL QUERY:
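A sketch under assumptions: the label value, the table name, and the label column used for filtering are illustrative.

```sql
-- Label subsequent queries in this session.
SET query_label = 'my_query_label';
SELECT COUNT(*) FROM my_table;

-- Look up the labeled query's ID while it is running.
SELECT query_id
FROM information_schema.engine_running_queries
WHERE query_label = 'my_query_label';

-- Cancel it using the retrieved ID.
CANCEL QUERY WHERE query_id = '<query_id>';
```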
You can run warmup queries on all clusters of an engine, for example after a START ENGINE or ALTER ENGINE operation.
Warmup queries complete after they have run on all clusters of the engine. The queries return an empty result if they succeed on all clusters. If the query fails on any cluster, it returns an error. If multiple errors occur, only one error is returned.
For example, you can warm up production_table using all clusters of an engine and receive an empty result upon success.
Set enable_result_cache to FALSE to disable the use of Firebolt’s result cache, which is set to TRUE by default. Disabling result caching can be useful for benchmarking query performance. When enable_result_cache is disabled, resubmitting the same query recomputes the results rather than retrieving them from the cache.
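For example, with a hypothetical table name:

```sql
SET enable_result_cache = FALSE;
-- Resubmitting this query now recomputes the result instead of serving it from the result cache.
SELECT COUNT(*) FROM my_table;
```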
Set enable_subresult_cache to FALSE to disable Firebolt’s subresult caching, which is set to TRUE by default.
Disabling subresult caching is generally not recommended, as it can negatively impact query performance, especially for complex workloads. For most benchmarking scenarios, disable the result cache instead, as described in the previous Result cache section. This approach affects only the final result caching while preserving the benefits of subresult optimizations.
Setting enable_subresult_cache to FALSE disables the use of all cached subresults. In particular, it deactivates two caching mechanisms that normally speed up query runtimes: the MaybeCache operator, which includes the full result cache, and the hash-table cache used by the Join operator.
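For example, to disable subresult caching for the session:

```sql
SET enable_subresult_cache = FALSE;
```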
Set enable_scan_cache to TRUE to enable Firebolt’s scan caching, which is set to FALSE by default. Note that RAM usage may be high when the scan cache is enabled, and the impact depends heavily on the workload.
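For example, to enable scan caching for the session:

```sql
SET enable_scan_cache = TRUE;
```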
- insert_sharding='shard_on_read': Use when the partition expression is based on $source_file_name. This allows Firebolt to determine the target partition before reading data and to group files of the same partition on the same nodes. It is most effective when your source files are already organized by partition (for example, files named data_20240101.csv, data_20240102.csv).
- insert_sharding='shuffle_on_write': Use when the partition expression is based on the data itself. In this case, data must be read first to determine the partitioning. Just before insertion, after any transformations, the data is re-shuffled for partition locality. Use this when your partition values come from the data content rather than from file names.
This setting can only be used with the WITH SETTINGS syntax, not with SET.

Setting insert_sharding to shard_on_read changes the file distribution strategy across nodes: each date is processed by exactly one node, but only when the partition value can be determined from the source file name.
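A hedged sketch with hypothetical table names, assuming the settings clause is appended to the INSERT statement in the WITH form shown at the top of this topic:

```sql
-- Files are grouped by partition (derived from $source_file_name) before reading.
INSERT INTO events
SELECT * FROM events_external
WITH (insert_sharding = 'shard_on_read');
```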
You can control tablet sizes during ingestion with the tablet_min_size_bytes and tablet_max_size_bytes settings:

- tablet_min_size_bytes: Controls the minimum size of tablets. If an ingestion does not contain enough data, smaller tablets are created nevertheless. When possible, data is compacted into tablets of at least this size. Default: 1.5 GiB. Minimum: 1 GiB.
- tablet_max_size_bytes: Controls the maximum size of tablets. Default: 4 GiB. Must be greater than or equal to tablet_min_size_bytes.

The following example sets both tablet_min_size_bytes and tablet_max_size_bytes to 4 GiB:
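A sketch using session-level SET statements, with 4 GiB expressed in bytes:

```sql
-- 4 GiB = 4 * 1024 * 1024 * 1024 bytes
SET tablet_min_size_bytes = 4294967296;
SET tablet_max_size_bytes = 4294967296;
```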
Setting tablet_min_size_bytes and tablet_max_size_bytes to 4 GiB ensures that larger tablets are created.
To access Amazon S3 buckets in a different region, use the cross_region_request_mode setting. Firebolt offers three modes:

- disabled: The default setting. Any attempt to access a bucket in another region causes the query to fail.
- auto: Recommended for most scenarios when you need to access a bucket in a different region. Firebolt automatically infers each bucket’s region using the Amazon HeadBucket API and routes requests to the appropriate regional endpoint. The region information is cached on the node for one hour. If you’re using a role to authenticate to Amazon S3, it must include ListBuckets permissions. Note that Iceberg queries using vended credentials may lack such permissions.
- enforced: Use when you need to access a bucket in a different region and do not have the permissions required for auto mode. Firebolt uses the pseudo-region “aws-global” for every request. This may lead to higher latencies, since AWS needs to resolve the region internally for each request.

This setting can only be used with the WITH SETTINGS syntax, not with SET.
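A hedged sketch with a hypothetical external table over a bucket in another region, assuming the settings clause is appended to the query in the WITH form shown at the top of this topic:

```sql
SELECT COUNT(*) FROM my_external_table
WITH (cross_region_request_mode = 'auto');
```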
When optimizer_mode is set to user_guided, the optimizer disables all cost-based optimization rules. This ensures that the relative order of joins and aggregations in the resulting execution plan follows the syntactic order defined in the SELECT blocks of your SQL query. The default value is automatic.
This setting can be used with both SET and the WITH syntax.
The following example uses the optimizer_mode setting to ensure that, in the resulting execution plan, the join between r and s is executed before the join between r and t.
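A sketch with assumed join keys; under user_guided mode the joins follow their syntactic order:

```sql
SET optimizer_mode = user_guided;

-- Join keys are illustrative assumptions.
SELECT *
FROM r
JOIN s ON r.s_id = s.id   -- executed first, following the syntactic order
JOIN t ON r.t_id = t.id;  -- executed second
```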
You can control whether the optimizer uses storage manager statistics with the enable_storage_statistics session parameter. By default, storage manager statistics are enabled (the default value is true). To disable them, set the value to false.
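The discussion below refers to a star-schema query along the following lines; the column names and predicates are illustrative, with two local predicates on d1 and one on d2:

```sql
SET enable_storage_statistics = true;  -- the default

SELECT COUNT(*)
FROM ft
JOIN d1 ON ft.d1_id = d1.id
JOIN d2 ON ft.d2_id = d2.id
WHERE d1.a = 1 AND d1.b = 2  -- two local predicates on d1
  AND d2.c = 3;              -- one local predicate on d2
```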
When enable_storage_statistics is set to true, the optimizer uses the storage manager statistics to estimate the number of rows in the fact table ft and the dimension tables d1 and d2.
The storage manager reports that d2 is much smaller (50 rows) than d1 (1,000 rows). The optimizer estimates that, even after applying the local predicates on these dimension tables, it will be more efficient to join ft with d2 first and then with d1.
The join order in this plan adapts to the size of the underlying tables. For example, if the size of d2 grows to 400 rows, the optimizer will first join ft with d1 and then with d2.
However, when enable_storage_statistics is set to false, the optimizer uses hardcoded default estimates for the fact table ft (100,000,000 rows) and the dimension tables d1 and d2 (100,000 rows each). Because the two dimension tables are estimated to be equally large, the join order in this case is fully determined by the number of local predicates. Applying two local predicates on d1 is estimated to produce a smaller row count (20,000 rows) than applying one local predicate on d2 (70,000 rows). This join order remains stable even if the underlying table sizes change.
This setting applies to INSERT statements and limits the degree of parallelism for tablet-writing operations. This can reduce the memory footprint during ingestion.