Lists Firebolt system settings that you can configure using SQL.
Use a SET statement in a SQL script to configure aspects of Firebolt’s system behavior. Each statement is a query in its own right and must be terminated with a semicolon (;). The SET statement cannot be included in other queries. This topic provides a list of available settings by function.
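A minimal illustration of the SET syntax, using the enable_result_cache setting described later in this topic:

```sql
-- Each SET statement is its own query and must end with a semicolon.
SET enable_result_cache = FALSE;
```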
You can also apply settings to a single query by adding WITH (<setting_1_name> = <setting_1_value>, ...) to the query. This lets you apply settings directly to specific queries without affecting the entire session.

The WITH clause is supported only for a subset of commands. The following example uses the WITH clause:
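A sketch of the per-query form, assuming a hypothetical table name and using the enable_result_cache setting described later in this topic; whether a particular setting supports the per-query form may vary:

```sql
-- The setting applies only to this query; the session default is unchanged.
SELECT COUNT(*) FROM my_table
WITH (enable_result_cache = FALSE);
```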
The default value of the timezone setting is UTC. The following example shows how the session time zone affects TIMESTAMPTZ values:
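A hedged sketch, assuming IANA time zone names are accepted and using an illustrative literal:

```sql
SET timezone = 'UTC';
SELECT TIMESTAMPTZ '2024-01-01 12:00:00+03';  -- rendered in UTC (09:00:00 UTC)

SET timezone = 'America/New_York';
SELECT TIMESTAMPTZ '2024-01-01 12:00:00+03';  -- the same instant, rendered in the new session time zone
```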
When standard_conforming_strings is set to true, strings are parsed without escaping, treating backslashes literally. By default, this setting is enabled. The following example shows how standard_conforming_strings affects the interpretation of escape sequences in string literals:
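A sketch using a newline escape as the illustration:

```sql
SET standard_conforming_strings = false;
SELECT 'a\nb';  -- backslash escapes are interpreted; \n becomes a newline character

SET standard_conforming_strings = true;  -- the default
SELECT 'a\nb';  -- the backslash is treated literally; the result contains '\' followed by 'n'
```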
This setting limits the number of result rows returned by SELECT statements. The query is executed as if an additional LIMIT clause were added to the SQL query. A value of zero or less means that no limit is applied. By default, no limit is applied to the number of result rows.
- NONE: The query will not be canceled on connection drop.
- ALL: The query will be canceled on connection drop.
- TYPE_DEPENDENT: Only queries without side effects, such as SELECT, will be canceled.

The default is TYPE_DEPENDENT.
The following example demonstrates the none, all, and type_dependent modes for SET cancel_query_on_connection_drop:
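A minimal sketch, assuming the mode values are written as unquoted keywords:

```sql
SET cancel_query_on_connection_drop = NONE;            -- never cancel on connection drop
SET cancel_query_on_connection_drop = ALL;             -- always cancel on connection drop
SET cancel_query_on_connection_drop = TYPE_DEPENDENT;  -- cancel only side-effect-free queries (default)
```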
The following example uses SET query_label to label a query, allowing you to track it in information_schema.engine_running_queries and information_schema.engine_query_history. It then demonstrates how to retrieve the QUERY_ID for the labeled query and cancel it using CANCEL QUERY:
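A sketch under assumptions: the label value, the table name, and the label column used for filtering are illustrative.

```sql
-- Label subsequent queries in this session.
SET query_label = 'my_query_label';
SELECT COUNT(*) FROM my_table;

-- Look up the labeled query's ID while it is running.
SELECT query_id
FROM information_schema.engine_running_queries
WHERE query_label = 'my_query_label';

-- Cancel it using the retrieved ID.
CANCEL QUERY WHERE query_id = '<query_id>';
```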
You can run warmup queries on all clusters of an engine, for example after a START ENGINE or ALTER ENGINE operation.
Warmup queries complete after they have run on all clusters of the engine. The queries return an empty result if they succeed on all clusters. If the query fails on any cluster, it returns an error. If multiple errors occur, only one error is returned.
For example, you can warm up production_table using all clusters of an engine and receive an empty result upon success.
Set enable_result_cache to FALSE to disable the use of Firebolt’s result cache, which is set to TRUE by default. Disabling result caching can be useful for benchmarking query performance. When enable_result_cache is disabled, resubmitting the same query recomputes the results rather than retrieving them from the cache.
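For example, with a hypothetical table name:

```sql
SET enable_result_cache = FALSE;
-- Resubmitting this query now recomputes the result instead of serving it from the result cache.
SELECT COUNT(*) FROM my_table;
```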
Set enable_subresult_cache to FALSE to disable Firebolt’s subresult caching, which is set to TRUE by default.
Disabling subresult caching is generally not recommended, as it can negatively impact query performance, especially for complex workloads. For most benchmarking scenarios, disable the result cache instead, as described in the previous Result cache section. This approach affects only the final result caching while preserving the benefits of subresult optimizations.
Setting enable_subresult_cache to FALSE disables the use of all cached subresults. In particular, it deactivates two caching mechanisms that normally speed up query runtimes: the MaybeCache operator, which includes the full result cache, and the hash-table cache used by the Join operator.
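For example, to disable subresult caching for the session:

```sql
SET enable_subresult_cache = FALSE;
```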
Set enable_scan_cache to TRUE to enable Firebolt’s scan caching, which is set to FALSE by default. Note that RAM usage may be high when the scan cache is enabled, and the impact depends heavily on the workload.
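For example, to enable scan caching for the session:

```sql
SET enable_scan_cache = TRUE;
```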
- insert_sharding='shard_on_read': Use when the partition expression is based on $source_file_name. This allows Firebolt to determine the target partition before reading data and to group files of the same partition on the same nodes. It is most effective when your source files are already organized by partition (for example, files named data_20240101.csv, data_20240102.csv).
- insert_sharding='shuffle_on_write': Use when the partition expression is based on the data itself. In this case, data must be read first to determine the partitioning. Just before insertion, after any transformations, the data is re-shuffled for partition locality. Use this when your partition values come from the data content rather than from file names.
This setting can only be used with the WITH SETTINGS syntax, not with SET.

Setting insert_sharding to shard_on_read changes the file distribution strategy across nodes: each date is processed by exactly one node, but only when the partition value can be determined from the source file name.
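A hedged sketch with hypothetical table names, assuming the settings clause is appended to the INSERT statement in the WITH form shown at the top of this topic:

```sql
-- Files are grouped by partition (derived from $source_file_name) before reading.
INSERT INTO events
SELECT * FROM events_external
WITH (insert_sharding = 'shard_on_read');
```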
You can control tablet sizes during ingestion with the tablet_min_size_bytes and tablet_max_size_bytes settings:

- tablet_min_size_bytes: Controls the minimum size of tablets. If an ingestion does not contain enough data, smaller tablets are created nevertheless. When possible, data is compacted into tablets of at least this size. Default: 1.5 GiB. Minimum: 1 GiB.
- tablet_max_size_bytes: Controls the maximum size of tablets. Default: 4 GiB. Must be greater than or equal to tablet_min_size_bytes.

The following example sets both tablet_min_size_bytes and tablet_max_size_bytes to 4 GiB:
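A sketch using session-level SET statements, with 4 GiB expressed in bytes:

```sql
-- 4 GiB = 4 * 1024 * 1024 * 1024 bytes
SET tablet_min_size_bytes = 4294967296;
SET tablet_max_size_bytes = 4294967296;
```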
Setting tablet_min_size_bytes and tablet_max_size_bytes to 4 GiB ensures that larger tablets are created.
To access Amazon S3 buckets in a different region, use the cross_region_request_mode setting. Firebolt offers three modes:

- disabled: The default setting. Any attempt to access a bucket in another region causes the query to fail.
- auto: Recommended for most scenarios when you need to access a bucket in a different region. Firebolt automatically infers each bucket’s region using the Amazon HeadBucket API and routes requests to the appropriate regional endpoint. The region information is cached on the node for one hour. If you’re using a role to authenticate to Amazon S3, it must include ListBuckets permissions. Note that Iceberg queries using vended credentials may lack such permissions.
- enforced: Use when you need to access a bucket in a different region and do not have the permissions required for auto mode. Firebolt uses the pseudo-region “aws-global” for every request. This may lead to higher latencies, since AWS needs to resolve the region internally for each request.

This setting can only be used with the WITH SETTINGS syntax, not with SET.
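A hedged sketch with a hypothetical external table over a bucket in another region, assuming the settings clause is appended to the query in the WITH form shown at the top of this topic:

```sql
SELECT COUNT(*) FROM my_external_table
WITH (cross_region_request_mode = 'auto');
```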
When optimizer_mode is set to user_guided, the optimizer disables all cost-based optimization rules. This ensures that the relative order of joins and aggregations in the resulting execution plan follows the syntactic order defined in the SELECT blocks of your SQL query. The default value is automatic.
This setting can be used with both SET and the WITH syntax.
The following example uses the optimizer_mode setting to ensure that, in the resulting execution plan, the join between r and s is executed before the join between r and t.
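A sketch with assumed join keys; under user_guided mode the joins follow their syntactic order:

```sql
SET optimizer_mode = user_guided;

-- Join keys are illustrative assumptions.
SELECT *
FROM r
JOIN s ON r.s_id = s.id   -- executed first, following the syntactic order
JOIN t ON r.t_id = t.id;  -- executed second
```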
You can control whether the optimizer uses storage manager statistics with the enable_storage_statistics session parameter. By default, storage manager statistics are enabled (the default value is true). To disable them, set the value to false.
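The discussion below refers to a star-schema query along the following lines; the column names and predicates are illustrative, with two local predicates on d1 and one on d2:

```sql
SET enable_storage_statistics = true;  -- the default

SELECT COUNT(*)
FROM ft
JOIN d1 ON ft.d1_id = d1.id
JOIN d2 ON ft.d2_id = d2.id
WHERE d1.a = 1 AND d1.b = 2  -- two local predicates on d1
  AND d2.c = 3;              -- one local predicate on d2
```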
When enable_storage_statistics is set to true, the optimizer uses the storage manager statistics to estimate the number of rows in the fact table ft and the dimension tables d1 and d2.
The storage manager reports that d2 is much smaller (50 rows) than d1 (1,000 rows). The optimizer estimates that, even after applying the local predicates on these dimension tables, it will be more efficient to join ft with d2 first and then with d1.
The join order in this plan adapts to the size of the underlying tables. For example, if the size of d2 grows to 400 rows, the optimizer will first join ft with d1 and then with d2.
However, when enable_storage_statistics is set to false, the optimizer uses hardcoded default estimates for the fact table ft (100,000,000 rows) and the dimension tables d1 and d2 (100,000 rows each). Because the two dimension tables are estimated to be equally large, the join order in this case is fully determined by the number of local predicates. Applying two local predicates on d1 is estimated to produce a smaller row count (20,000 rows) than applying one local predicate on d2 (70,000 rows). This join order remains stable even if the underlying table sizes change.
This setting applies to INSERT statements and limits the degree of parallelism for tablet-writing operations. This can reduce the memory footprint during ingestion.