- create a simple table,
- ingest data into it using bulk insert and singleton insert capabilities, and finally,
- modify the data by performing delete and update operations.
Transactions and concurrency
Before getting into how data manipulation and changes work in Firebolt, it is important to first explain the transaction and concurrency controls that are implemented within the system. Firebolt treats each request or SQL statement as a new implicit transaction. Transactions guarantee the ACID properties: atomicity (all or nothing), consistency (the database will not transition to an invalid state), isolation (no interference between concurrent transactions, depending on isolation level; see below), and durability (no data loss once committed). Firebolt supports two isolation levels:
- DDL (CREATE, ALTER, DROP) and DCL (GRANT, REVOKE) work at serializable isolation level.
- DQL (SELECT) and DML (INSERT, UPDATE, DELETE) work at snapshot isolation level.
Creating tables
Let’s look at an example of how to create a simple table called rankings.
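The original DDL is not reproduced here, so the sketch below is only illustrative: the column names, data types, and the PRIMARY INDEX choice are assumptions, and the exact clause may vary with the Firebolt version in use.

```sql
-- Hypothetical schema for the rankings table; adjust the columns and the
-- primary index to match your own data model.
CREATE TABLE rankings (
    player_id   BIGINT,
    game_id     BIGINT,
    ranking     INTEGER,
    updated_at  TIMESTAMP
) PRIMARY INDEX player_id, game_id;
```

The key columns chosen here matter later on, since DELETE and UPDATE statements that filter on them let Firebolt find the affected tablets quickly (see Deleting data below).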
Inserting data
To support loading data from a data lake (Amazon S3) into Firebolt tables, Firebolt provides utilities for loading data in bulk. Users can perform both initial and incremental data ingestion using either a COPY statement or INSERT INTO SELECT FROM <external_table> syntax. In addition, Firebolt supports singleton inserts via an INSERT INTO VALUES statement. Typical scenarios where singleton inserts come in handy are:
- Refreshing tables with small amounts of dimensional data, and
- Implementing slowly changing dimension patterns.
Bulk inserts
In Firebolt, bulk data ingestion utilities are designed to be inherently scalable. There are four major stages that each loading process goes through:
- Reading data from Amazon S3
- Converting data into tablets and columnstore
- Merging tablets to optimize tablet quality (needed for efficient tablet pruning)
- Uploading tablets into Amazon S3.
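As a rough sketch of the two bulk-ingestion paths mentioned above, the statements below assume a bucket path s3://my-bucket/rankings/ and an already-defined external table named ex_rankings; the WITH TYPE option is shown only as an illustration of a file-format setting, and the exact COPY options may differ in your Firebolt version.

```sql
-- Bulk load files from Amazon S3 directly into the rankings table.
COPY INTO rankings
FROM 's3://my-bucket/rankings/'
WITH TYPE = PARQUET;  -- file-format option shown for illustration only

-- Incremental ingestion through an external table that points at S3.
INSERT INTO rankings
SELECT player_id, game_id, ranking, updated_at
FROM ex_rankings
WHERE updated_at >= '2024-01-01';
```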

Singleton insert
While singleton insert is a supported pattern in Firebolt, using bulk insert utilities is recommended to optimize the performance of large data ingestion. Data inserted by a singleton insert statement gets stored within a single tablet. From the tablet quality perspective, this creates a suboptimal situation and can lead to table fragmentation in cases where many singleton inserts are executed. This in turn can lead to suboptimal query performance. One way to mitigate this situation is to use a mini-batch pattern (a single INSERT INTO VALUES statement with multiple rows separated by commas) whenever possible. When such a mini batch is executed, Firebolt tries to create a single tablet, which improves tablet quality and reduces table fragmentation.
To minimize the operational overhead and system maintenance that table fragmentation can cause, Firebolt implements a built-in optimization process that merges tablets of suboptimal size. This optimization process is fully autonomous and runs in the background. The background process periodically searches for suboptimal tablets and merges them while keeping the optimal tablet size in mind. In addition, Firebolt supports the VACUUM command, which gives users full control to defragment tables on demand.
Singleton insert statement example
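A sketch of both variants discussed above, using the hypothetical rankings columns from earlier; the values are made up.

```sql
-- Singleton insert: one row, which ends up in its own tablet.
INSERT INTO rankings VALUES (101, 2001, 5, '2024-01-15 10:00:00');

-- Mini-batch pattern: several rows in one statement, giving Firebolt the
-- chance to write them as a single, better-sized tablet.
INSERT INTO rankings VALUES
    (102, 2001, 3, '2024-01-15 10:05:00'),
    (103, 2001, 7, '2024-01-15 10:06:00'),
    (104, 2002, 1, '2024-01-15 10:07:00');
```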
Deleting data
Firebolt supports storing as much data as needed for as long as needed. However, there are situations where data does need to be deleted. Situations like data corrections that occur in the systems of record, or GDPR compliance where one or more customer records must be deleted to preserve privacy, have led to support for DELETE statements in Firebolt. Firebolt supports deleting a single record in a table or deleting data in bulk. In addition to the tablet-level metadata, each tablet comes with a delete log, which is maintained as records get deleted. Given the rich tablet-level metadata, only tablets that are affected by the delete operation are touched. In cases where a DELETE query uses primary key(s) as a predicate, Firebolt leverages the primary key information to quickly find the tablets and records affected by the DELETE operation, leading to performant deletes. Similarly, deleting all data that belongs to one or more partitions is almost instantaneous, as deleting data in these cases is only a metadata operation in Firebolt.
Having frequent deletes in the system could lead to tablet fragmentation, so there’s a background process that continuously and automatically optimizes tablets behind the scenes. For more information, refer to optimizing storage below.
For optimal performance, leverage primary key(s) when deleting data whenever possible.
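For example, with the hypothetical rankings schema from earlier, a delete that filters on the key columns lets Firebolt narrow down the affected tablets quickly; the values are made up.

```sql
-- Targeted delete using the key columns as the predicate.
DELETE FROM rankings
WHERE player_id = 101 AND game_id = 2001;

-- Bulk delete, e.g. removing every record for one customer (GDPR-style request).
DELETE FROM rankings
WHERE player_id = 101;
```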
Updating data
In real life, data updates happen often. Individuals regularly update their contact information as their residency changes, prospects change their contact preferences for marketing campaigns, and an order changes its status as it goes through the fulfillment process, all causing data stored in data warehouses to be updated. An UPDATE can be represented as a composite operation of a DELETE followed by an INSERT. This holds true for both singleton and bulk update statements. Firebolt supports both simple and complex update functionality, including complex predicates and multi-table joins. Similarly, any column defined in a Firebolt table can be updated, including partitioning columns. While updating values of a partitioning column may lead to a longer execution time (depending on the number of records that exist in the partition, as updated data may need to be moved to newly assigned partitions), this functionality simplifies usage and avoids the need for manual workarounds by our users.
Update statement example
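A sketch of a simple and a more complex update against the hypothetical rankings table; the games table referenced in the second statement is made up for illustration.

```sql
-- Simple update filtered on the key columns.
UPDATE rankings
SET ranking = 4
WHERE player_id = 102 AND game_id = 2001;

-- Update with a more complex predicate driven by another (hypothetical) table.
UPDATE rankings
SET ranking = ranking + 1
WHERE game_id IN (SELECT game_id FROM games WHERE season = 2024);
```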
Optimizing storage
As mentioned in earlier sections, certain data modification scenarios (deletes/updates) could lead to situations where tables get fragmented. To help spot fragmentation and identify the affected tables, Firebolt provides a view called information_schema.tables. The information_schema.tables view provides a number of useful columns, such as the number of rows and the compressed and uncompressed table size, among others. For our purposes here, two columns are of interest as fragmentation metrics: 1) number_of_tablets and 2) fragmentation. As the name suggests, the number_of_tablets column reports the total number of tablets in each table. The fragmentation column specifies the table fragmentation percentage (a number between 0 and 100). As a best practice and general guidance, it's recommended to keep fragmentation below 80.
Fragmented tables have been identified. Now what?
There are two paths that Firebolt provides to tackle fragmentation:
- the user-driven option of using a VACUUM statement, and
- system-driven cleanup that runs in the background.

VACUUM functionality leverages the user-defined engine to perform this optimization task, while the system-driven behavior runs completely autonomously and uses Firebolt-provided capacity. The number_of_tablets column can be used to identify the total number of tablets before and after optimization tasks are performed. Similarly, the fragmentation metric from information_schema.tables can be used to monitor progress and assess the effectiveness of the background process in keeping the table in a healthy state.
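The user-driven path boils down to running VACUUM against a fragmented table and then re-checking the metrics; the sketch below uses the hypothetical rankings table, and the exact syntax may vary by Firebolt version.

```sql
-- On-demand defragmentation of the rankings table.
VACUUM rankings;

-- Re-check the fragmentation metrics afterwards.
SELECT number_of_tablets, fragmentation
FROM information_schema.tables
WHERE table_name = 'rankings';
```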
The background process is recommended due to its ease of management and administration, but both options are available to users to fit their desired behavior and needs.
Maintaining indexes
Aggregating indexes are one of the key features in Firebolt that help accelerate query performance. Users can create these indexes using one or more Firebolt-supported aggregate functions, such as COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT), etc. One of the key promises that databases make is that indexes are automatically maintained and updated as the data in the base table changes – this is no different with Firebolt. However, data modifications (and specifically deletes) are not the friendliest operations when it comes to maintaining aggregating indexes. Aggregate functions such as COUNT, SUM, AVG are composable aggregates, and as such can be easily updated as data changes. However, aggregates such as MIN, MAX, COUNT(DISTINCT) do not fall into that category.
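A sketch of an aggregating index on the hypothetical rankings table; the index name, grouping column, and aggregate list are assumptions. It deliberately mixes composable aggregates (COUNT, AVG) with one that is not (MAX), which is the kind of aggregate that makes maintenance under deletes harder.

```sql
-- Hypothetical aggregating index that precomputes per-game aggregates.
CREATE AGGREGATING INDEX rankings_by_game_agg ON rankings (
    game_id,
    COUNT(*),
    AVG(ranking),
    MAX(ranking)   -- non-composable aggregate: harder to maintain under deletes
);
```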