Explore how Firebolt supports the data management lifecycle
INSERT INTO VALUES
statement. Typical scenarios where singleton inserts come handy are:
INSERT INTO VALUES
statements separated by a comma) whenever possible. When a batch of INSERT INTO VALUES
statements are executed, Firebolt tries to create a single tablet, which will improve tablet quality and reduce table fragmentation.
To minimize operational overhead and system maintenance that table fragmentation can cause, Firebolt implements a built-in optimization process that merges tablets with suboptimal size. This optimization process is fully autonomous and runs in the background. The background process searches on a periodic basis for suboptimal tablets and merges them while keeping tablet optimal size in mind. In addition, Firebolt supports the VACUUM command that allows users full control to defragment tables on-demand.
Singleton insert statement example
DELETE
query uses a primary key(s) as a predicate, Firebolt leverages primary key information to quickly find tablets and records that are affected by the DELETE operation, leading to performant deletes. Similarly, deleting all data that belongs to one or more partitions is almost instantaneous, as deleting data in these cases is only a metadata operation in Firebolt.
Having frequent deletes in the system could lead to tablet fragmentation, so there’s a background process that continuously and automatically optimizes tablets behind the scenes. For more information, refer to optimizing storage below.
DELETE
followed by an INSERT
operation. This holds true for both singleton as well as bulk update statements. Firebolt supports both simple as well as complex update functionality, including complex predicates and multi-table joins. Similarly, any column defined in a Firebolt table can be updated, including partitioning columns. While updating values for a partitioning column may lead to a longer execution time (depending on the number of records that exist in the partition, as updated data may need to be moved to newly assigned partitions), this functionality simplifies usage and avoids the need for manual workarounds to be done by our users.
Update statement example
information_schema.tables
view provides a number of useful columns, such as the number of rows, compressed and uncompressed table size, among others. For our purposes here, two columns are of interest as fragmentation metrics: 1) number_of_tablets
and 2) fragmentation
. As the name suggests, the number_of_tablets
column projects information about the total number of tablets in each table. The fragmentation column specifies the table fragmentation percentage (number between 0-100). As a best practice and general guidance, it’s recommend to keep fragmentation below 80.
Tables that are fragmented have been identified, now what?
There are two paths that Firebolt provides to tackle fragmentation state:
VACUUM
statement, andVACUUM
functionality leverages the user-defined engine to perform this optimization task, while system-driven behavior runs completely autonomously and uses Firebolt-provided capacity. The column number_of_tablets
can be used to identify the total number of tablets before and after optimization tasks are performed. Similarly, the fragmentation metrics from information_schema.tables
can be used to monitor progress and assess effectiveness of the background process to keep the table in a healthy state.
The background process is recommended due to ease of the management and administration, but both options are available to users to fit their desired behavior and needs.
COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT)
, etc. One of the key promises that databases have is that indexes are automatically maintained and updated as the data in the base table change – this is no different with Firebolt. However, data modifications (and specifically deletes) are not the friendliest operations when it comes to maintaining aggregated indexes. Aggregated functions such as COUNT, SUM, AVG
are composable aggregates, and as such can be easily updated/modified as data changes. However, aggregates such as MIN, MAX, COUNT(DISTINCT)
do not fall into that category.