Understand how to organize data from efficient retrieval in Firebolt
CREATE DATABASE
statement and can be modified or deleted using ALTER
and DROP
.
Compute and storage separation
Firebolt separates compute and storage layers, ensuring databases are efficiently managed without impacting processing power.
Data Security
Databases integrate with Firebolt’s data security layers, enforcing access controls and user permissions to safeguard sensitive information.
Performance and integration
test_library
database with an optional description:
IF EXISTS
clause to see if the test_library
database exists, and then deletes it:
books
table stores details about the publication and is ideal for joining with fact tables like borrowedbooks
to analyze borrowing patterns or book popularity.
borrowedbooks
table:
borrowedbooks
table exists and then deletes it:
CASCADE
option to remove all dependencies together.
The following code deletes the borrowedbooks
table and all related objects:
external_books
external table:
PRIMARY INDEX
clause to specify your columns, as shown in the following example:
book_id
and checkout_date
columns.
You can also create a primary index on a dimension table as shown in the following code example:
WHERE
and GROUP BY
clauses, so that Firebolt can use these columns for efficient retrieval. Firebolt physically clusters the data in the table by the columns of the primary index. The primary index significantly improves query performance because it reduces the time spent scanning irrelevant rows and allows the query engine to retrieve only the necessary data.
SUM
, COUNT
, and AVG
, as well as more complex calculations. When the underlying data is changed, Firebolt recalculates aggregate indexes automatically, so that they are always updated when you query them. Firebolt handles the following scenarios:
borrowedbooks
table created in the previous Fact tables section in Firebolt managed-tables:
transaction_id
or late_fee
, it retrieves the precomputed results from the aggregating index, rather than computing them.
The following code example shows how to create an aggregating index to precompute the total late fees accumulated for each book:
SUM
, COUNT
, and AVG
that are run repeatedly.borrower_id
or book_id
from the examples in this section.customer_id
or order_id
from the examples in this section. Avoid creating multiple indexes with overlapping aggregations on the same columns to minimize unnecessary overhead and costs.DATE
or INTEGER
make it easier to reason about and control partition cardinality — such as maintaining one partition per day over the last three years, or one partition per month over the last 80 years. In contrast, variable-length types like STRING
can evolve unpredictably. For example, a product_category
column that starts with a dozen values may grow to thousands of distinct entries over time, unintentionally ballooning the number of partitions and hurting performance.checkout_date
, allowing Firebolt to skip entire months of data when queries filter by date ranges.
To drop an existing partition, use ALTER TABLE ... DROP PARTITION
. For example, to remove all records for January 2023:
WHERE
, JOIN
, or GROUP BY
clauses to minimize the amount of data scanned and improve query efficiency. Generally, low-cardinality columns are preferred for the leading positions in composite primary indexes, but high-cardinality columns can be beneficial when they have selective filters.
If you don’t know how to effectively select a primary index or partition your data, you can use Firebolt’s RECOMMEND_DDL tool to provide automated insights. RECOMMEND_DDL
will make recommendations to optimize database performance by analyzing your query patterns and suggesting the most efficient configurations for primary indexes and partitioning. By examining historical query data, the tool identifies columns frequently used in filtering, grouping, or aggregating operations and recommends appropriate primary indexes and partition keys. These suggestions help reduce the amount of data scanned during queries, enabling faster execution and improved resource utilization.
RECOMMEND_DDL
is particularly useful in complex environments where query patterns evolve over time. By reviewing historical query data, Firebolt identifies columns that are frequently used in filtering or aggregation and recommends appropriate primary index and partitioning strategies.
The following code example uses RECOMMEND_DDL
to analyze query patterns on the books table, created in the **Dimension tables ** section under Firebolt managed-tables, based on queries run in the past week:
books
table is frequently queried based on genre
and book_id
, Firebolt’s RECOMMEND_DDL
command might provide the following example output:
recommended_partition_key | recommended_primary_index | average_pruning_improvement | analyzed_queries |
---|---|---|---|
DATE_TRUNC(‘month’, borrow_date) | book_id, borrower_id | 0.35 | 200 |
recommended_partition_key
suggests partitioning the borrowedbooks
table by month based on the borrow_date
column. The recommended_primary_index
suggests creating a primary index on the book_id
and borrower_id
columns. An average pruning improvement of 0.35 indicates 35% less data will be scanned on average by applying these recommendations. The analyzed queries column shows that 200 queries were analyzed to generate these suggestions.