Overview
Aggregating indexes in Firebolt help speed up query performance by precomputing and storing frequent aggregations, reducing the need for expensive runtime calculations. Use aggregating indexes for queries that involve aggregate functions includingGROUP BY
, SUM
, and AVG
. Instead of scanning a full table, queries use aggregating indexes to retrieve pre-aggregated results, leading to faster response times. These indexes update automatically when the underlying table changes, ensuring that queries using them run efficiently without manual intervention, and are most beneficial when querying large tables with precomputable aggregations.
Syntax
Parameters
Parameter | Description |
---|---|
<index_name> | A unique name for the aggregating index. |
<table_name> | The name of the table on which the index is created. |
<key_expressions> | One or more expressions that will be used for filtering or as aggregation dimensions in queries. |
<aggregations> | Aggregation functions including SUM , COUNT , AVG , applied to table column expressions. |
Firebolt automatically maintains the aggregating index when the table is updated, ensuring it stays optimized for queries.
Examples
The following examples show use cases for the following:- Create a simple aggregating index
- Create a complex aggregating index
- Compute AVG with multiple filters
- Filter by a calculated column
Create a simple aggregating index
You can optimize a frequently run query that counts distinct values, sums theamount
for products, and groups by product_name
, as shown in the following code example:
my_table
to precompute aggregations for COUNT(DISTINCT source)
and SUM(AMOUNT)
, grouped by origin_country
and product_name
as follows:
COUNT(DISTINCT source)
and SUM(amount)
, reducing computation at query time.
In order to use the aggregating index, a query must group by all indexed columns. In the previous code example, you must include origin_country
as a filter expression because aggregating indexes store grouped data hierarchically based on the first column listed. If you don’t include the first grouping column, the entire table is scanned, reducing performance benefits. Filtering on product_name
is optional because all product_name
values are already grouped under each origin_country
. If filtering on origin_country
is not always required, you can define an index on another column. For example, the following code example creates an aggregating index on product_name
:
Create a complex aggregating index
The following code example groups products by name and replaces names longer than 100 characters with ‘Other.’ Then, it counts distinct brands approximately and filters records whereorigin_country
is ‘USA’ and amount
exceeds 100
:
my_table
that precomputes APPROX_COUNT_DISTINCT(brand)
, grouped by origin_country
, amount
, and product_name
as follows:
origin_country
and amount
as filter expressions because aggregating indexes store grouped data hierarchically based on the first columns listed. If you don’t include the first grouping column, the entire table is scanned, reducing performance benefits. The second grouping column amount
is necessary to efficiently filter down results. Without both groupings, Firebolt cannot sufficiently narrow the dataset efficiently.
Compute AVG
with multiple filters
The following aggregating index precomputes AVG(price)
and COUNT(product_id)
, grouped by category
, region
and price
, allowing efficient filtering and retrieval of aggregates:
AVG(price)
and COUNT(product_id)
while filtering on region
and price
:
avg_sales_per_category
aggregating index without requiring an explicit reference to it because Firebolt’s query optimizer automatically applies an aggregating index when a query’s GROUP BY
structure matches the index’s grouping structure. If the query optimizer finds a matching structure, it retrieves the precomputed values instead of scanning and computing the raw data.
Filter by a calculated column
The following aggregating index precomputesSUM(order_value)
and COUNT(order_id)
, grouped by customer_segment
and ROUND(order_value, 2)
, allowing efficient filtering on the rounded order_value
:
order_value
:
customer_segment
, ROUND(order_value, 2)
, allowing the use of the precomputed index instead of computing rounding at query time.
How to check if a query is applying an aggregating index
You can use EXPLAIN to check if Firebolt is applying an aggregating index as shown in the following code example:Aggregating Index: my_table_agg_idx1
If the aggregating index is not used, ensure that the query filters on the first grouping column.
Best practices
- Always include query filter expressions – Queries should filter on indexed columns to ensure the aggregating indexes are applied efficiently.
- Prioritize low-cardinality columns – Generally, place columns with fewer unique values first to improve pruning efficiency, though high-cardinality columns can be beneficial when they have selective filters.
- Match aggregations to queries – Ensure the index includes all measures used in
GROUP BY
queries for optimal performance.
Limitations
- Because the index structure is precomputed, aggregating indexes are not suitable for queries that frequently change
GROUP BY
structures. - Aggregating indexes do not optimize queries that filter on columns that are not part of the grouping structure unless additional indexing strategies are applied.
- Inserts and updates may be slower because of index maintenance, so avoid over-indexing to save storage space. Additionally, large updates can temporarily impact query performance.
- Aggregating indexes are not beneficial for transactional workloads that require frequent row-level updates.
- Aggregating index creation can conflict with other DML operations like INSERT, DELETE, UPDATE, DROP PARTITION, or TRUNCATE applied on its origin table. Make sure the origin table is not modified while creating the index.