Learn how to create and optimize aggregating indexes in Firebolt for faster query performance.
Aggregating indexes in Firebolt help speed up query performance by precomputing and storing frequent aggregations, reducing the need for expensive runtime calculations. Use aggregating indexes for queries that involve aggregate functions including GROUP BY
, SUM
, and AVG
. Instead of scanning a full table, queries use aggregating indexes to retrieve pre-aggregated results, leading to faster response times. These indexes update automatically when the underlying table changes, ensuring that queries using them run efficiently without manual intervention, and are most beneficial when querying large tables with precomputable aggregations.
Parameter | Description |
---|---|
<index_name> | A unique name for the aggregating index. |
<table_name> | The name of the table on which the index is created. |
<key_expressions> | One or more expressions that will be used for filtering or as aggregation dimensions in queries. |
<aggregations> | Aggregation functions including SUM , COUNT , AVG , applied to table column expressions. |
Firebolt automatically maintains the aggregating index when the table is updated, ensuring it stays optimized for queries.
The following examples show use cases for the following:
You can optimize a frequently run query that counts distinct values, sums the amount
for products, and groups by product_name
, as shown in the following code example:
To optimize the previous query, create an aggregating index on my_table
to precompute aggregations for COUNT(DISTINCT source)
and SUM(AMOUNT)
, grouped by origin_country
and product_name
as follows:
The aggregating index in the previous code example precomputes and stores aggregated values for COUNT(DISTINCT source)
and SUM(amount)
, reducing computation at query time.
In order to use the aggregating index, a query must group by all indexed columns. In the previous code example, you must include origin_country
as a filter expression because aggregating indexes store grouped data hierarchically based on the first column listed. If you don’t include the first grouping column, the entire table is scanned, reducing performance benefits. Filtering on product_name
is optional because all product_name
values are already grouped under each origin_country
. If filtering on origin_country
is not always required, you can define an index on another column. For example, the following code example creates an aggregating index on product_name
:
Then, you can use the aggregating index in a query as follows:
The following code example groups products by name and replaces names longer than 100 characters with ‘Other.’ Then, it counts distinct brands approximately and filters records where origin_country
is ‘USA’ and amount
exceeds 100
:
To optimize the previous query, create an aggregating index on my_table
that precomputes APPROX_COUNT_DISTINCT(brand)
, grouped by origin_country
, amount
, and product_name
as follows:
In order to use the aggregating index, a query must group by all indexed columns in the index. In the previous code example, you must include origin_country
and amount
as filter expressions because aggregating indexes store grouped data hierarchically based on the first columns listed. If you don’t include the first grouping column, the entire table is scanned, reducing performance benefits. The second grouping column amount
is necessary to efficiently filter down results. Without both groupings, Firebolt cannot sufficiently narrow the dataset efficiently.
AVG
with multiple filtersThe following aggregating index precomputes AVG(price)
and COUNT(product_id)
, grouped by category
, region
and price
, allowing efficient filtering and retrieval of aggregates:
The following query retrieves the precomputed AVG(price)
and COUNT(product_id)
while filtering on region
and price
:
The previous code example utilizes the avg_sales_per_category
aggregating index without requiring an explicit reference to it because Firebolt’s query optimizer automatically applies an aggregating index when a query’s GROUP BY
structure matches the index’s grouping structure. If the query optimizer finds a matching structure, it retrieves the precomputed values instead of scanning and computing the raw data.
The following aggregating index precomputes SUM(order_value)
and COUNT(order_id)
, grouped by customer_segment
and ROUND(order_value, 2)
, allowing efficient filtering on the rounded order_value
:
The following code example retrieves precomputed aggregates while filtering on the rounded order_value
:
The previous code example fully utilizes the aggregating index because it groups by all indexed columns: customer_segment
, ROUND(order_value, 2)
, allowing the use of the precomputed index instead of computing rounding at query time.
You can use EXPLAIN to check if Firebolt is applying an aggregating index as shown in the following code example:
If the aggregating index is being used, the query execution plan will show output as follows:
Aggregating Index: my_table_agg_idx1
If the aggregating index is not used, ensure that the query filters on the first grouping column.
GROUP BY
queries for optimal performance.By following these guidelines, you can use aggregating indexes to significantly speed up queries.
GROUP BY
structures.