Learn how to create and optimize aggregating indexes in Firebolt for faster query performance.
GROUP BY
, SUM
, and AVG
. Instead of scanning a full table, queries use aggregating indexes to retrieve pre-aggregated results, leading to faster response times. These indexes update automatically when the underlying table changes, ensuring that queries using them run efficiently without manual intervention, and are most beneficial when querying large tables with precomputable aggregations.
Parameter | Description |
---|---|
<index_name> | A unique name for the aggregating index. |
<table_name> | The name of the table on which the index is created. |
<key_expressions> | One or more expressions that will be used for filtering or as aggregation dimensions in queries. |
<aggregations> | Aggregation functions including SUM , COUNT , AVG , applied to table column expressions. |
amount
for products, and groups by product_name
, as shown in the following code example:
my_table
to precompute aggregations for COUNT(DISTINCT source)
and SUM(AMOUNT)
, grouped by origin_country
and product_name
as follows:
COUNT(DISTINCT source)
and SUM(amount)
, reducing computation at query time.
In order to use the aggregating index, a query must group by all indexed columns. In the previous code example, you must include origin_country
as a filter expression because aggregating indexes store grouped data hierarchically based on the first column listed. If you don’t include the first grouping column, the entire table is scanned, reducing performance benefits. Filtering on product_name
is optional because all product_name
values are already grouped under each origin_country
. If filtering on origin_country
is not always required, you can define an index on another column. For example, the following code example creates an aggregating index on product_name
:
origin_country
is ‘USA’ and amount
exceeds 100
:
my_table
that precomputes APPROX_COUNT_DISTINCT(brand)
, grouped by origin_country
, amount
, and product_name
as follows:
origin_country
and amount
as filter expressions because aggregating indexes store grouped data hierarchically based on the first columns listed. If you don’t include the first grouping column, the entire table is scanned, reducing performance benefits. The second grouping column amount
is necessary to efficiently filter down results. Without both groupings, Firebolt cannot sufficiently narrow the dataset efficiently.
AVG
with multiple filtersAVG(price)
and COUNT(product_id)
, grouped by category
, region
and price
, allowing efficient filtering and retrieval of aggregates:
AVG(price)
and COUNT(product_id)
while filtering on region
and price
:
avg_sales_per_category
aggregating index without requiring an explicit reference to it because Firebolt’s query optimizer automatically applies an aggregating index when a query’s GROUP BY
structure matches the index’s grouping structure. If the query optimizer finds a matching structure, it retrieves the precomputed values instead of scanning and computing the raw data.
SUM(order_value)
and COUNT(order_id)
, grouped by customer_segment
and ROUND(order_value, 2)
, allowing efficient filtering on the rounded order_value
:
order_value
:
customer_segment
, ROUND(order_value, 2)
, allowing the use of the precomputed index instead of computing rounding at query time.
Aggregating Index: my_table_agg_idx1
If the aggregating index is not used, ensure that the query filters on the first grouping column.
GROUP BY
queries for optimal performance.GROUP BY
structures.