GROUP BY with aggregate functions and lets the optimizer answer matching queries from the precomputed result instead of scanning the base table. You write ordinary queries against the base table; the planner recognizes when an index can serve them and rewrites the plan to read the index. You never reference the index by name.
Under the hood it is an ordinary Firebolt table built from the same machinery as any other table: it is partitioned and sorted and indexed on its grouping keys, and its rows hold partial aggregation state rather than final values. For each group, an aggregate is stored as the intermediate state needed to fold in more rows later: a running sum for SUM, a count for COUNT, the sum and the count together for AVG, a sketch for an approximate distinct count. Storing state rather than a finished number is what lets one index serve queries at its own grouping or any coarser grouping: the engine merges the per-group states on read. The index is maintained in the same transaction that writes the base table, so a query never sees a stale result.
Key features
- Transparent matching. You query the base table as usual. The optimizer substitutes the index whenever a query’s grouping keys and aggregates can be computed from it, including queries that group more coarsely than the index.
- Always consistent. The index is updated in the same DML transaction as the base table, so it is never stale.
- Automatic
COUNT(*). Every index includes aCOUNT(*), added for you if you don’t list one. - Sorted and indexed on the grouping keys. The index table’s primary index is its grouping keys in declared order. Order them as you would any primary index: most- to least-frequently filtered, low-cardinality first, so reads against the index prune well.
Syntax
Parameters
| Parameter | Description |
|---|---|
<index_name> | The name of the aggregating index. |
<table_name> | The name of the table on which the index is created. |
<grouping_element> | Expressions specified as grouping keys or dimensions when creating the index. |
<aggregation_element> | Aggregate functions applied to specific expressions. |
Example
This index precomputes per-(product_id, region) totals on sales:
product_id alone, and uses only one of the aggregates still matches. The plan reads sales_agg_index and merges its partial state instead of scanning sales:
summerge and countmerge re-aggregate the stored partial state, and the scan reads @sales_agg_index rather than sales.
How matching works
The optimizer can answer a query from an aggregating index when all of the following hold:- Every grouping key the query needs can be computed from the index’s grouping keys. The query can group by the same keys or by a subset of them (a coarser grouping); it does not have to use the leading key.
- Every aggregate the query needs is one of the index’s aggregates, or derivable from them.
- Any filter applied before the aggregation references only columns that are in the index’s grouping keys.
COUNT and HLL states) to produce the coarser result, as the EXPLAIN above shows. You do not have to filter on, or even select, the leading grouping key for the index to match. Pruning within the index scan then follows the same rule as any primary index: a filter on a prefix of the grouping keys lets the scan skip granules, which is why grouping-key order matters.
Against the sales_agg_index above, all of these read the index:
WHERE sales_amount > 100), or an aggregate the index doesn’t define.
You can group and aggregate over expressions, for example ROUND(order_value, 2) or CASE WHEN LENGTH(product_name) < 100 THEN product_name ELSE 'Other' END, as long as the query repeats the same expression.
For columns with frequent
DELETEs, prefer COUNTING_HLL_COUNT_DISTINCT over APPROX_COUNT_DISTINCT in an index. It absorbs deletions without a full recompute.Observability
UseEXPLAIN to confirm the optimizer matched the index. A matched plan reads from the index table (shown as @<index_name>) and applies *merge aggregates over its partial state, instead of a read_tablets scan of the base table.
Maintenance
Ingestion overhead. Maintaining the index adds work to everyINSERT, UPDATE, and DELETE on the base table. There is no fixed limit on how many indexes a table should have. Measure ingest cost against the query time saved and keep the indexes that pay for themselves; for read-heavy tables, several can be worthwhile.
VACUUM. Auto vacuum keeps index tablets compact in the background. If an index’s tablets fragment under heavy batch ingest or frequent base-table mutations and query latency suffers, compact it explicitly by name:
Limitations
- Aggregating indexes do not help queries whose
GROUP BYstructure changes constantly, since the structure is precomputed. - They do not optimize filters on columns outside the grouping structure.
INSERT,UPDATE, andDELETEare slower because the index is maintained in the same transaction; large mutations can temporarily affect query performance.- They are not suited to transactional workloads with frequent row-level updates.
- Index creation conflicts with concurrent
INSERT,DELETE,UPDATE,DROP PARTITION, orTRUNCATEon the base table, so don’t modify the base table while the index is being built.