Skip to main content
An aggregating index precomputes a GROUP BY with aggregate functions and lets the optimizer answer matching queries from the precomputed result instead of scanning the base table. You write ordinary queries against the base table; the planner recognizes when an index can serve them and rewrites the plan to read the index. You never reference the index by name. Under the hood it is an ordinary Firebolt table built from the same machinery as any other table: it is partitioned and sorted and indexed on its grouping keys, and its rows hold partial aggregation state rather than final values. For each group, an aggregate is stored as the intermediate state needed to fold in more rows later: a running sum for SUM, a count for COUNT, the sum and the count together for AVG, a sketch for an approximate distinct count. Storing state rather than a finished number is what lets one index serve queries at its own grouping or any coarser grouping: the engine merges the per-group states on read. The index is maintained in the same transaction that writes the base table, so a query never sees a stale result.

Key features

  • Transparent matching. You query the base table as usual. The optimizer substitutes the index whenever a query’s grouping keys and aggregates can be computed from it, including queries that group more coarsely than the index.
  • Always consistent. The index is updated in the same DML transaction as the base table, so it is never stale.
  • Automatic COUNT(*). Every index includes a COUNT(*), added for you if you don’t list one.
  • Sorted and indexed on the grouping keys. The index table’s primary index is its grouping keys in declared order. Order them as you would any primary index: most- to least-frequently filtered, low-cardinality first, so reads against the index prune well.

Syntax

CREATE AGGREGATING INDEX <index_name>
ON <table_name> (
   <grouping_element>
     [, ...],
     <aggregation_element>
     [, ...]
);

Parameters

ParameterDescription
<index_name>The name of the aggregating index.
<table_name>The name of the table on which the index is created.
<grouping_element>Expressions specified as grouping keys or dimensions when creating the index.
<aggregation_element>Aggregate functions applied to specific expressions.

Example

This index precomputes per-(product_id, region) totals on sales:
CREATE AGGREGATING INDEX sales_agg_index
ON sales (product_id, region, SUM(sales_amount), COUNT(DISTINCT order_id));
A query that groups more coarsely, on product_id alone, and uses only one of the aggregates still matches. The plan reads sales_agg_index and merges its partial state instead of scanning sales:
SELECT product_id, SUM(sales_amount)
FROM sales
GROUP BY product_id;
summerge and countmerge re-aggregate the stored partial state, and the scan reads @sales_agg_index rather than sales.

How matching works

The optimizer can answer a query from an aggregating index when all of the following hold:
  • Every grouping key the query needs can be computed from the index’s grouping keys. The query can group by the same keys or by a subset of them (a coarser grouping); it does not have to use the leading key.
  • Every aggregate the query needs is one of the index’s aggregates, or derivable from them.
  • Any filter applied before the aggregation references only columns that are in the index’s grouping keys.
When the query groups more coarsely than the index, Firebolt merges the stored partial state (summing partial sums, merging COUNT and HLL states) to produce the coarser result, as the EXPLAIN above shows. You do not have to filter on, or even select, the leading grouping key for the index to match. Pruning within the index scan then follows the same rule as any primary index: a filter on a prefix of the grouping keys lets the scan skip granules, which is why grouping-key order matters. Against the sales_agg_index above, all of these read the index:
SELECT product_id, region, SUM(sales_amount), COUNT(DISTINCT order_id) FROM sales GROUP BY product_id, region;
SELECT product_id, SUM(sales_amount) FROM sales GROUP BY product_id;   -- coarser grouping, one aggregate
SELECT region, SUM(sales_amount) FROM sales GROUP BY region;           -- grouping on a non-leading key
SELECT SUM(sales_amount) FROM sales;                                   -- global aggregate, no GROUP BY
SELECT product_id, SUM(sales_amount) FROM sales WHERE region = 'EMEA' GROUP BY product_id;  -- filter on a grouping key
A query falls back to scanning the base table when it needs something the index doesn’t carry, for example a filter or grouping on a column that is not one of the index’s grouping keys (WHERE sales_amount > 100), or an aggregate the index doesn’t define. You can group and aggregate over expressions, for example ROUND(order_value, 2) or CASE WHEN LENGTH(product_name) < 100 THEN product_name ELSE 'Other' END, as long as the query repeats the same expression.
For columns with frequent DELETEs, prefer COUNTING_HLL_COUNT_DISTINCT over APPROX_COUNT_DISTINCT in an index. It absorbs deletions without a full recompute.

Observability

Use EXPLAIN to confirm the optimizer matched the index. A matched plan reads from the index table (shown as @<index_name>) and applies *merge aggregates over its partial state, instead of a read_tablets scan of the base table.

Maintenance

Ingestion overhead. Maintaining the index adds work to every INSERT, UPDATE, and DELETE on the base table. There is no fixed limit on how many indexes a table should have. Measure ingest cost against the query time saved and keep the indexes that pay for themselves; for read-heavy tables, several can be worthwhile. VACUUM. Auto vacuum keeps index tablets compact in the background. If an index’s tablets fragment under heavy batch ingest or frequent base-table mutations and query latency suffers, compact it explicitly by name:
VACUUM sales_agg_index;

Limitations

  • Aggregating indexes do not help queries whose GROUP BY structure changes constantly, since the structure is precomputed.
  • They do not optimize filters on columns outside the grouping structure.
  • INSERT, UPDATE, and DELETE are slower because the index is maintained in the same transaction; large mutations can temporarily affect query performance.
  • They are not suited to transactional workloads with frequent row-level updates.
  • Index creation conflicts with concurrent INSERT, DELETE, UPDATE, DROP PARTITION, or TRUNCATE on the base table, so don’t modify the base table while the index is being built.