> ## Documentation Index
> Fetch the complete documentation index at: https://docs.firebolt.io/llms.txt
> Use this file to discover all available pages before exploring further.

> Aggregating index overview

# Aggregating index

An aggregating index precomputes a `GROUP BY` with aggregate functions and lets the optimizer answer matching queries from the precomputed result instead of scanning the base table. You write ordinary queries against the base table; the planner recognizes when an index can serve them and rewrites the plan to read the index. You never reference the index by name.

Under the hood it is an ordinary Firebolt table built from the same machinery as any other table: it is partitioned and [sorted and indexed](/performance-and-observability/storage-and-indexing/primary-index) on its grouping keys, and its rows hold **partial aggregation state** rather than final values. For each group, an aggregate is stored as the intermediate state needed to fold in more rows later: a running sum for `SUM`, a count for `COUNT`, the sum and the count together for `AVG`, a sketch for an approximate distinct count. Storing state rather than a finished number is what lets one index serve queries at its own grouping or any coarser grouping: the engine merges the per-group states on read. The index is maintained in the **same transaction** that writes the base table, so a query never sees a stale result.

```text theme={"theme":{"light":"css-variables","dark":"css-variables"}}
Aggregating index internals...
Not a scan-pipeline stage. The optimizer rewrites base-table aggregations to read the AI table instead.
Storage: a regular table, PRIMARY INDEX = grouping keys in declared order, columns store partial aggregate state,
plus an implicit COUNT(*).
Matching (verified against planner): a query matches when (a) its grouping keys are computable from the AI's
grouping keys - same set OR a subset/coarser grouping, (b) its aggregates are a subset of (or derivable from) the
AI's aggregates, (c) any pre-aggregation filter references only AI grouping-key columns. NO requirement to filter on
or select the leading key. Coarser grouping re-aggregates partial state via merge combinators (summerge, countmerge,
HLL merge). Pruning within the AI scan follows normal PRIMARY INDEX prefix rules.
```

## Key features

* **Transparent matching.** You query the base table as usual. The optimizer substitutes the index whenever a query's grouping keys and aggregates can be computed from it, including queries that group more coarsely than the index.
* **Always consistent.** The index is updated in the same DML transaction as the base table, so it is never stale.
* **Automatic `COUNT(*)`.** Every index includes a `COUNT(*)`, added for you if you don't list one.
* **Sorted and indexed on the grouping keys.** The index table's primary index is its grouping keys in declared order. Order them as you would any [primary index](/performance-and-observability/storage-and-indexing/primary-index): most- to least-frequently filtered, low-cardinality first, so reads against the index prune well.

## Syntax

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
CREATE AGGREGATING INDEX <index_name>
ON <table_name> (
   <grouping_element>
     [, ...],
     <aggregation_element>
     [, ...]
);
```

### Parameters

| Parameter               | Description                                                                   |
| ----------------------- | ----------------------------------------------------------------------------- |
| `<index_name>`          | The name of the aggregating index.                                            |
| `<table_name>`          | The name of the table on which the index is created.                          |
| `<grouping_element>`    | Expressions specified as grouping keys or dimensions when creating the index. |
| `<aggregation_element>` | Aggregate functions applied to specific expressions.                          |

## Example

This index precomputes per-`(product_id, region)` totals on `sales`:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
CREATE AGGREGATING INDEX sales_agg_index
ON sales (product_id, region, SUM(sales_amount), COUNT(DISTINCT order_id));
```

A query that groups more coarsely, on `product_id` alone, and uses only one of the aggregates still matches. The plan reads `sales_agg_index` and merges its partial state instead of scanning `sales`:

<CodeGroup>
  ```sql Query theme={"theme":{"light":"css-variables","dark":"css-variables"}}
  SELECT product_id, SUM(sales_amount)
  FROM sales
  GROUP BY product_id;
  ```

  ```text EXPLAIN theme={"theme":{"light":"css-variables","dark":"css-variables"}}
  [Aggregate] GroupBy: ["@sales_agg_index".product_id]
    Aggregates: [sum_0:   summerge("@sales_agg_index"."sum(sales_amount)"),
                 count_0: countmerge("@sales_agg_index"."count(*)")]
    \_[StoredTable] Name: "@sales_agg_index"
  ```
</CodeGroup>

`summerge` and `countmerge` re-aggregate the stored partial state, and the scan reads `@sales_agg_index` rather than `sales`.

## How matching works

The optimizer can answer a query from an aggregating index when all of the following hold:

* Every grouping key the query needs can be computed from the index's grouping keys. The query can group by the same keys or by a **subset** of them (a coarser grouping); it does not have to use the leading key.
* Every aggregate the query needs is one of the index's aggregates, or derivable from them.
* Any filter applied before the aggregation references only columns that are in the index's grouping keys.

When the query groups more coarsely than the index, Firebolt merges the stored partial state (summing partial sums, merging `COUNT` and HLL states) to produce the coarser result, as the `EXPLAIN` above shows. You do not have to filter on, or even select, the leading grouping key for the index to match. Pruning *within* the index scan then follows the same rule as any primary index: a filter on a prefix of the grouping keys lets the scan skip granules, which is why grouping-key order matters.

Against the `sales_agg_index` above, all of these read the index:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
SELECT product_id, region, SUM(sales_amount), COUNT(DISTINCT order_id) FROM sales GROUP BY product_id, region;
SELECT product_id, SUM(sales_amount) FROM sales GROUP BY product_id;   -- coarser grouping, one aggregate
SELECT region, SUM(sales_amount) FROM sales GROUP BY region;           -- grouping on a non-leading key
SELECT SUM(sales_amount) FROM sales;                                   -- global aggregate, no GROUP BY
SELECT product_id, SUM(sales_amount) FROM sales WHERE region = 'EMEA' GROUP BY product_id;  -- filter on a grouping key
```

A query falls back to scanning the base table when it needs something the index doesn't carry, for example a filter or grouping on a column that is not one of the index's grouping keys (`WHERE sales_amount > 100`), or an aggregate the index doesn't define.

You can group and aggregate over expressions, for example `ROUND(order_value, 2)` or `CASE WHEN LENGTH(product_name) < 100 THEN product_name ELSE 'Other' END`, as long as the query repeats the same expression.

<Note>
  For columns with frequent `DELETE`s, prefer [`COUNTING_HLL_COUNT_DISTINCT`](/reference-sql/functions-reference/aggregation/counting-hll-count-distinct) over `APPROX_COUNT_DISTINCT` in an index. It absorbs deletions without a full recompute.
</Note>

## Observability

Use [`EXPLAIN`](/reference-sql/commands/queries/explain) to confirm the optimizer matched the index. A matched plan reads from the index table (shown as `@<index_name>`) and applies `*merge` aggregates over its partial state, instead of a `read_tablets` scan of the base table.

## Maintenance

**Ingestion overhead.** Maintaining the index adds work to every `INSERT`, `UPDATE`, and `DELETE` on the base table. There is no fixed limit on how many indexes a table should have. Measure ingest cost against the query time saved and keep the indexes that pay for themselves; for read-heavy tables, several can be worthwhile.

**VACUUM.** Auto vacuum keeps index tablets compact in the background. If an index's tablets fragment under heavy batch ingest or frequent base-table mutations and query latency suffers, compact it explicitly by name:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
VACUUM sales_agg_index;
```

## Limitations

* Aggregating indexes do not help queries whose `GROUP BY` structure changes constantly, since the structure is precomputed.
* They do not optimize filters on columns outside the grouping structure.
* `INSERT`, `UPDATE`, and `DELETE` are slower because the index is maintained in the same transaction; large mutations can temporarily affect query performance.
* They are not suited to transactional workloads with frequent row-level updates.
* Index creation conflicts with concurrent `INSERT`, `DELETE`, `UPDATE`, `DROP PARTITION`, or `TRUNCATE` on the base table, so don't modify the base table while the index is being built.
