Overview
A data skipping index lets Firebolt skip granules that are guaranteed to contain no row matching a query’s filter, so they are never read from storage. It is the granule-level pruning step for columns that are not the primary index, stage 3 of the scan pipeline. Firebolt currently supports one kind of data skipping index, theminmax index.
It records the minimum and maximum value of an indexed column (or expression) within each granule (or, with a higher GRANULARITY, each group of granules).
At query time, the engine compares the predicate in your WHERE clause against these precomputed bounds and skips any granule whose [min, max] range cannot contain a matching row.
It is most effective on columns whose values are clustered or roughly ordered within the table, so that filters can rule out large contiguous ranges of granules.
When to use
| Use case | Example |
|---|---|
| Range filter on a non-primary-index column | WHERE amount > 1000000 |
| Equality filter on a clustered column | WHERE status_code = 500 |
| Date / time range scans | WHERE event_date < '2024-01-01' |
BETWEEN and IN predicates | WHERE b BETWEEN 100000 AND 200000 |
| Filtering on a derived value | Index a + b, query WHERE a + b > 50000 |
minmax index gives little benefit when the indexed column’s values are randomly distributed, because every granule then tends to span the whole value range and nothing can be skipped.
How it relates to other pruning techniques
Aminmax index is the granule-level pruning step for non-primary-index columns. It runs after the coarser tablet-level steps have chosen which tablets to open, and refines the work within them. For the full ordering of pruning, read, and filter stages, see the scan pipeline; the table below summarizes how a minmax index relates to the other pruning mechanisms.
| Technique | Scope | Driven by | Applies to | Relationship to a minmax index |
|---|---|---|---|---|
| Tablet pruning | Whole tablets | Per-tablet min/max statistics kept for every column | Any column with a comparison predicate | Runs first, at a coarser scope. Also min/max-based, but per tablet, not the data skipping index. A minmax index then refines what survives, granule by granule. |
| Primary index pruning | Granules | The sorted order of the primary index | The PRIMARY INDEX key columns | Both prune granules, but the primary index is far more effective because data is physically sorted on its keys. A minmax index brings granule pruning to non-sort-key columns, and works best when their values are still naturally clustered (for example, a load timestamp). |
Data skipping (minmax) index | Granules | Stored min/max per index granule | Range / comparison predicates (>, <, =, BETWEEN, IN) on indexed columns or expressions | This index. |
| Inverted index pruning | Granules + rows | Per-token posting lists (Roaring bitmaps) | Exact token membership on TEXT / ARRAY(TEXT) via has_all_tokens | A different access pattern: use it for token/membership filters on text, a minmax index for range and comparison filters. They coexist, each serving the predicates it fits. |
minmax index on another.
Quick start
SQL reference
CREATE INDEX
| Parameter | Description |
|---|---|
<index_name> | Unique name for the index across all tables, indexes, and views in the database. |
<table_name> | The FACT or DIMENSION table to index. External tables are not supported. |
<expression> | A single column reference or scalar expression of a comparable type (see Supported types). |
TYPE | The index type. Only minmax is currently supported. |
GRANULARITY | A positive integer: how many consecutive data granules are summarized by one index granule. See Choosing a granularity. |
SKIP_INDEX, TYPE, GRANULARITY, and minmax are case-insensitive; this reference uses uppercase for the method and parameter keywords to match the other USING forms.
Both TYPE and GRANULARITY are required; supplying any other set of parameters is an error.
You may create multiple data skipping indexes on the same table, but at most one index per expression.
Each index must have a unique name.
Supported types
The indexed expression must resolve to one of the following comparable types (both nullable and non-nullable variants are allowed):INT, BIGINT, REAL, DOUBLE PRECISION, DECIMAL, DATE, TIMESTAMP, TIMESTAMPTZ, TEXT, BOOLEAN, BYTEA.
Expression-based indexing
You can index any scalar expression, not just a bare column. The expression is evaluated once at ingest time and its min/max are stored. At query time the predicate’s expression must match the indexed expression exactly for pruning to apply:Creating an index on a table that already has data
When you create a data skipping index on a table that already contains data, the existing tablets have no index data and are not pruned until they are rewritten. RunVACUUM with REINDEX=TRUE to backfill the index for pre-existing tablets:
DROP INDEX
VACUUM on the table afterward.
A column that is referenced by a data skipping index cannot be dropped while the index exists:Renaming a column does not require dropping the index.
The index follows the rename automatically.
Capabilities
Supported predicates
The optimizer derives pruning bounds for the common comparison and range predicates. For a predicate on an indexed columnb, a granule is kept only when its [min, max] range could still contain a match:
| Predicate | A granule is pruned when |
|---|---|
b > k | max(b) <= k |
b >= k | max(b) < k |
b < k | min(b) >= k |
b <= k | min(b) > k |
b = k | min(b) > k or max(b) < k |
b BETWEEN lo AND hi | max(b) < lo or min(b) > hi |
b IN (k1, k2, ...) | every value’s range falls outside [min(b), max(b)] |
AND each contribute independent pruning conditions.
NULL handling
NULL values never satisfy a comparison predicate.
A granule that contains only NULL values in the indexed column is pruned for predicates such as b > 150, because no non-null value exists that could match.
Coexistence with other index types
Data skipping indexes coexist with all other index types on the same table: primary indexes, aggregating indexes, inverted indexes, and vector search indexes. Each serves a different access pattern, and their pruning combines within a single query. See How it relates to other pruning techniques for how aminmax index fits alongside tablet, primary index, and inverted index pruning.
Choosing a granularity
GRANULARITY controls how many consecutive data granules are summarized by a single index granule (a single min/max pair):
GRANULARITY = 1stores one min/max pair per data granule (~8,192 rows). This gives the finest-grained pruning and the most precise skipping, at the cost of a larger index.- Higher values group more data granules under one min/max pair. The index is smaller, but pruning is coarser: a single matching row anywhere in the group prevents the whole group from being skipped.
GRANULARITY = 1 for the most aggressive pruning.
Increase it only if index size becomes a concern and your filters still prune effectively at a coarser resolution.
The size of a data granule is governed by the table’s
index_granularity (default 8,192 rows).
The data skipping index’s GRANULARITY then sets how many of those granules share one min/max summary.Observability
information_schema.indexes
Useinformation_schema.indexes to list data skipping indexes. They report an index_type of data_skipping:
EXPLAIN (ANALYZE)
UseEXPLAIN (ANALYZE) to confirm the index is used and to see how many granules were pruned. The scan operator gains two annotations and a pruning metric:
| Annotation / metric | Meaning |
|---|---|
[DataSkippingIndexes] | Confirms the optimizer attached the index to the scan. Shows the index name and granularity. |
[DataSkippingConditions] | The derived bound(s) evaluated against the stored min/max columns. |
granules: X/Y | X granules were read out of Y total. |
data skipping index pruned granules: N | N granules were skipped by the index. |
- The predicate’s expression matches the indexed expression exactly.
- The table’s tablets contain index data. Run
VACUUM (REINDEX=TRUE)if the index was created after the data was loaded.
How it works
During ingestion, as each data granule is written, Firebolt scans the indexed expression’s values in that granule and records their minimum and maximum. EveryGRANULARITY data granules, these are flushed as one index granule’s min/max pair and stored alongside the tablet’s data.
At query time, the optimizer rewrites each supported WHERE predicate into an equivalent condition over the stored min/max columns (the [DataSkippingConditions] shown in EXPLAIN).
For every tablet the query reads, the storage layer evaluates those conditions per index granule and skips the granules whose [min, max] range provably cannot contain a match.
Only the surviving granules are decompressed and scanned.
Because the conditions only ever exclude granules that cannot contain a matching row, query results are identical with or without the index.
The index changes only how much data is read.