Data skipping index

Overview

A data skipping index lets Firebolt skip granules that are guaranteed to contain no row matching a query’s filter, so they are never read from storage. It is the granule-level pruning step for columns that are not the primary index, stage 3 of the scan pipeline. Firebolt currently supports one kind of data skipping index, the minmax index. It records the minimum and maximum value of an indexed column (or expression) within each granule (or, with a higher GRANULARITY, each group of granules). At query time, the engine compares the predicate in your WHERE clause against these precomputed bounds and skips any granule whose [min, max] range cannot contain a matching row. It is most effective on columns whose values are clustered or roughly ordered within the table, so that filters can rule out large contiguous ranges of granules.

When to use

Use case	Example
Range filter on a non-primary-index column	`WHERE amount > 1000000`
Equality filter on a clustered column	`WHERE status_code = 500`
Date / time range scans	`WHERE event_date < '2024-01-01'`
`BETWEEN` and `IN` predicates	`WHERE b BETWEEN 100000 AND 200000`
Filtering on a derived value	Index `a + b`, query `WHERE a + b > 50000`

A minmax index gives little benefit when the indexed column’s values are randomly distributed, because every granule then tends to span the whole value range and nothing can be skipped.

How it relates to other pruning techniques

A minmax index is the granule-level pruning step for non-primary-index columns. It runs after the coarser tablet-level steps have chosen which tablets to open, and refines the work within them. For the full ordering of pruning, read, and filter stages, see the scan pipeline; the table below summarizes how a minmax index relates to the other pruning mechanisms.

Technique	Scope	Driven by	Applies to	Relationship to a `minmax` index
Tablet pruning	Whole tablets	Per-tablet min/max statistics kept for every column	Any column with a comparison predicate	Runs first, at a coarser scope. Also min/max-based, but per tablet, not the data skipping index. A `minmax` index then refines what survives, granule by granule.
Primary index pruning	Granules	The sorted order of the primary index	The `PRIMARY INDEX` key columns	Both prune granules, but the primary index is far more effective because data is physically sorted on its keys. A `minmax` index brings granule pruning to non-sort-key columns, and works best when their values are still naturally clustered (for example, a load timestamp).
Data skipping (`minmax`) index	Granules	Stored min/max per index granule	Range / comparison predicates (`>`, `<`, `=`, `BETWEEN`, `IN`) on indexed columns or expressions	This index.
Inverted index pruning	Granules + rows	Per-value posting lists (Roaring bitmaps)	Exact value membership	A different access pattern: row-level membership versus granule-level bounds. Use it for point lookups, value lists, and narrow ranges; a `minmax` index for wide, open-ended, or continuous-type ranges. They coexist, and the planner picks the one that fits each predicate.

These mechanisms stack: one query can prune whole tablets from statistics, skip granules via the primary index on one column, and skip more via a minmax index on another.

Quick start

-- 1. Create a table
CREATE TABLE sales (
  id     INT,
  region TEXT,
  amount BIGINT
) PRIMARY INDEX id;

-- 2. Create a minmax data skipping index on a non-primary-index column
CREATE INDEX dsi_amount ON sales
  USING SKIP_INDEX(amount) WITH (TYPE = minmax, GRANULARITY = 1);

-- 3. Load data
INSERT INTO sales SELECT i, 'us', i * 10 FROM generate_series(1, 30000) gs(i);

-- 4. Query: granules whose amount range is entirely <= 250000 are skipped
SELECT count(*) FROM sales WHERE amount > 250000;

SQL reference

CREATE INDEX

CREATE INDEX <index_name>
  ON <table_name>
  USING SKIP_INDEX(<expression>)
  WITH (TYPE = minmax, GRANULARITY = <granularity>);

Parameter	Description
`<index_name>`	Unique name for the index across all tables, indexes, and views in the database.
`<table_name>`	The `FACT` or `DIMENSION` table to index. External tables are not supported.
`<expression>`	A single column reference or scalar expression of a comparable type (see Supported types).
`TYPE`	The index type. Only `minmax` is currently supported.
`GRANULARITY`	A positive integer: how many consecutive data granules are summarized by one index granule. See Choosing a granularity.

The keywords SKIP_INDEX, TYPE, GRANULARITY, and minmax are case-insensitive; this reference uses uppercase for the method and parameter keywords to match the other USING forms. Both TYPE and GRANULARITY are required; supplying any other set of parameters is an error. You may create multiple data skipping indexes on the same table, but at most one index per expression. Each index must have a unique name.

-- One index per column is fine
CREATE INDEX dsi_b ON t USING SKIP_INDEX(b) WITH (TYPE = minmax, GRANULARITY = 1);
CREATE INDEX dsi_c ON t USING SKIP_INDEX(c) WITH (TYPE = minmax, GRANULARITY = 1);

Supported types

The indexed expression must resolve to one of the following comparable types (both nullable and non-nullable variants are allowed): INT, BIGINT, REAL, DOUBLE PRECISION, DECIMAL, DATE, TIMESTAMP, TIMESTAMPTZ, TEXT, BOOLEAN, BYTEA.

ARRAY and STRUCT expressions are not supported and produce an error:

ERROR: Data skipping index expression must have a comparable type
(int, bigint, real, double, decimal, date, timestamp, timestamptz,
string, boolean, bytea). Got: ARRAY(INTEGER)

For token membership on TEXT or ARRAY(TEXT) columns, use an inverted index instead.

Expression-based indexing

You can index any scalar expression, not just a bare column. The expression is evaluated once at ingest time and its min/max are stored. At query time the predicate’s expression must match the indexed expression exactly for pruning to apply:

-- Index a derived value
CREATE INDEX dsi_sum ON t USING SKIP_INDEX(a + b) WITH (TYPE = minmax, GRANULARITY = 1);
SELECT sum(a + b) FROM t WHERE a + b > 50000;   -- pruned

-- Index a function result on a text column c
CREATE INDEX dsi_lower ON t USING SKIP_INDEX(lower(c)) WITH (TYPE = minmax, GRANULARITY = 1);
SELECT * FROM t WHERE lower(c) = 'high';          -- pruned
SELECT * FROM t WHERE c = 'high';                 -- NOT pruned (expression differs)

Creating an index on a table that already has data

When you create a data skipping index on a table that already contains data, the existing tablets have no index data and are not pruned until they are rewritten. Run VACUUM with REINDEX=TRUE to backfill the index for pre-existing tablets:

CREATE INDEX dsi_amount ON sales USING SKIP_INDEX(amount) WITH (TYPE = minmax, GRANULARITY = 1);
VACUUM (REINDEX=TRUE) sales;

All subsequent inserts automatically populate the index.

DROP INDEX

DROP INDEX [IF EXISTS] <index_name>;

Dropping a data skipping index is a metadata operation. To reclaim the physical storage occupied by index data, run VACUUM on the table afterward.

A column that is referenced by a data skipping index cannot be dropped while the index exists:

ERROR: Cannot drop column 'c' because it is indexed by the following one or more
data skipping indexes: dsi_c. Drop the data skipping indexes first.

Renaming a column does not require dropping the index. The index follows the rename automatically.

Capabilities

Supported predicates

The optimizer derives pruning bounds for the common comparison and range predicates. For a predicate on an indexed column b, a granule is kept only when its [min, max] range could still contain a match:

Predicate	A granule is pruned when
`b > k`	`max(b) <= k`
`b >= k`	`max(b) < k`
`b < k`	`min(b) >= k`
`b <= k`	`min(b) > k`
`b = k`	`min(b) > k` or `max(b) < k`
`b BETWEEN lo AND hi`	`max(b) < lo` or `min(b) > hi`
`b IN (k1, k2, ...)`	every value’s range falls outside `[min(b), max(b)]`

Multiple predicates combined with AND each contribute independent pruning conditions.

NULL handling

NULL values never satisfy a comparison predicate. A granule that contains only NULL values in the indexed column is pruned for predicates such as b > 150, because no non-null value exists that could match.

Coexistence with other index types

Data skipping indexes coexist with all other index types on the same table: primary indexes, aggregating indexes, inverted indexes, and vector search indexes. Each serves a different access pattern, and their pruning combines within a single query. See How it relates to other pruning techniques for how a minmax index fits alongside tablet, primary index, and inverted index pruning.

Choosing a granularity

GRANULARITY controls how many consecutive data granules are summarized by a single index granule (a single min/max pair):

GRANULARITY = 1 stores one min/max pair per data granule (~8,192 rows). This gives the finest-grained pruning and the most precise skipping, at the cost of a larger index.
Higher values group more data granules under one min/max pair. The index is smaller, but pruning is coarser: a single matching row anywhere in the group prevents the whole group from being skipped.

Start with GRANULARITY = 1 for the most aggressive pruning. Increase it only if index size becomes a concern and your filters still prune effectively at a coarser resolution.

The size of a data granule is governed by the table’s index_granularity (default 8,192 rows). The data skipping index’s GRANULARITY then sets how many of those granules share one min/max summary.

Observability

information_schema.indexes

Use information_schema.indexes to list data skipping indexes. They report an index_type of data_skipping:

SELECT index_name, index_type
FROM information_schema.indexes
WHERE table_name = 'sales'
ORDER BY index_name;

index_name TEXT, index_type TEXT
'dsi_amount',    'data_skipping'
'primary_sales', 'primary'

EXPLAIN (ANALYZE)

Use EXPLAIN (ANALYZE) to confirm the index is used and to see how many granules were pruned. The scan operator gains two annotations and a pruning metric:

EXPLAIN (ANALYZE) SELECT * FROM sales WHERE amount > 250000;

\_[3] [TableFuncScan] sales.id: $0.id, sales.region: $0.region, sales.amount: $0.amount
  |   $0 = read_tablets(table_name => sales, tablet)
  |   [DataSkippingIndexes] {index=dsi_amount, granularity=1}
  |   [DataSkippingConditions] ((max_dsi_amount <= 250000) IS DISTINCT FROM TRUE)
  |   [Types]: sales.id: integer null, sales.region: text null, sales.amount: bigint null
  |   [Execution Metrics]: ..., granules: 1/4, data skipping index pruned granules: 3

Annotation / metric	Meaning
`[DataSkippingIndexes]`	Confirms the optimizer attached the index to the scan. Shows the index name and granularity.
`[DataSkippingConditions]`	The derived bound(s) evaluated against the stored min/max columns.
`granules: X/Y`	`X` granules were read out of `Y` total.
`data skipping index pruned granules: N`	`N` granules were skipped by the index.

If these annotations do not appear, verify that:

The predicate’s expression matches the indexed expression exactly.
The table’s tablets contain index data. Run VACUUM (REINDEX=TRUE) if the index was created after the data was loaded.

How it works

During ingestion, as each data granule is written, Firebolt scans the indexed expression’s values in that granule and records their minimum and maximum. Every GRANULARITY data granules, these are flushed as one index granule’s min/max pair and stored alongside the tablet’s data. At query time, the optimizer rewrites each supported WHERE predicate into an equivalent condition over the stored min/max columns (the [DataSkippingConditions] shown in EXPLAIN). For every tablet the query reads, the storage layer evaluates those conditions per index granule and skips the granules whose [min, max] range provably cannot contain a match. Only the surviving granules are decompressed and scanned. Because the conditions only ever exclude granules that cannot contain a matching row, query results are identical with or without the index. The index changes only how much data is read.

Overview

Performance and Observability

Security

Self-Managed

Managed service

Guides

SQL reference

Release notes

API reference

Legal

Overview

When to use

How it relates to other pruning techniques

Quick start

SQL reference

CREATE INDEX

Supported types

Expression-based indexing

Creating an index on a table that already has data

DROP INDEX

Capabilities

Supported predicates

NULL handling

Coexistence with other index types

Choosing a granularity

Observability

information_schema.indexes

EXPLAIN (ANALYZE)

How it works

​Overview

​When to use

​How it relates to other pruning techniques

​Quick start

​SQL reference

​CREATE INDEX

​Supported types

​Expression-based indexing

​Creating an index on a table that already has data

​DROP INDEX

​Capabilities

​Supported predicates

​NULL handling

​Coexistence with other index types

​Choosing a granularity

​Observability

​information_schema.indexes

​EXPLAIN (ANALYZE)

​How it works

Overview

When to use

How it relates to other pruning techniques

Quick start

SQL reference

CREATE INDEX

Supported types

Expression-based indexing

Creating an index on a table that already has data

DROP INDEX

Capabilities

Supported predicates

NULL handling

Coexistence with other index types

Choosing a granularity

Observability

information_schema.indexes

EXPLAIN (ANALYZE)

How it works