Skip to main content

Overview

A data skipping index lets Firebolt skip granules that are guaranteed to contain no row matching a query’s filter, so they are never read from storage. It is the granule-level pruning step for columns that are not the primary index, stage 3 of the scan pipeline. Firebolt currently supports one kind of data skipping index, the minmax index. It records the minimum and maximum value of an indexed column (or expression) within each granule (or, with a higher GRANULARITY, each group of granules). At query time, the engine compares the predicate in your WHERE clause against these precomputed bounds and skips any granule whose [min, max] range cannot contain a matching row. It is most effective on columns whose values are clustered or roughly ordered within the table, so that filters can rule out large contiguous ranges of granules.

When to use

Use caseExample
Range filter on a non-primary-index columnWHERE amount > 1000000
Equality filter on a clustered columnWHERE status_code = 500
Date / time range scansWHERE event_date < '2024-01-01'
BETWEEN and IN predicatesWHERE b BETWEEN 100000 AND 200000
Filtering on a derived valueIndex a + b, query WHERE a + b > 50000
A minmax index gives little benefit when the indexed column’s values are randomly distributed, because every granule then tends to span the whole value range and nothing can be skipped.

How it relates to other pruning techniques

A minmax index is the granule-level pruning step for non-primary-index columns. It runs after the coarser tablet-level steps have chosen which tablets to open, and refines the work within them. For the full ordering of pruning, read, and filter stages, see the scan pipeline; the table below summarizes how a minmax index relates to the other pruning mechanisms.
TechniqueScopeDriven byApplies toRelationship to a minmax index
Tablet pruningWhole tabletsPer-tablet min/max statistics kept for every columnAny column with a comparison predicateRuns first, at a coarser scope. Also min/max-based, but per tablet, not the data skipping index. A minmax index then refines what survives, granule by granule.
Primary index pruningGranulesThe sorted order of the primary indexThe PRIMARY INDEX key columnsBoth prune granules, but the primary index is far more effective because data is physically sorted on its keys. A minmax index brings granule pruning to non-sort-key columns, and works best when their values are still naturally clustered (for example, a load timestamp).
Data skipping (minmax) indexGranulesStored min/max per index granuleRange / comparison predicates (>, <, =, BETWEEN, IN) on indexed columns or expressionsThis index.
Inverted index pruningGranules + rowsPer-token posting lists (Roaring bitmaps)Exact token membership on TEXT / ARRAY(TEXT) via has_all_tokensA different access pattern: use it for token/membership filters on text, a minmax index for range and comparison filters. They coexist, each serving the predicates it fits.
These mechanisms stack: one query can prune whole tablets from statistics, skip granules via the primary index on one column, and skip more via a minmax index on another.

Quick start

-- 1. Create a table
CREATE TABLE sales (
  id     INT,
  region TEXT,
  amount BIGINT
) PRIMARY INDEX id;

-- 2. Create a minmax data skipping index on a non-primary-index column
CREATE INDEX dsi_amount ON sales
  USING SKIP_INDEX(amount) WITH (TYPE = minmax, GRANULARITY = 1);

-- 3. Load data
INSERT INTO sales SELECT i, 'us', i * 10 FROM generate_series(1, 30000) gs(i);

-- 4. Query: granules whose amount range is entirely <= 250000 are skipped
SELECT count(*) FROM sales WHERE amount > 250000;

SQL reference

CREATE INDEX

CREATE INDEX <index_name>
  ON <table_name>
  USING SKIP_INDEX(<expression>)
  WITH (TYPE = minmax, GRANULARITY = <granularity>);
ParameterDescription
<index_name>Unique name for the index across all tables, indexes, and views in the database.
<table_name>The FACT or DIMENSION table to index. External tables are not supported.
<expression>A single column reference or scalar expression of a comparable type (see Supported types).
TYPEThe index type. Only minmax is currently supported.
GRANULARITYA positive integer: how many consecutive data granules are summarized by one index granule. See Choosing a granularity.
The keywords SKIP_INDEX, TYPE, GRANULARITY, and minmax are case-insensitive; this reference uses uppercase for the method and parameter keywords to match the other USING forms. Both TYPE and GRANULARITY are required; supplying any other set of parameters is an error. You may create multiple data skipping indexes on the same table, but at most one index per expression. Each index must have a unique name.
-- One index per column is fine
CREATE INDEX dsi_b ON t USING SKIP_INDEX(b) WITH (TYPE = minmax, GRANULARITY = 1);
CREATE INDEX dsi_c ON t USING SKIP_INDEX(c) WITH (TYPE = minmax, GRANULARITY = 1);

Supported types

The indexed expression must resolve to one of the following comparable types (both nullable and non-nullable variants are allowed): INT, BIGINT, REAL, DOUBLE PRECISION, DECIMAL, DATE, TIMESTAMP, TIMESTAMPTZ, TEXT, BOOLEAN, BYTEA.
ARRAY and STRUCT expressions are not supported and produce an error:
ERROR: Data skipping index expression must have a comparable type
(int, bigint, real, double, decimal, date, timestamp, timestamptz,
string, boolean, bytea). Got: ARRAY(INTEGER)
For token membership on TEXT or ARRAY(TEXT) columns, use an inverted index instead.

Expression-based indexing

You can index any scalar expression, not just a bare column. The expression is evaluated once at ingest time and its min/max are stored. At query time the predicate’s expression must match the indexed expression exactly for pruning to apply:
-- Index a derived value
CREATE INDEX dsi_sum ON t USING SKIP_INDEX(a + b) WITH (TYPE = minmax, GRANULARITY = 1);
SELECT sum(a + b) FROM t WHERE a + b > 50000;   -- pruned

-- Index a function result on a text column c
CREATE INDEX dsi_lower ON t USING SKIP_INDEX(lower(c)) WITH (TYPE = minmax, GRANULARITY = 1);
SELECT * FROM t WHERE lower(c) = 'high';          -- pruned
SELECT * FROM t WHERE c = 'high';                 -- NOT pruned (expression differs)

Creating an index on a table that already has data

When you create a data skipping index on a table that already contains data, the existing tablets have no index data and are not pruned until they are rewritten. Run VACUUM with REINDEX=TRUE to backfill the index for pre-existing tablets:
CREATE INDEX dsi_amount ON sales USING SKIP_INDEX(amount) WITH (TYPE = minmax, GRANULARITY = 1);
VACUUM (REINDEX=TRUE) sales;
All subsequent inserts automatically populate the index.

DROP INDEX

DROP INDEX [IF EXISTS] <index_name>;
Dropping a data skipping index is a metadata operation. To reclaim the physical storage occupied by index data, run VACUUM on the table afterward.
A column that is referenced by a data skipping index cannot be dropped while the index exists:
ERROR: Cannot drop column 'c' because it is indexed by the following one or more
data skipping indexes: dsi_c. Drop the data skipping indexes first.
Renaming a column does not require dropping the index. The index follows the rename automatically.

Capabilities

Supported predicates

The optimizer derives pruning bounds for the common comparison and range predicates. For a predicate on an indexed column b, a granule is kept only when its [min, max] range could still contain a match:
PredicateA granule is pruned when
b > kmax(b) <= k
b >= kmax(b) < k
b < kmin(b) >= k
b <= kmin(b) > k
b = kmin(b) > k or max(b) < k
b BETWEEN lo AND himax(b) < lo or min(b) > hi
b IN (k1, k2, ...)every value’s range falls outside [min(b), max(b)]
Multiple predicates combined with AND each contribute independent pruning conditions.

NULL handling

NULL values never satisfy a comparison predicate. A granule that contains only NULL values in the indexed column is pruned for predicates such as b > 150, because no non-null value exists that could match.

Coexistence with other index types

Data skipping indexes coexist with all other index types on the same table: primary indexes, aggregating indexes, inverted indexes, and vector search indexes. Each serves a different access pattern, and their pruning combines within a single query. See How it relates to other pruning techniques for how a minmax index fits alongside tablet, primary index, and inverted index pruning.

Choosing a granularity

GRANULARITY controls how many consecutive data granules are summarized by a single index granule (a single min/max pair):
  • GRANULARITY = 1 stores one min/max pair per data granule (~8,192 rows). This gives the finest-grained pruning and the most precise skipping, at the cost of a larger index.
  • Higher values group more data granules under one min/max pair. The index is smaller, but pruning is coarser: a single matching row anywhere in the group prevents the whole group from being skipped.
Start with GRANULARITY = 1 for the most aggressive pruning. Increase it only if index size becomes a concern and your filters still prune effectively at a coarser resolution.
The size of a data granule is governed by the table’s index_granularity (default 8,192 rows). The data skipping index’s GRANULARITY then sets how many of those granules share one min/max summary.

Observability

information_schema.indexes

Use information_schema.indexes to list data skipping indexes. They report an index_type of data_skipping:
SELECT index_name, index_type
FROM information_schema.indexes
WHERE table_name = 'sales'
ORDER BY index_name;
index_name TEXT, index_type TEXT
'dsi_amount',    'data_skipping'
'primary_sales', 'primary'

EXPLAIN (ANALYZE)

Use EXPLAIN (ANALYZE) to confirm the index is used and to see how many granules were pruned. The scan operator gains two annotations and a pruning metric:
EXPLAIN (ANALYZE) SELECT * FROM sales WHERE amount > 250000;
\_[3] [TableFuncScan] sales.id: $0.id, sales.region: $0.region, sales.amount: $0.amount
  |   $0 = read_tablets(table_name => sales, tablet)
  |   [DataSkippingIndexes] {index=dsi_amount, granularity=1}
  |   [DataSkippingConditions] ((max_dsi_amount <= 250000) IS DISTINCT FROM TRUE)
  |   [Types]: sales.id: integer null, sales.region: text null, sales.amount: bigint null
  |   [Execution Metrics]: ..., granules: 1/4, data skipping index pruned granules: 3
Annotation / metricMeaning
[DataSkippingIndexes]Confirms the optimizer attached the index to the scan. Shows the index name and granularity.
[DataSkippingConditions]The derived bound(s) evaluated against the stored min/max columns.
granules: X/YX granules were read out of Y total.
data skipping index pruned granules: NN granules were skipped by the index.
If these annotations do not appear, verify that:
  1. The predicate’s expression matches the indexed expression exactly.
  2. The table’s tablets contain index data. Run VACUUM (REINDEX=TRUE) if the index was created after the data was loaded.

How it works

During ingestion, as each data granule is written, Firebolt scans the indexed expression’s values in that granule and records their minimum and maximum. Every GRANULARITY data granules, these are flushed as one index granule’s min/max pair and stored alongside the tablet’s data. At query time, the optimizer rewrites each supported WHERE predicate into an equivalent condition over the stored min/max columns (the [DataSkippingConditions] shown in EXPLAIN). For every tablet the query reads, the storage layer evaluates those conditions per index granule and skips the granules whose [min, max] range provably cannot contain a match. Only the surviving granules are decompressed and scanned. Because the conditions only ever exclude granules that cannot contain a matching row, query results are identical with or without the index. The index changes only how much data is read.