Inverted index

Nightly Feature

Overview

Inverted indexes in Firebolt are general-purpose inverted indexes that map discrete values to the rows that contain them. They accelerate queries that filter on exact value membership: a point lookup like customer_id = 90218, a value list like status IN ('error', 'timeout'), or a bounded range like event_date BETWEEN DATE '2026-01-01' AND DATE '2026-01-07'. Once an index exists on a column, the planner uses it automatically. You write ordinary SQL, and =, IN, and bounded range (BETWEEN, >= AND <=) predicates over that column are pushed into the scan with no change to your queries. Index a scalar column (BOOLEAN, INTEGER, BIGINT, NUMERIC, DATE, TIMESTAMP, TIMESTAMPTZ) or a TEXT or ARRAY(TEXT) expression. For ARRAY(TEXT) columns, the has_all_tokens function matches rows whose array contains every value you list. Under the hood, every token’s posting list is stored as a compressed Roaring bitmap, giving the engine a compact, set-operation-friendly representation that scales from a handful of rows to billions. Inverted-index filtering is stage 4 of the scan pipeline: it both prunes granules and supplies a row bitmap that the reader reuses to filter rows.

Inverted indexes are designed for exact value matching, not fuzzy or substring search. If you need substring or pattern-based search, use a full-text search index with n-grams instead.

Key capabilities

Automatic predicate matching: with an index in place, ordinary =, IN, and bounded range (BETWEEN, >= AND <=) predicates over the column are pushed into the scan with no change to your SQL.
Text and scalar columns: index scalar BOOLEAN, INTEGER, BIGINT, NUMERIC, DATE, TIMESTAMP, and TIMESTAMPTZ columns, or TEXT and ARRAY(TEXT) expressions.
Granule pruning: entire granules are skipped when no row in the range matches, cutting I/O.
Row-level filtering: within qualifying granules, only matching rows are passed to downstream operators, cutting CPU work.
Membership semantics: = matches a single value; IN and bounded ranges union posting lists, matching rows that hold any of the values; multi-value array matching intersects them, matching rows that contain all listed values.
Expression-based indexing: index the output of any scalar expression, such as LOWER(tags) or a struct field like event.type.
ARRAY(TEXT) support: each array element is indexed as a separate token, enabling efficient multi-value tag filtering.
Async cloud I/O: index lookups against cloud storage use coroutine-based parallelism to overlap multiple S3 reads, minimizing latency.

When to use

Use case	Example
Point lookup on a scalar column	`WHERE customer_id = 90218`
Value-list filtering	`WHERE status IN ('error', 'timeout', 'rejected')`
Bounded date / integer range	`WHERE event_date BETWEEN DATE '2026-01-01' AND DATE '2026-01-07'`
Case-insensitive matching	Index `LOWER(hostname)`, query `WHERE LOWER(hostname) = 'web-01'`
Struct field filtering	Index `event.category`, query `WHERE event.category = 'click'`
High-cardinality column pruning	Prune tablets for a column with millions of distinct values
Multi-value tag / label filtering	`WHERE has_all_tokens(tags, ARRAY['billing', 'error'])`

Quick start

-- 1. Create a table
CREATE TABLE events (
  id          INT,
  customer_id INT,
  status      TEXT,
  event_date  DATE
) PRIMARY INDEX id;

-- 2. Create an inverted index on each column you filter on
CREATE INDEX idx_customer ON events USING INVERTED_INDEX(customer_id);
CREATE INDEX idx_status   ON events USING INVERTED_INDEX(status);
CREATE INDEX idx_date     ON events USING INVERTED_INDEX(event_date);

-- 3. Load data
INSERT INTO events VALUES
  (1, 90218, 'success', DATE '2026-01-03'),
  (2, 90218, 'timeout', DATE '2026-01-05'),
  (3, 90311, 'success', DATE '2026-01-06'),
  (4, 90218, 'error',   DATE '2026-02-10');

-- 4. Query as usual. The planner uses the indexes automatically, with no
--    change to your SQL:
SELECT id FROM events WHERE customer_id = 90218;
SELECT id FROM events WHERE status IN ('error', 'timeout');
SELECT id FROM events WHERE event_date BETWEEN DATE '2026-01-01' AND DATE '2026-01-07';

The last query returns:

id
----
 1
 2
 3

SQL reference

CREATE INDEX

CREATE INDEX [IF NOT EXISTS] <index_name>
  ON <table_name>
  USING INVERTED_INDEX(<expression>);

Parameter	Description
`<index_name>`	Unique name for the index across all tables, indexes, and views in the database.
`<table_name>`	The table to index.
`<expression>`	A column reference or scalar expression that returns `TEXT`, `ARRAY(TEXT)`, or a supported scalar type (see below).

Supported expression types:

Expression type	Example	Behavior
Plain column	`INVERTED_INDEX(tags)`	Each cell value (or array element) becomes a token.
Scalar function	`INVERTED_INDEX(LOWER(hostname))`	The function is evaluated at ingest time; query predicates must use the same expression.
Struct field	`INVERTED_INDEX(event.category)`	Indexes a nested struct field directly.

You can create multiple inverted indexes on the same table with different expressions. Each index must have a unique name. Supported data types: The tokenizing expression must return one of the following. Equality (=) and IN matching applies to every supported type; bounded range matching (BETWEEN, >= AND <=) applies only to the discrete, unit-step ordered types (INTEGER, BIGINT, DATE, BOOLEAN).

Type	Equality / `IN`	Bounded range
`TEXT`, `ARRAY(TEXT)`	yes	no
`BOOLEAN`	yes	yes
`INTEGER`, `BIGINT`	yes	yes
`DATE`	yes	yes
`NUMERIC`	yes	no
`TIMESTAMP`, `TIMESTAMPTZ`	yes	no

The tokenizing expression must return TEXT, ARRAY(TEXT), or a supported scalar type. Attempting to create an index on an unsupported type results in an error:

ERROR: Inverted index tokenizing expression must return TEXT, ARRAY(TEXT), or a supported scalar type (BOOLEAN, INTEGER, BIGINT, NUMERIC, DATE, TIMESTAMP, TIMESTAMPTZ). Got: REAL

Creating an index on an existing table with data: When you create an inverted index on a table that already contains data, existing tablets do not automatically have index data. Run VACUUM with REINDEX=TRUE to rebuild the index for pre-existing tablets:

CREATE INDEX idx_tags ON my_table USING INVERTED_INDEX(tags);
VACUUM (REINDEX=TRUE) my_table;

All subsequent inserts automatically populate the index.

DROP INDEX

DROP INDEX [IF EXISTS] <index_name>;

Dropping an inverted index is a metadata-only operation. To reclaim the physical storage occupied by index files, run VACUUM on the table after dropping the index.

You cannot drop a table while inverted indexes still depend on it. Either drop the indexes first, or use DROP TABLE ... CASCADE to drop the table and all its dependent indexes in one statement.

Capabilities

Automatic predicate matching

You do not call any special function to use an inverted index. When an index exists on a column, the planner recognizes ordinary comparison predicates against that column and pushes them into the scan, reusing the same posting-list machinery. This happens by default; you write the query you would write without an index.

Predicate	Probe	Matched types
`col = lit`, `lit = col`	Single value, intersect mode (`match=all`)	All supported types
`col IN (lit, ...)`	Union of point lookups (`match=any`)	All supported types
`col BETWEEN lo AND hi`, `col >= lo AND col <= hi`	Bounded range enumerated to a value set, union (`match=any`)	`INTEGER`, `BIGINT`, `DATE`, `BOOLEAN`

CREATE INDEX idx_customer ON orders USING INVERTED_INDEX(customer_id);
CREATE INDEX idx_date     ON orders USING INVERTED_INDEX(order_date);

-- All three use the index automatically, with no special syntax:
SELECT * FROM orders WHERE customer_id = 90218;
SELECT * FROM orders WHERE customer_id IN (90218, 90219, 90220);
SELECT * FROM orders WHERE order_date BETWEEN DATE '2026-01-01' AND DATE '2026-01-07';

Type coercion. Equality matching unifies the literal to the column type, so b = 300 matches a BIGINT index even though 300 is written as an INTEGER. IN semantics. col IN (...) is the union of point lookups: the index reads each value’s posting list and unions them, returning rows that contain any listed value. NULL list elements are dropped (they can never match). Range matching is adaptive. An inverted-index probe is a union of point lookups, so a range belongs on the index only when it covers few enough values to enumerate cheaply. The planner makes that judgment per query: it computes how many discrete values the range spans and routes the predicate to whichever access path wins, with no hint and no tuning.

A small, discrete range over a unit-step ordered type (INTEGER, BIGINT, DATE, BOOLEAN) is enumerated into its exact set of values and probed against the index as a union of point lookups, giving row-level precision. BETWEEN lo AND hi is desugared to col >= lo AND col <= hi, so both spellings (and the reversed lo <= col AND col <= hi) behave identically. Exclusive bounds are normalized to the adjacent inclusive value (col > 9 AND col <= 12 covers 10, 11, 12), a single-point range folds to an equality probe, and an empty range (lo > hi) is dropped before the index is consulted.
A wide range that would expand into a large value set stays off the index. Materializing thousands of point lookups would cost more than it saves, so the planner leaves it to the min/max data-skipping index, which prunes by bound without enumerating a single value. The switchover is automatic and bounded, so planning never blows up on a large range.
An open-ended comparison (col > 5, col < 100, with no bound on the other side) spans an unbounded set and likewise falls to the min/max index.
A range over a type with no natural unit step (NUMERIC, TIMESTAMP, TIMESTAMPTZ) is never enumerated: even a one-second timestamp window spans millions of microsecond ticks. These ranges are served by the min/max index; equality and IN on the same column still use the inverted index.

The result is identical whichever path runs; only the cost differs. The inverted index earns its keep on exact membership and tight ranges, the min/max index on wide and continuous ones, and the planner picks between them so you do not have to.

Granule pruning

When a matched predicate (has_all_tokens, =, IN, or a bounded range) appears in a top-level AND conjunction, the query planner attaches [InvertedIndexFilters] metadata to the read_tablets scan operator. At execution time, the storage layer uses the index to skip entire granules whose posting lists have no matching rows, significantly reducing the amount of data read from disk or cloud storage.

Pushdown is only applied to predicates in a top-level AND conjunction. The index is not used for pruning when the predicate is inside an OR, behind a NOT, or nested in a subquery.

Expression-based indexing

You can index any scalar expression, not just a bare column. The expression is evaluated once at ingest time, and the resulting tokens are stored in the index. At query time, the predicate’s expression must match the indexed expression exactly for the planner to push down the filter:

-- Create an index on the lowercased column
CREATE INDEX idx_lower_host ON logs USING INVERTED_INDEX(LOWER(hostname));

-- Query must use the same expression
SELECT * FROM logs WHERE LOWER(hostname) = 'web-01';

Struct field indexing

You can index a field nested inside a STRUCT column:

CREATE TABLE traces (
  id INT,
  event STRUCT(category TEXT, severity TEXT)
) PRIMARY INDEX id;

CREATE INDEX idx_category ON traces USING INVERTED_INDEX(event.category);

SELECT * FROM traces WHERE event.category = 'click';

Coexistence with other index types

Inverted indexes coexist with all other index types on the same table: primary indexes, aggregating indexes, full-text search indexes, and vector search indexes. Each index type serves a different access pattern, and the planner selects the appropriate one based on the query predicate.

Multi-value matching with has_all_tokens

Equality, IN, and range predicates cover scalar columns automatically. For an ARRAY(TEXT) column, where a single row holds many values, use has_all_tokens to match rows whose array contains every value you list. It is also available on scalar columns when you want to call the index explicitly.

-- Single token
has_all_tokens(<expression>, <token>)

-- Multiple tokens: row matches only if it contains all of them
has_all_tokens(<expression>, ARRAY[<token1>, <token2>, ...])

-- With filter mode
has_all_tokens(<expression>, <token>, filter_mode => '<mode>')

Parameter	Description
`<expression>`	A `TEXT` or `ARRAY(TEXT)` column or expression. Must match the indexed expression exactly for the index to be used.
`<token>` / `ARRAY[...]`	One or more constant string tokens. All supplied tokens must be present in the row for it to match.
`filter_mode`	Optional. `'filter'` (default): prunes granules and filters rows. `'prune_only'`: prunes granules but does not apply a row-level filter, trading precision for speed.

For example, find rows tagged both billing and timeout:

CREATE INDEX idx_tags ON events USING INVERTED_INDEX(tags);

SELECT id FROM events WHERE has_all_tokens(tags, ARRAY['billing', 'timeout']);

Behavior with different column types:

First argument type	Semantics
Scalar (`TEXT`, `INTEGER`, …)	The entire cell value is treated as a single token. A single-token query checks equality; a multi-token query can only match if there is exactly one token and it equals the cell value.
`ARRAY(TEXT)`	Each array element is a separate token. A multi-token query returns `true` when every queried token appears as an element somewhere in the array.

Scalar token typing: for a scalar expression, the token type must match the expression type exactly (modulo nullability). has_all_tokens does not coerce: probing a BIGINT column requires has_all_tokens(b, 300::bigint), not has_all_tokens(b, 300). A NULL scalar token is rejected at planning time, because it has no meaningful index key to probe with. (Automatic = matching does coerce, so plain b = 300 works without a cast.) NULL handling: NULL values in the indexed column never match any token. has_all_tokens(NULL, 'x') returns NULL.

Prune-only mode

By default, has_all_tokens both prunes non-matching granules and applies a row-level filter. When the token is highly selective, the row-level filter may be redundant. Use filter_mode => 'prune_only' to skip the row-level filter and let downstream operators handle any false positives:

SELECT *
FROM events
WHERE has_all_tokens(tags, 'billing', filter_mode => 'prune_only')
  AND id > 100;

In prune-only mode, the has_all_tokens predicate is removed from the Filter node in the query plan. Only the remaining predicates (for example, id > 100) are evaluated per row.

Observability

information_schema.indexes

Use information_schema.indexes to inspect inverted index metadata:

SELECT
  index_name,
  index_type,
  index_definition,
  compressed_bytes,
  number_of_tablets
FROM information_schema.indexes
WHERE index_type = 'inverted_index';

Column	Description
`index_name`	The name of the index.
`index_type`	`'inverted_index'` for inverted indexes.
`index_definition`	The full `CREATE INDEX` statement, including the indexed expression.
`compressed_bytes`	Total on-disk size of all index files across all tablets.
`number_of_tablets`	Number of tablets that contain index data.

EXPLAIN (ANALYZE)

Use EXPLAIN (ANALYZE) to verify that the index is being used and to see how many granules were pruned:

EXPLAIN (ANALYZE)
SELECT id FROM events WHERE customer_id = 90218;

Look for these indicators in the output:

[TableFuncScan] ...
|   $0 = read_tablets(table_name => events, tablet)
|   [InvertedIndexFilters] {index=idx_customer, tokens=[90218], match=all}
|   [Execution Metrics]: output cardinality = ..., granules: 1/4,
|       inverted index pruned granules: 3

Metric	Meaning
`[InvertedIndexFilters]`	Confirms the planner pushed the predicate into the scan. Shows the index name, the queried tokens, and the match mode. There is one entry per matched index.
`match=all` / `match=any`	`all` intersects posting lists (from `has_all_tokens` and `=`); `any` unions them (from `IN` and range predicates).
`granules: X/Y`	`X` granules were read out of `Y` total.
`inverted index pruned granules: N`	`N` granules were skipped because the index determined they contain no matching rows.

A pushed-down =, IN, or range predicate produces the same [InvertedIndexFilters] annotation. For example, WHERE n IN (10, 20) shows {index=t_inv_n, tokens=[10,20], match=any}, and WHERE n BETWEEN 10 AND 12 shows {index=t_inv_n, tokens=[10,11,12], match=any}. If [InvertedIndexFilters] does not appear, verify that:

The predicate’s expression matches the indexed expression exactly.
The predicate is in a top-level AND conjunction (not inside OR or NOT).
For a range predicate, the type is enumerable (INTEGER, BIGINT, DATE, BOOLEAN) and the range is narrow; wide, open-ended, or continuous-type ranges are served by the min/max index instead.

How it works

This section explains the internal architecture for users who want to understand the performance characteristics in depth.

Three-file on-disk layout

Each inverted index is stored as three files per tablet:

<index_name>.idx      Sparse index (maps token prefixes to dict block offsets)
<index_name>.dict     Dictionary (sorted tokens + posting list metadata)
<index_name>.posting  Posting lists (Roaring bitmaps with matching row IDs)

Lookup flow
===========

Query token
  |
  v
.idx (sparse index) -- binary search --> block offset
  |
  v
.dict (dictionary block) -- sequential scan --> posting offset / inline
  |
  v
.posting (Roaring bitmap) -- deserialize --> row ID set
  |
  v
Intersect across tokens --> final row IDs

Sparse index (.idx): A lightweight file that stores the first token of each dictionary block and the corresponding byte offset into the .dict file. The lookup begins with a binary search over these entries to locate the relevant dictionary block. Offsets are delta-encoded (v2) to minimize file size. Dictionary (.dict): Stores all tokens sorted in byte order (for text, this is alphabetical; for scalars, it is the order-preserving key encoding), grouped into blocks of 512 tokens. Each block is individually LZ4-compressed. For each token, the dictionary records its encoding type and a pointer into the posting file. Because blocks are independently compressed, a lookup reads and decompresses only the single block that may contain the target token. Posting (.posting): Contains the actual row ID sets. The encoding varies by posting list size to balance space efficiency with decode speed (see Tiered encoding below).

Tiered encoding

Not all posting lists are the same size, and using a single encoding for all of them would waste either space (for small lists) or CPU (for large lists). The index uses three encoding tiers:

Tier	Condition	Storage location	Format
Inline	≤ 6 row IDs	Directly in the `.dict` block	VarUInt-encoded row IDs
SmallSorted	7 to 12 row IDs	`.posting` file	Delta-encoded VarUInt list
Roaring	> 12 row IDs	`.posting` file	Full Roaring64Map serialization

Inline entries avoid a round-trip to the posting file entirely, because the row IDs are embedded in the dictionary block itself. This is particularly effective for high-cardinality, low-frequency tokens (for example, unique request IDs) where most tokens appear in only a few rows. SmallSorted entries use delta encoding: the first row ID is stored as an absolute value, and each subsequent ID is stored as the difference from the previous one. This produces small VarUInt values that compress well. Roaring entries use the full Roaring64Map serialization, which internally switches between array, bitmap, and run-length containers depending on the density of each 2^16-element chunk. This ensures efficient storage across the full range of posting list sizes.

LZ4 dictionary compression

Each 512-token dictionary block is individually LZ4-compressed before being written to the .dict file. The file stores: VarUInt(compressed_size) || compressed_bytes for each block. This block-level compression scheme has two advantages:

Selective decompression: only the block containing the target token is decompressed, avoiding wasted CPU on unrelated blocks.
High compression ratio: tokens within a block tend to share common prefixes (because they are sorted), and LZ4 exploits this locality effectively.

Async coroutine-based cloud I/O

When index files reside in cloud storage (S3), the engine uses C++ coroutines (folly::coro) to issue non-blocking reads. The lookup flow for a multi-token query proceeds as follows:

Read .idx file: a single small async read retrieves the entire sparse index.
Locate dictionary blocks: binary search (CPU-only, no I/O) finds the block offset for each token.
Read dictionary blocks in parallel: one async read per block, all launched concurrently.
Read posting entries in parallel: for non-inline tokens, the exact byte range of each posting list is fetched concurrently.
Eager intersection: as each bitmap arrives (in completion order, not submission order), it is immediately intersected with the running result. If the intersection becomes empty, all remaining in-flight I/O is cancelled.

Each async read suspends the calling coroutine until the buffer manager completes the I/O, allowing the thread to service other coroutines in the meantime. This achieves high throughput without dedicating a thread per I/O. Local caching: When index files are fully primed in the local layered storage (disk cache), the engine switches to POSIX-based reads, bypassing the S3 path entirely. This provides the lowest possible latency for repeated queries against the same tablets.

Write path

During data ingestion (INSERT), the index is built incrementally:

Tokenization: for each row, the indexed expression is evaluated and tokens are extracted. For scalar columns, each cell value is a single token. For ARRAY(TEXT) columns, each array element is a separate token.
In-memory accumulation: tokens are collected in memory with their row IDs. To bound memory, accumulated entries are flushed to Roaring bitmaps when they reach 512 entries.
Bitmap optimization: before serialization, all bitmaps are run-optimized and shrunk to minimize on-disk size.
Sorted serialization: tokens are sorted in byte order, then the posting file, dictionary, and sparse index are written in a single pass.

The write path is designed so that row IDs arrive in sorted order within each vector, enabling efficient addMany batch insertions into the Roaring bitmaps without per-element overhead.

Merge and VACUUM

When tablets are merged (via VACUUM or automatic background compaction), the inverted index is rebuilt from scratch for the merged tablet. If you create an inverted index on a table that already contains data, the existing tablets do not have index data until you run VACUUM (REINDEX=TRUE), which reconstructs the index for all tablets.

Overview

Performance and Observability

Security

Self-Managed

Managed service

Guides

SQL reference

Release notes

API reference

Legal

Overview

Key capabilities

When to use

Quick start

SQL reference

CREATE INDEX

DROP INDEX

Capabilities

Automatic predicate matching

Granule pruning

Expression-based indexing

Struct field indexing

Coexistence with other index types

Multi-value matching with has_all_tokens

Prune-only mode

Observability

information_schema.indexes

EXPLAIN (ANALYZE)

How it works

Three-file on-disk layout

Tiered encoding

LZ4 dictionary compression

Async coroutine-based cloud I/O

Write path

Merge and VACUUM

​Overview

​Key capabilities

​When to use

​Quick start

​SQL reference

​CREATE INDEX

​DROP INDEX

​Capabilities

​Automatic predicate matching

​Granule pruning

​Expression-based indexing

​Struct field indexing

​Coexistence with other index types

​Multi-value matching with has_all_tokens

​Prune-only mode

​Observability

​information_schema.indexes

​EXPLAIN (ANALYZE)

​How it works

​Three-file on-disk layout

​Tiered encoding

​LZ4 dictionary compression

​Async coroutine-based cloud I/O

​Write path

​Merge and VACUUM

Overview

Key capabilities

When to use

Quick start

SQL reference

CREATE INDEX

DROP INDEX

Capabilities

Automatic predicate matching

Granule pruning

Expression-based indexing

Struct field indexing

Coexistence with other index types

Multi-value matching with has_all_tokens

Prune-only mode

Observability

information_schema.indexes

EXPLAIN (ANALYZE)

How it works

Three-file on-disk layout

Tiered encoding

LZ4 dictionary compression

Async coroutine-based cloud I/O

Write path

Merge and VACUUM