> ## Documentation Index
> Fetch the complete documentation index at: https://docs.firebolt.io/llms.txt
> Use this file to discover all available pages before exploring further.

> Primary index overview

# Primary index

The primary index defines the sort order of a table's rows within each [tablet](/performance-and-observability/storage-and-indexing#tablet). Firebolt sorts the rows on these columns and builds a sparse index over them, one key sample per [granule](/performance-and-observability/storage-and-indexing#granule), so it prunes a scan to a range of granules by binary-searching that index. This is [stage 2 of the scan pipeline](/performance-and-observability/storage-and-indexing#stage-2-primary-index-pruning), and the most effective pruning available when your filters align with the index. Unlike a B-tree primary key in a transactional database, it is sparse (one entry per granule, not per row) and is not a uniqueness constraint, so it stays small enough to keep in memory for tables with billions of rows.

The index is **per tablet**, not a single global structure, so locating the matching granules is a binary search within each tablet, on the order of `log2(rows_in_tablet / 8192)` comparisons. Tablet pruning rules out whole tablets first using per-tablet min/max metadata, so a selective lookup binary-searches only the few tablets that can hold the value, and the pruning overhead stays negligible even at billions of rows.

```text theme={"theme":{"light":"css-variables","dark":"css-variables"}}
Firebolt storage and scan pipeline reference (self-contained)...
Architecture: object storage (S3) holds immutable tablets; each node caches blocks on local SSD (LRU) and prefetches.
Tablet = immutable file, rows sorted in PRIMARY INDEX order, auto-vacuumed toward tablet_max_size_bytes (4 GiB default).
Granule = smallest readable block, 8192 rows by default (index_granularity, power of 2 in [128,8192]); sparse PI stores 1 key per granule.
Deletes/updates: no in-place rewrite. One cumulative per-tablet Roaring deletion mask (merge-on-read): a new delete starts from the current mask and adds rows, so a tablet has exactly one active mask, folded in at read time, never OR-ed; prior versions kept for snapshot isolation. Empty tablet is dropped as metadata. UPDATE = delete old row + insert new row (lands in a new tablet).
Roaring bitmaps back both deletion masks and inverted-index posting lists.
Scan pipeline (each step only removes tablets/granules/rows; exact, except approximate vector search which is a separate path):
  [1] tablet pruning     partition key + per-tablet min/max         -> drop whole tablets
  [2] primary index      binary search over sparse marks            -> drop granule ranges
  [3] data skipping      stored min/max per index granule (minmax)  -> drop granules
  [4] inverted/text      row-level Roaring posting lists            -> drop granules + produce a row filter
  [5] seed selection     deletion mask intersect inverted filter  -> initial live-row set per granule
  [6] refine + read      iterative predicate pushdown, most-selective first, lazy column reads -> narrow, read survivors
  [7] output
  [8] late materialization (plan-level, above the scan; rank on key cols, fetch wide payload for top-k finalists only)
join pruning (see query-planning/join-pruning): a join build side becomes a runtime pruning set (make_pruning_set) pushed into the probe-side scan; prunes tablets/granules like WHERE key IN (...); inner/right/semi joins; adaptive: the set is built at run time and discarded with no penalty if the build side is not selective, so it is applied freely.
This page is stage 2. The primary index is an exact sort order plus a sparse index, never approximate "clustering".
Composite-key gotcha: a key column prunes only while every preceding key column is constant within a granule.
Order keys most- to least-frequently filtered; put low-cardinality columns first to form long constant runs.
```

## Key features

* **Any column, no uniqueness constraint.** A primary index can cover any column or set of columns. It is not a key constraint and is never enforced, so you choose it purely to match how you query, not to model identity.
* **Composite keys are order-sensitive.** In a multi-column index, a key column prunes only while every column before it is constant within a granule. Order the keys from most- to least-frequently filtered, and generally place lower-cardinality columns first so they form long constant runs that let later columns prune. Join and foreign keys are good candidates in fact tables.
* **Filter on the keys directly.** Pruning applies only when a predicate references a key column without wrapping it in a transformation: `WHERE k = 5` prunes, `WHERE k + 1 = 6` does not. To prune on a derived value, add a [data skipping index](/performance-and-observability/storage-and-indexing/data-skipping-index) on the expression.
* **Maintained automatically.** Firebolt keeps the index current across inserts, updates, and deletes. There is nothing to rebuild by hand.

## Syntax

To define a primary index, use the following syntax within a `CREATE TABLE` statement:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
CREATE TABLE <table_name> (
   <column1> <data_type>
   [, <column2> <data_type>,
   ...]
)
PRIMARY INDEX <column_name1>[, <column_name2>, ...]
[WITH ( index_granularity = <index_granularity> ) ];
```

## Parameters

| Parameter           | Description                                                                                                 |
| ------------------- | ----------------------------------------------------------------------------------------------------------- |
| `table_name`        | The name of the table where the primary index is applied.                                                   |
| `column_name1, ...` | The columns chosen to be included in the primary index.                                                     |
| `index_granularity` | The maximum number of rows in each granule. See [Index granularity](#advanced-option%3A-index-granularity). |

## Example

The following example creates a table with a primary index optimized for query performance by filtering for `SubmitDate` and `EngineName`:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
CREATE [FACT|DIMENSION] TABLE QueryHistory (
  QueryID TEXT,
  QueryText TEXT,
  SubmitDate DATE,
  EngineName TEXT,
  SubmitTime DATE,
  Latency INT
)
PRIMARY INDEX SubmitDate, EngineName;
```

## Considerations

* **Non-enforced primary key constraint**:\
  Firebolt does not enforce primary key constraints, so users must manage data integrity externally.

* **Managing fragmentation**:\
  Fragmentation can occur as you insert, delete, or update data in a table, which impacts storage efficiency and potentially affects your query performance. Firebolt provides tools to help mitigate this effect:
  * **Efficient deletion management**:\
    Instead of immediately removing rows from the table, Firebolt uses a deletion mask vector to flag rows as deleted. This vector marks rows for exclusion during queries while keeping the underlying data intact until cleanup is performed.\
    This approach ensures consistency and avoids disrupting the primary index during updates or deletions.
  * **fragmentation metric**:\
    Use the `information_schema.tables` to access the fragmentation metric to assess fragmentation levels and determine whether maintenance actions are needed.
  * **[VACUUM](/reference-sql/commands/data-management/vacuum) command**:\
    You can use the `VACUUM` command to clean up rows flagged for deletion and reorganize fragmented data. It is particularly useful when large numbers of rows have been deleted or updates have introduced significant fragmentation.

* **Query Performance Overhead**:\
  While sparse indexes enable targeted reads and parallel processing to improve query performance, they may still require scanning one tablet range from multiple tablets, even for highly selective filters. This can result in more data being scanned compared to a globally sorted index, potentially affecting performance in certain scenarios.

* **Column Selection**:\
  Choose columns with high selectivity and relevance to query patterns. **Selectivity** is how much a column narrows the dataset when filtered, roughly the proportion of distinct values in the column. Higher-selectivity columns such as IDs or timestamps cut the number of rows scanned the most.

## Advanced option: index granularity

The `index_granularity` [storage parameter](/reference-sql/commands/data-definition/create-fact-dimension-table#storage-parameters),
specified in the `WITH` clause, is an advanced setting that may be useful for improving performance in very specific query patterns.
It defines the maximum number of rows per granule, which directly impacts how data is indexed and queried.

### How index granularity works

Index granularity sets the maximum number of rows in each [granule](/performance-and-observability/storage-and-indexing#granule), the smallest block of rows the engine can skip or read independently.

* **Lower index granularity** creates smaller granules, allowing more precise filtering and reducing unnecessary row scans in selective queries. However, lower index granularity also increases memory usage and overhead from managing more granules.
* **Higher index granularity values** create larger granules, lowering memory usage and management overhead but increasing the chance of scanning irrelevant rows, especially in selective queries.

### Accepted values

`<index_granularity>` must be a power of 2, ranging from 128 to 8192. The default value is 8192. We recommend using the default value, but lower values can decrease query latency by 10x or more in some query patterns.

### Best practices

Use the default value of `index_granularity`, which should translate to good performance for most queries.  The following workload patterns may benefit from higher or lower values for `index_granularity`:

* If your queries access only a few rows per granule, such as single-row queries or individual rows spread throughout a table, setting a **lower** `index_granularity` value can reduce unnecessary row scans and improve efficiency. However, this increases static memory usage for storing the index.
* If most of your queries scan large portions of the table, such as a large bounded range of primary index columns, a **higher** `index_granularity` value is more efficient, as it reduces index memory usage and overhead introduced by each granule boundary.

If you want to adjust `index_granularity`, start with the default value, then create duplicate tables with different settings to compare both the query latency and memory usage.

## Altering primary indexes

You can modify or remove the primary index of an existing table using `ALTER TABLE` statements. These operations only change the table metadata definition without modifying existing tablets. New ingested data will use the updated primary index definition, while existing tablets retain their original structure until a `VACUUM (UPGRADE=true)` operation is performed, thus affecting query performance.

### Syntax

To change the primary index columns:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
ALTER TABLE <table_name> SET PRIMARY INDEX ( <column_name1>[, <column_name2>, ...] );
```

To remove the primary index entirely:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
ALTER TABLE <table_name> DROP PRIMARY INDEX;
```

### Parameters

| Parameter           | Description                                                                           |
| ------------------- | ------------------------------------------------------------------------------------- |
| `table_name`        | The name of the table to modify.                                                      |
| `column_name1, ...` | The columns to include in the new primary index. All columns must exist in the table. |

### Examples

Change to a single column primary index:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
ALTER TABLE sales_data SET PRIMARY INDEX (customer_id);
```

Change to a multi-column primary index:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
ALTER TABLE sales_data SET PRIMARY INDEX (customer_id, order_date);
```

Remove the primary index:

```sql theme={"theme":{"light":"css-variables","dark":"css-variables"}}
ALTER TABLE sales_data DROP PRIMARY INDEX;
```

### Restrictions and considerations

* **Table compatibility**: Only supported on FACT and DIMENSION tables. External tables are not supported.

* **Tablet metadata compatibility**: If your table contains older tablets that don't have the primary index information in their metadata headers, you must run `VACUUM (UPGRADE=true)` on the table before altering the primary index.

* **Text Index compatibility**: Tables with Text Indexes cannot have their primary index altered. You must drop all Text Indexes before modifying the primary index.

* **Column validation**: All specified columns must exist in the table.

* **Future data impact**: New data inserted after the ALTER TABLE operation will use the updated primary index definition for optimal query performance.

<Note>
  The ALTER TABLE operation only updates the table's metadata definition. Existing data in tablets retains the original primary index structure until `VACUUM (UPGRADE=true)` is run.
  The `VACUUM (UPGRADE=true)` operation will reorganize and merge tablets to align with the new primary index definition, ensuring optimal query performance.
</Note>

### Useful Links

For additional information, see:

* [ALTER TABLE SET PRIMARY INDEX](/reference-sql/commands/data-definition/alter-table#alter-table-set-primary-index) reference page.
* [VACUUM](/reference-sql/commands/data-management/vacuum) reference page.
