Primary index overview
The Firebolt primary index optimizes data retrieval by organizing it based on column values. This enables efficient data pruning and high-performance queries for large-scale analytics. The primary index ensures that queries target only the most relevant portions of the data, significantly reducing the volume of unnecessary scans. This selectivity is especially powerful when the indexed columns align closely with query patterns, allowing the database to quickly locate and retrieve the required data. As a result, query performance is not only optimized but also remains consistent even as data volumes grow.
Topics:
Customizable indexing:
Tablet-based data organization:
Support for compound indexes:
Sparse indexing:
Automatic metadata updates:
Handling low-cardinality clauses:
Inclusion of join key columns:
Leverage indexed columns directly:
To define a primary index, use the following syntax within a CREATE TABLE
statement:
Parameter | Description |
---|---|
table_name | The name of the table where the primary index is applied. |
column_name1, ... | The columns chosen to be included in the primary index. |
index_granularity | The maximum number of rows in each granule. See Index granularity. |
The following example creates a table with a primary index optimized for query performance by filtering for SubmitDate
and EngineName
:
Non-enforced primary key constraint:
Firebolt does not enforce primary key constraints, so users must manage data integrity externally.
Managing fragmentation:
Fragmentation can occur as you insert, delete, or update data in a table, which impacts storage efficiency and potentially affects your query performance. Firebolt provides tools to help mitigate this effect:
information_schema.tables
to access the fragmentation metric to assess fragmentation levels and determine whether maintenance actions are needed.VACUUM
command to clean up rows flagged for deletion and reorganize fragmented data. It is particularly useful when large numbers of rows have been deleted or updates have introduced significant fragmentation.Query Performance Overhead:
While sparse indexes enable targeted reads and parallel processing to improve query performance, they may still require scanning one tablet range from multiple tablets, even for highly selective filters. This can result in more data being scanned compared to a globally sorted index, potentially affecting performance in certain scenarios.
Column Selection:
Choose columns with high selectivity and relevance to query patterns for optimal performance. Selectivity refers to the ability of a column to significantly narrow down the dataset when filtered, typically measured by the proportion of unique values in the column. Columns with higher selectivity, such as IDs or timestamps, help reduce the number of rows scanned, leading to faster query execution and better resource efficiency.
Using Firebolt’s primary indexes can help you enhance your query performance, optimize data management, and scale efficiently for modern analytics workloads.
The index_granularity
storage parameter,
specified in the WITH
clause, is an advanced setting that may be useful for improving performance in very specific query patterns.
It defines the maximum number of rows per granule, which directly impacts how data is indexed and queried.
A granule is the smallest block of rows that Firebolt can skip or read during query filtering. Index granularity defines the number of rows in each granule. In other words, it sets the smallest group of rows the engine can access independently.
For more information about the fundamentals of Firebolt’s primary indexes and granules, see Firebolt’s blog post on primary indexes.
<index_granularity>
must be a power of 2, ranging from 128 to 8192. The default value is 8192. We recommend using the default value, but lower values can decrease query latency by 10x or more in some query patterns.
Use the default value of index_granularity
, which should translate to good performance for most queries. The following workload patterns may benefit from higher or lower values for index_granularity
:
index_granularity
value can reduce unnecessary row scans and improve efficiency. However, this increases static memory usage for storing the index.index_granularity
value is more efficient, as it reduces index memory usage and overhead introduced by each granule boundary.If you want to adjust index_granularity
, start with the default value, then create duplicate tables with different settings to compare both the query latency and memory usage.
Primary index overview
The Firebolt primary index optimizes data retrieval by organizing it based on column values. This enables efficient data pruning and high-performance queries for large-scale analytics. The primary index ensures that queries target only the most relevant portions of the data, significantly reducing the volume of unnecessary scans. This selectivity is especially powerful when the indexed columns align closely with query patterns, allowing the database to quickly locate and retrieve the required data. As a result, query performance is not only optimized but also remains consistent even as data volumes grow.
Topics:
Customizable indexing:
Tablet-based data organization:
Support for compound indexes:
Sparse indexing:
Automatic metadata updates:
Handling low-cardinality clauses:
Inclusion of join key columns:
Leverage indexed columns directly:
To define a primary index, use the following syntax within a CREATE TABLE
statement:
Parameter | Description |
---|---|
table_name | The name of the table where the primary index is applied. |
column_name1, ... | The columns chosen to be included in the primary index. |
index_granularity | The maximum number of rows in each granule. See Index granularity. |
The following example creates a table with a primary index optimized for query performance by filtering for SubmitDate
and EngineName
:
Non-enforced primary key constraint:
Firebolt does not enforce primary key constraints, so users must manage data integrity externally.
Managing fragmentation:
Fragmentation can occur as you insert, delete, or update data in a table, which impacts storage efficiency and potentially affects your query performance. Firebolt provides tools to help mitigate this effect:
information_schema.tables
to access the fragmentation metric to assess fragmentation levels and determine whether maintenance actions are needed.VACUUM
command to clean up rows flagged for deletion and reorganize fragmented data. It is particularly useful when large numbers of rows have been deleted or updates have introduced significant fragmentation.Query Performance Overhead:
While sparse indexes enable targeted reads and parallel processing to improve query performance, they may still require scanning one tablet range from multiple tablets, even for highly selective filters. This can result in more data being scanned compared to a globally sorted index, potentially affecting performance in certain scenarios.
Column Selection:
Choose columns with high selectivity and relevance to query patterns for optimal performance. Selectivity refers to the ability of a column to significantly narrow down the dataset when filtered, typically measured by the proportion of unique values in the column. Columns with higher selectivity, such as IDs or timestamps, help reduce the number of rows scanned, leading to faster query execution and better resource efficiency.
Using Firebolt’s primary indexes can help you enhance your query performance, optimize data management, and scale efficiently for modern analytics workloads.
The index_granularity
storage parameter,
specified in the WITH
clause, is an advanced setting that may be useful for improving performance in very specific query patterns.
It defines the maximum number of rows per granule, which directly impacts how data is indexed and queried.
A granule is the smallest block of rows that Firebolt can skip or read during query filtering. Index granularity defines the number of rows in each granule. In other words, it sets the smallest group of rows the engine can access independently.
For more information about the fundamentals of Firebolt’s primary indexes and granules, see Firebolt’s blog post on primary indexes.
<index_granularity>
must be a power of 2, ranging from 128 to 8192. The default value is 8192. We recommend using the default value, but lower values can decrease query latency by 10x or more in some query patterns.
Use the default value of index_granularity
, which should translate to good performance for most queries. The following workload patterns may benefit from higher or lower values for index_granularity
:
index_granularity
value can reduce unnecessary row scans and improve efficiency. However, this increases static memory usage for storing the index.index_granularity
value is more efficient, as it reduces index memory usage and overhead introduced by each granule boundary.If you want to adjust index_granularity
, start with the default value, then create duplicate tables with different settings to compare both the query latency and memory usage.