Overview
Vector search indexes in Firebolt provide low‑latency similarity search over high‑dimensional embeddings using the HNSW algorithm. They are optimized for approximate nearest neighbor (ANN) retrieval and power semantic search, recommendations, and AI‑driven analytics directly in SQL.
Vector search indexes return approximate nearest neighbors. Result quality depends on index parameters, data distribution, and query settings.
Key capabilities
- Sub‑second top‑K vector search at scale
- Full ACID consistency with base table data
- In‑memory or disk‑backed serving modes
- Tunable precision/performance via index and query parameters
When to use
- Use vector search indexes when you need fast, scalable top‑k similarity search in SQL without operating a separate vector database.
- Choose them when approximate results are acceptable in exchange for lower latency and cost.
Syntax
Create a vector search index
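A minimal sketch of the statement (the names are illustrative, and any tuning options are assumptions; see CREATE VECTOR INDEX for the authoritative syntax):

```sql
-- Sketch only: table, column, and index names are illustrative.
-- HNSW tuning options, if any, follow the CREATE VECTOR INDEX reference.
CREATE VECTOR INDEX doc_embeddings_idx
ON documents (embedding);
```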
You can create multiple vector search indexes per table and column. Each index must have a unique name. Indexes can reference the same column with identical or different configurations.
Use a vector search index
Use the vector_search() table‑valued function (TVF) and reference the index by name.
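For example (the argument list shown is an assumption; see the vector_search() reference for the exact signature):

```sql
-- Sketch only: argument names and order are assumptions.
SELECT *
FROM vector_search(doc_embeddings_idx, <target_vector>, 10);
```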
Drop a vector search index
Dropping a vector search index via DROP INDEX <index_name> is a pure metadata operation and does not free space at the storage level.
To free space at the storage level, run VACUUM on the table after the index has been dropped.
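For example (table and index names are illustrative):

```sql
DROP INDEX doc_embeddings_idx;  -- metadata-only; index files remain in storage
VACUUM documents;               -- reclaims storage used by the dropped index
```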
Rename a vector search index
To rename a vector search index, use ALTER INDEX ... RENAME TO:
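For example (names are illustrative):

```sql
ALTER INDEX doc_embeddings_idx RENAME TO doc_embeddings_idx_v2;
```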
How it works
Each table is partitioned into tablets. For every vector search index, Firebolt maintains one vector index file per tablet, built using the USearch library. During a query:
- The engine searches each tablet’s vector index for up to top_k row numbers closest to <target_vector>.
- Results across tablets are merged and ordered by distance to produce the overall top‑K.
- Only the identified base‑table rows are read using semi‑join reduction, avoiding a full table scan.
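Conceptually, the merge step behaves like the following query. This is an illustration of the semantics only, not how the engine executes the search; the per‑tablet candidate sets are hypothetical:

```sql
-- Illustrative semantics only: each branch stands for one tablet's
-- up-to-top_k candidates; the union is re-ordered to get the global top-K.
SELECT row_number, distance FROM tablet_1_candidates
UNION ALL
SELECT row_number, distance FROM tablet_2_candidates
ORDER BY distance
LIMIT 10;  -- top_k
```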
Performance and observability
Optimizing vector search performance requires tuning both the index and the table layout. For guidance on choosing index and search parameters, see CREATE VECTOR INDEX and vector_search().
Engine sizing
For best latency, ensure the entire vector index fits into memory. Once loaded, index files are cached to serve subsequent queries from RAM. Use information_schema.indexes to inspect index sizes and plan engine memory:
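For example (the column names shown are assumptions; check the actual schema of information_schema.indexes):

```sql
-- Sketch only: column names are assumptions.
SELECT index_name, table_name, compressed_bytes
FROM information_schema.indexes
WHERE table_name = 'documents';
```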
The in‑memory vector index cache size is controlled by VECTOR_INDEX_CACHE_MEMORY_FRACTION:
For example, fully caching a 250 GiB index with VECTOR_INDEX_CACHE_MEMORY_FRACTION = 0.6 requires at least 417 GiB of available memory (250 GiB ÷ 0.6 ≈ 417 GiB). See engine sizing.
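The SET form shown below is an assumption about how the setting is applied; confirm against the engine settings reference:

```sql
-- Assumption: applied as an engine/session-level setting.
SET VECTOR_INDEX_CACHE_MEMORY_FRACTION = 0.6;
```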
You can monitor this via information_schema.engine_query_history or EXPLAIN (ANALYZE). Confirm that the index is served from memory:
You can warm the cache ahead of time by running vector_search(...) and optionally caching base‑table data using a CHECKSUM scan on relevant columns.
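A warm‑up sketch (all names are illustrative, and the vector_search() argument list is an assumption):

```sql
-- Warm the vector index cache with a throwaway search, then warm the
-- base-table columns that queries will read.
SELECT count(*) FROM vector_search(doc_embeddings_idx, <target_vector>, 10);
SELECT CHECKSUM(doc_id, title) FROM documents;
```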
Table index granularity
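As a sketch, a lower granularity might be set when the table is created. The WITH clause and types shown are assumptions; check Firebolt's CREATE TABLE reference for the exact option placement:

```sql
-- Assumption: index_granularity is a table-level creation option.
CREATE TABLE documents (
    doc_id    BIGINT,
    title     TEXT,
    embedding ARRAY(REAL)
) WITH (index_granularity = 1024);  -- default is 8,192
```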
The table’s index_granularity defines the maximum number of rows per granule, which directly impacts how data is retrieved. The default granularity is 8,192 rows per granule. The top‑K closest vectors are unlikely to be stored in the same granule, even though they are semantically close to each other. Decreasing index_granularity can therefore improve vector search performance, but it might cause regressions for other analytical workloads on the same table.
Index creation on populated tables
Vector search indexes can be created on both empty and populated tables as a pure metadata operation. All data inserted after index creation is automatically indexed as part of the transaction. When you create an index on a populated table, existing data is not automatically indexed. However, that data is still considered when using the vector_search() TVF. During query execution, a hybrid search is performed across both indexed and non‑indexed tablets.
This increases latency because non‑indexed data must be fully scanned.
To backfill existing data after creating an index on a populated table, run VACUUM (REINDEX = TRUE).
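For example (the table name is illustrative, and the exact placement of the option is an assumption; see the VACUUM reference):

```sql
-- Builds index files for tablets that existed before the index was created.
VACUUM documents (REINDEX = TRUE);
```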
Examples
Consider a table storing document embeddings generated by a language model. Vector search indexes appear in information_schema.indexes together with other types of indexes (e.g., aggregating indexes).
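A minimal end‑to‑end sketch. The schema, types, and the index and search syntax here are illustrative assumptions; see CREATE VECTOR INDEX and vector_search() for the exact forms:

```sql
-- All names and option syntax are illustrative assumptions.
CREATE TABLE documents (
    doc_id    BIGINT,
    title     TEXT,
    embedding ARRAY(REAL)   -- model-generated embedding
);

CREATE VECTOR INDEX doc_embeddings_idx ON documents (embedding);

-- Retrieve the 10 documents closest to a target embedding.
SELECT *
FROM vector_search(doc_embeddings_idx, <target_vector>, 10);
```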