Overview
Create a vector search index on an embedding column to enable low‑latency approximate nearest neighbor (ANN) search using the HNSW algorithm. You can create multiple vector search indexes per table and column; each index must have a unique name, and multiple indexes can reference the same column with identical or different configurations. Read more about Vector Search Indexes in general and how to search using vector search indexes.
Syntax
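As a rough sketch, assuming the optional parameters are passed through a WITH (...) clause (the clause names and ordering here are assumptions inferred from the parameter table below, not the authoritative grammar), the statement has the following shape:

```sql
-- Illustrative sketch only; see the Parameters table below for each placeholder.
CREATE VECTOR SEARCH INDEX <index_name>
ON <table_name> (<column_name> <distance_metric>)
WITH (
    dimension = <dimension>,      -- required: embedding dimensionality, enforced at ingest
    m = 16,                       -- optional: HNSW connectivity, default 16
    ef_construction = 128,        -- optional: insert-time candidate list size, default 128
    quantization = 'bf16'         -- optional: vector storage precision, default 'bf16'
);
```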
Parameters
| Parameter | Description |
|---|---|
| <index_name> | Unique name for the vector search index. |
| <table_name> | Target table name. |
| <column_name> | Column with embeddings. Supported types: ARRAY(REAL NOT NULL) NOT NULL or ARRAY(DOUBLE NOT NULL) NOT NULL. |
| <distance_metric> | Distance operator: vector_cosine_ops, vector_ip_ops, vector_l2sq_ops. See Distance Metric. |
| <dimension> | Embedding dimensionality. Enforced at ingest. |
| m (optional) | HNSW connectivity (max edges per node). Default 16. See M (Connectivity). |
| ef_construction (optional) | Insert‑time candidate list size. Default 128. See EF_CONSTRUCTION. |
| quantization (optional) | Vector storage precision. Default 'bf16'. Supported: 'bf16', 'f16', 'f32', 'f64', 'i8'. See Quantization. |
Distance Metric
Three different distance metrics are available for use in vector search indexes to determine the distance between vectors (see the definitions after this list):
- vector_cosine_ops: cosine distance
- vector_ip_ops: inner product
- vector_l2sq_ops: squared L2 (Euclidean) distance
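For reference, for two embedding vectors $a, b \in \mathbb{R}^d$ these options correspond to the following quantities (whether the inner product is negated so that smaller values rank closer matches first is an implementation detail not stated here):

$$
\begin{aligned}
\text{cosine distance:}\quad & 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} \\
\text{inner product:}\quad & a \cdot b = \sum_{i=1}^{d} a_i b_i \\
\text{squared L2 distance:}\quad & \lVert a - b \rVert^{2} = \sum_{i=1}^{d} (a_i - b_i)^{2}
\end{aligned}
$$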
M (Connectivity)
The M parameter during vector search index creation defines the number of edges each vertex in the graph structure has, that is, the number of nearest neighbors that each inserted vector will connect to.
Higher M values improve search quality but increase memory usage and index build time. The impact on index search time is minimal.
- Higher M generally improves recall and can reduce search hops.
- Memory usage and index size increase roughly linearly with M; build time also increases.
- Typical range: 16–64; recommended starting point: 16–32.
- Large, complex, or high‑dimensional datasets may benefit from higher M (with higher memory costs); see the example sketch after this list.
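For instance, a large, high‑dimensional dataset might use a higher connectivity value. The statement below is a sketch that reuses the assumed WITH (...) option syntax from the Syntax section; the table and index names are hypothetical:

```sql
-- Sketch: higher connectivity for a large, high-dimensional dataset.
CREATE VECTOR SEARCH INDEX doc_embedding_idx_m32
ON documents (embedding vector_cosine_ops)
WITH (
    dimension = 1024,
    m = 32               -- more edges per node: better recall, higher memory usage and build time
);
```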
EF_CONSTRUCTION
The ef_construction parameter defines the quality of inserts into the index. The higher the value, the more nodes will be explored during the insert to find the nearest neighbors, which leads to a higher-quality graph and better recall.
- Higher values improve index quality and recall, at the cost of longer builds and higher transient build memory.
- Typical range: 100–200; recommended starting value: 128.
- With higher ef_construction, you may be able to use smaller M and ef_search for similar recall; see the example sketch after this list.
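As a sketch under the same assumed WITH (...) option syntax (hypothetical names), an index tuned for higher build quality might look like this:

```sql
-- Sketch: more insert-time effort for a higher-quality graph.
CREATE VECTOR SEARCH INDEX doc_embedding_idx_hq
ON documents (embedding vector_cosine_ops)
WITH (
    dimension = 1024,
    m = 16,
    ef_construction = 200   -- explore more candidates per insert: better recall, slower builds
);
```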
Quantization
Quantization is the process of converting high-precision data into a lower-precision, discrete representation. The quantization setting defines which internal, lower-precision representation the high-precision input data is converted to, that is, which data type the index uses to store the vectors. A smaller data type requires less memory but may impact the quality of the index and thus recall performance. This is particularly relevant when vector clusters are very dense, as precision loss in the floating-point representation will decrease recall. Supported types are:
- bf16: 16-bit (brain) floating point developed by Google Brain, optimized for fast, large-scale numeric tasks where preserving the range is more important than fine-grained precision.
- f16: 16-bit floating point.
- f32: 32-bit floating point, equal to the SQL type real.
- f64: 64-bit floating point, equal to the SQL type double precision.
- i8: 8-bit integer (only supported with the vector_cosine_ops metric); see the example sketch after this list.
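For example, a memory‑constrained deployment could store vectors as 8‑bit integers, which is only valid together with vector_cosine_ops. The statement is a sketch under the same assumed WITH (...) option syntax, with hypothetical names:

```sql
-- Sketch: 8-bit integer storage to reduce the memory footprint.
-- i8 quantization is only supported with the vector_cosine_ops metric.
CREATE VECTOR SEARCH INDEX doc_embedding_idx_i8
ON documents (embedding vector_cosine_ops)
WITH (
    dimension = 768,
    quantization = 'i8'
);
```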
Index creation on populated tables
Vector search indexes can be created on both empty and populated tables as a pure metadata operation. All data inserted after index creation is automatically indexed as part of the transaction. When you create an index on a populated table, existing data is not automatically indexed. However, this existing data is still considered when using the index in the vector_search() TVF. During query execution, a hybrid search is performed that accounts for both indexed and non‑indexed tablets. This can impact performance, as non‑indexed data must be fully scanned.
To ensure all existing data is indexed after creating an index on a populated table, run VACUUM (REINDEX = TRUE).
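A typical sequence on an already‑populated table might look like the sketch below; the CREATE statement reuses the assumed syntax from the Syntax section, and the VACUUM call follows the note above (its exact argument form may vary by deployment):

```sql
-- Sketch: add an index to a populated table, then reindex existing data.
CREATE VECTOR SEARCH INDEX doc_embedding_idx
ON documents (embedding vector_cosine_ops)
WITH (dimension = 256);

-- Rebuild so that rows inserted before index creation are indexed as well.
VACUUM (REINDEX = TRUE);
```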
Examples
Create a table to store document embeddings, then create a vector search index on the embedding column for cosine distance over 256‑dimensional vectors:
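A minimal sketch, assuming an ARRAY(REAL NOT NULL) NOT NULL embedding column and the WITH (...) option syntax used in the sketches above (table, column, and index names are hypothetical and the exact DDL may differ):

```sql
-- Sketch: table with a 256-dimensional embedding column.
CREATE TABLE documents (
    id        BIGINT NOT NULL,
    title     TEXT NOT NULL,
    embedding ARRAY(REAL NOT NULL) NOT NULL   -- supported embedding type per the Parameters table
);

-- Sketch: cosine-distance vector search index over the embeddings.
CREATE VECTOR SEARCH INDEX doc_embedding_cosine_idx
ON documents (embedding vector_cosine_ops)
WITH (dimension = 256);
```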