Vector Search & HNSW Indexes
Vector Search & HNSW Indexes
HeliosDB-Lite provides first-class support for vector data with a dedicated
VECTOR type, HNSW indexing, product quantization, and three distance
operators for similarity search.
Prerequisites
- HeliosDB-Lite v3.5 or later
- Access to the SQL shell (REPL, PostgreSQL wire protocol, or REST API)
Step 1 — Create a Table with a Vector Column
Declare a column with the VECTOR(n) type, where n is the number of
dimensions.
CREATE TABLE documents ( id INT PRIMARY KEY, title TEXT, embedding VECTOR(3));This creates a table where each row stores a 3-dimensional embedding alongside its metadata.
Step 2 — Insert Vector Data
Vector literals are JSON-style arrays of floating-point numbers enclosed in single quotes.
INSERT INTO documents VALUES (1, 'Introduction to AI', '[0.1, 0.8, 0.3]');INSERT INTO documents VALUES (2, 'Database Internals', '[0.9, 0.1, 0.2]');INSERT INTO documents VALUES (3, 'Machine Learning 101', '[0.2, 0.9, 0.4]');INSERT INTO documents VALUES (4, 'Query Optimization', '[0.8, 0.2, 0.1]');INSERT INTO documents VALUES (5, 'Neural Networks', '[0.15, 0.85, 0.35]');HeliosDB validates that the number of dimensions matches the column definition. Inserting a vector with the wrong number of dimensions produces an error:
-- This fails: VECTOR(3) column but only 2 values providedINSERT INTO documents VALUES (6, 'Bad Vector', '[1.0, 2.0]');-- ERROR: Vector dimension mismatchStep 3 — Distance Operators
HeliosDB supports three distance operators for comparing vectors:
| Operator | Distance Metric | Use Case |
|---|---|---|
<-> | L2 (Euclidean) | General-purpose similarity |
<=> | Cosine | Text embeddings, normalized |
<#> | Inner Product | Maximum inner product search |
L2 (Euclidean) Distance
SELECT id, title, embedding <-> '[0.1, 0.8, 0.3]' AS distanceFROM documentsORDER BY distanceLIMIT 3;Expected output:
id | title | distance----+-----------------------+---------- 1 | Introduction to AI | 0.00 5 | Neural Networks | 0.09 3 | Machine Learning 101 | 0.13Cosine Distance
Cosine distance measures the angle between vectors, ignoring magnitude. It is ideal for text embeddings that are normalized to unit length.
SELECT id, title, embedding <=> '[0.1, 0.8, 0.3]' AS cos_distanceFROM documentsORDER BY cos_distanceLIMIT 3;Inner Product
The inner product operator returns the negative inner product (so that smaller values indicate higher similarity, consistent with ORDER BY ASC).
SELECT id, title, embedding <#> '[0.1, 0.8, 0.3]' AS neg_ipFROM documentsORDER BY neg_ipLIMIT 3;Step 4 — Create an HNSW Index
For tables with many rows, a brute-force scan is slow. HNSW (Hierarchical Navigable Small World) indexes provide fast approximate nearest-neighbor lookups.
CREATE INDEX doc_embedding_idx ON documentsUSING hnsw (embedding);Custom HNSW Parameters
Tune index build quality and search speed with parameters:
CREATE INDEX doc_embedding_idx ON documentsUSING hnsw (embedding)WITH (m = 32, ef_construction = 400, ef_search = 200);| Parameter | Default | Description |
|---|---|---|
m | 16 | Max connections per node per layer |
ef_construction | 200 | Beam width during index build (higher = better) |
ef_search | 100 | Beam width during query (higher = more accurate) |
Higher m and ef_construction improve recall at the cost of build time and
memory. Higher ef_search improves query accuracy at the cost of latency.
Step 5 — k-NN Query with Index
Once an HNSW index exists, k-NN queries automatically use it:
SELECT id, title, embedding <-> '[0.15, 0.85, 0.35]' AS distanceFROM documentsORDER BY distanceLIMIT 5;The query planner detects the ORDER BY <distance_operator> LIMIT k pattern
and routes the search through the HNSW index instead of scanning the full
table.
Step 6 — Product Quantization
For very high-dimensional vectors or large datasets, product quantization (PQ) compresses vectors to reduce memory usage while maintaining search quality.
CREATE INDEX doc_pq_idx ON documentsUSING hnsw (embedding)WITH (quantization = 'product', pq_subquantizers = 4, pq_centroids = 256);PQ divides each vector into subvectors, quantizes each independently, and stores compact codes instead of full floating-point vectors.
Step 7 — Multiple Vector Columns
A table can have multiple vector columns with different dimensions:
CREATE TABLE products ( id INT PRIMARY KEY, name TEXT, text_embedding VECTOR(768), image_embedding VECTOR(512));
CREATE INDEX prod_text_idx ON products USING hnsw (text_embedding);CREATE INDEX prod_image_idx ON products USING hnsw (image_embedding);Search each independently:
-- Find products similar by textSELECT name FROM productsORDER BY text_embedding <-> $text_query_vecLIMIT 10;
-- Find products similar by imageSELECT name FROM productsORDER BY image_embedding <=> $image_query_vecLIMIT 10;Tips and Troubleshooting
-
Dimension mismatch: Query vectors must have the same number of dimensions as the column. A
VECTOR(384)column requires query vectors with exactly 384 elements. -
Normalize for cosine: The
<=>operator computes cosine distance. For best results, normalize your embeddings to unit length before inserting. -
Index build time: Building an HNSW index scans all existing rows. For large tables, this can take time. The index is used immediately after creation.
-
Approximate results: HNSW provides approximate nearest neighbors. Increase
ef_searchfor better recall at the cost of higher latency. -
Index size: HNSW indexes store graph edges in addition to vectors. Expect the index to use roughly 1.5-2x the memory of the raw vector data with default
m = 16. -
Configuration defaults: Global HNSW defaults can be set in
heliosdb.tomlunder[vector]:
[vector]default_index_type = "hnsw"hnsw_ef_construction = 200hnsw_m = 16enable_pq = truepq_subvectors = 8pq_bits = 8