Skip to content

Vector Search & HNSW Indexes

Vector Search & HNSW Indexes

HeliosDB-Lite provides first-class support for vector data with a dedicated VECTOR type, HNSW indexing, product quantization, and three distance operators for similarity search.

Prerequisites

  • HeliosDB-Lite v3.5 or later
  • Access to the SQL shell (REPL, PostgreSQL wire protocol, or REST API)

Step 1 — Create a Table with a Vector Column

Declare a column with the VECTOR(n) type, where n is the number of dimensions.

CREATE TABLE documents (
id INT PRIMARY KEY,
title TEXT,
embedding VECTOR(3)
);

This creates a table where each row stores a 3-dimensional embedding alongside its metadata.

Step 2 — Insert Vector Data

Vector literals are JSON-style arrays of floating-point numbers enclosed in single quotes.

INSERT INTO documents VALUES (1, 'Introduction to AI', '[0.1, 0.8, 0.3]');
INSERT INTO documents VALUES (2, 'Database Internals', '[0.9, 0.1, 0.2]');
INSERT INTO documents VALUES (3, 'Machine Learning 101', '[0.2, 0.9, 0.4]');
INSERT INTO documents VALUES (4, 'Query Optimization', '[0.8, 0.2, 0.1]');
INSERT INTO documents VALUES (5, 'Neural Networks', '[0.15, 0.85, 0.35]');

HeliosDB validates that the number of dimensions matches the column definition. Inserting a vector with the wrong number of dimensions produces an error:

-- This fails: VECTOR(3) column but only 2 values provided
INSERT INTO documents VALUES (6, 'Bad Vector', '[1.0, 2.0]');
-- ERROR: Vector dimension mismatch

Step 3 — Distance Operators

HeliosDB supports three distance operators for comparing vectors:

OperatorDistance MetricUse Case
<->L2 (Euclidean)General-purpose similarity
<=>CosineText embeddings, normalized
<#>Inner ProductMaximum inner product search

L2 (Euclidean) Distance

SELECT id, title, embedding <-> '[0.1, 0.8, 0.3]' AS distance
FROM documents
ORDER BY distance
LIMIT 3;

Expected output:

id | title | distance
----+-----------------------+----------
1 | Introduction to AI | 0.00
5 | Neural Networks | 0.09
3 | Machine Learning 101 | 0.13

Cosine Distance

Cosine distance measures the angle between vectors, ignoring magnitude. It is ideal for text embeddings that are normalized to unit length.

SELECT id, title, embedding <=> '[0.1, 0.8, 0.3]' AS cos_distance
FROM documents
ORDER BY cos_distance
LIMIT 3;

Inner Product

The inner product operator returns the negative inner product (so that smaller values indicate higher similarity, consistent with ORDER BY ASC).

SELECT id, title, embedding <#> '[0.1, 0.8, 0.3]' AS neg_ip
FROM documents
ORDER BY neg_ip
LIMIT 3;

Step 4 — Create an HNSW Index

For tables with many rows, a brute-force scan is slow. HNSW (Hierarchical Navigable Small World) indexes provide fast approximate nearest-neighbor lookups.

CREATE INDEX doc_embedding_idx ON documents
USING hnsw (embedding);

Custom HNSW Parameters

Tune index build quality and search speed with parameters:

CREATE INDEX doc_embedding_idx ON documents
USING hnsw (embedding)
WITH (m = 32, ef_construction = 400, ef_search = 200);
ParameterDefaultDescription
m16Max connections per node per layer
ef_construction200Beam width during index build (higher = better)
ef_search100Beam width during query (higher = more accurate)

Higher m and ef_construction improve recall at the cost of build time and memory. Higher ef_search improves query accuracy at the cost of latency.

Step 5 — k-NN Query with Index

Once an HNSW index exists, k-NN queries automatically use it:

SELECT id, title, embedding <-> '[0.15, 0.85, 0.35]' AS distance
FROM documents
ORDER BY distance
LIMIT 5;

The query planner detects the ORDER BY <distance_operator> LIMIT k pattern and routes the search through the HNSW index instead of scanning the full table.

Step 6 — Product Quantization

For very high-dimensional vectors or large datasets, product quantization (PQ) compresses vectors to reduce memory usage while maintaining search quality.

CREATE INDEX doc_pq_idx ON documents
USING hnsw (embedding)
WITH (quantization = 'product', pq_subquantizers = 4, pq_centroids = 256);

PQ divides each vector into subvectors, quantizes each independently, and stores compact codes instead of full floating-point vectors.

Step 7 — Multiple Vector Columns

A table can have multiple vector columns with different dimensions:

CREATE TABLE products (
id INT PRIMARY KEY,
name TEXT,
text_embedding VECTOR(768),
image_embedding VECTOR(512)
);
CREATE INDEX prod_text_idx ON products USING hnsw (text_embedding);
CREATE INDEX prod_image_idx ON products USING hnsw (image_embedding);

Search each independently:

-- Find products similar by text
SELECT name FROM products
ORDER BY text_embedding <-> $text_query_vec
LIMIT 10;
-- Find products similar by image
SELECT name FROM products
ORDER BY image_embedding <=> $image_query_vec
LIMIT 10;

Tips and Troubleshooting

  • Dimension mismatch: Query vectors must have the same number of dimensions as the column. A VECTOR(384) column requires query vectors with exactly 384 elements.

  • Normalize for cosine: The <=> operator computes cosine distance. For best results, normalize your embeddings to unit length before inserting.

  • Index build time: Building an HNSW index scans all existing rows. For large tables, this can take time. The index is used immediately after creation.

  • Approximate results: HNSW provides approximate nearest neighbors. Increase ef_search for better recall at the cost of higher latency.

  • Index size: HNSW indexes store graph edges in addition to vectors. Expect the index to use roughly 1.5-2x the memory of the raw vector data with default m = 16.

  • Configuration defaults: Global HNSW defaults can be set in heliosdb.toml under [vector]:

[vector]
default_index_type = "hnsw"
hnsw_ef_construction = 200
hnsw_m = 16
enable_pq = true
pq_subvectors = 8
pq_bits = 8