
Vector Search in HeliosDB Nano


HeliosDB Nano provides native vector similarity search capabilities compatible with pgvector, enabling AI and machine learning applications to store and query vector embeddings efficiently.

Features

  • VECTOR Data Type: Native support for fixed-dimension vector columns
  • HNSW Index: Hierarchical Navigable Small World (HNSW) algorithm for fast approximate nearest neighbor search
  • Multiple Distance Metrics: L2 (Euclidean), Cosine, and Inner Product
  • pgvector Compatible: Drop-in replacement for pgvector with the same operators and syntax
  • High Performance: Optimized for large-scale vector search with millions of vectors

Quick Start

Creating a Table with Vectors

CREATE TABLE documents (
    id INT PRIMARY KEY,
    content TEXT,
    embedding VECTOR(1536) -- 1536-dimensional vector
);

Inserting Vector Data

-- Insert using array literal syntax
INSERT INTO documents (id, content, embedding)
VALUES (
    1,
    'Sample document',
    '[0.1, 0.2, 0.3, ...]' -- 1536 values
);

Creating a Vector Index

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents USING hnsw (embedding);

-- Find 10 most similar documents using L2 distance
SELECT id, content, embedding <-> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, ...]'
LIMIT 10;

Distance Operators

HeliosDB Nano supports three distance operators, compatible with pgvector:

Operator   Description               Use Case
<->        L2 distance (Euclidean)   General-purpose similarity
<=>        Cosine distance           Text embeddings, normalized vectors
<#>        Inner product             Recommendation systems

L2 Distance (<->)

Euclidean distance between vectors. Best for general-purpose similarity search.

SELECT * FROM items
ORDER BY embedding <-> '[1.0, 2.0, 3.0]'
LIMIT 5;

Cosine Distance (<=>)

Angular dissimilarity, defined as 1 - cosine similarity; lower values mean more similar vectors. Ideal for text embeddings.

SELECT * FROM documents
ORDER BY embedding <=> '[0.5, 0.3, 0.8]'
LIMIT 10;

Inner Product (<#>)

The negative dot product, negated so that an ascending ORDER BY returns the best matches first. Useful for recommendation systems.

SELECT * FROM products
ORDER BY embedding <#> '[1.0, 0.0, 1.0]'
LIMIT 20;
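
The three operators can be reproduced outside the database for sanity-checking results. A minimal sketch of the formulas as pgvector defines them (note that `<#>` returns the *negative* inner product so that ascending `ORDER BY` ranks best matches first):

```python
import math

def l2_distance(a, b):
    # <-> : Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # <=> : 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def inner_product_distance(a, b):
    # <#> : negative inner product (more similar => more negative)
    return -sum(x * y for x, y in zip(a, b))
```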

HNSW Index

What is HNSW?

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. It provides:

  • Fast queries: Sub-millisecond search on millions of vectors
  • Good accuracy: >95% recall for most use cases
  • Scalability: Handles millions to billions of vectors
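
The core idea behind HNSW, greedy routing through a proximity graph, can be sketched in a few lines. This is a single-layer toy, not HeliosDB Nano's actual implementation: real HNSW adds a layer hierarchy and an `ef`-sized candidate beam, which is what keeps the greedy walk from stopping at a local optimum.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_graph(vectors, M=4):
    # Toy proximity graph: each node links to its M nearest neighbors.
    graph = {}
    for i, v in enumerate(vectors):
        dists = sorted((l2(v, w), j) for j, w in enumerate(vectors) if j != i)
        graph[i] = [j for _, j in dists[:M]]
    return graph

def greedy_search(vectors, graph, query, entry=0):
    # Hop to whichever neighbor is closest to the query; stop at a minimum.
    # (Can return a local optimum; HNSW mitigates this with layers and a beam.)
    current = entry
    while True:
        best = min(graph[current] + [current],
                   key=lambda j: l2(vectors[j], query))
        if best == current:
            return current
        current = best
```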

Creating an Index

CREATE INDEX embedding_idx ON documents USING hnsw (embedding);

Index Parameters

The HNSW index uses default parameters optimized for most use cases:

  • M = 16: Maximum number of connections per layer
  • ef_construction = 200: Search depth during index construction
  • Distance metric: Inferred from query operators

Common Use Cases

1. Semantic Text Search

-- Store document embeddings
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(384) -- all-MiniLM-L6-v2 embeddings
);
-- Create index
CREATE INDEX ON articles USING hnsw (embedding);
-- Search for similar articles
SELECT title, embedding <=> $query_vector AS distance
FROM articles
ORDER BY embedding <=> $query_vector
LIMIT 10;
2. Image Similarity Search

CREATE TABLE images (
    id SERIAL PRIMARY KEY,
    filename TEXT,
    features VECTOR(2048) -- ResNet features
);
-- Find visually similar images
SELECT filename FROM images
ORDER BY features <-> $image_vector
LIMIT 20;

3. Recommendation Systems

CREATE TABLE user_preferences (
    user_id INT PRIMARY KEY,
    preference_vector VECTOR(128)
);
CREATE TABLE items (
    item_id INT PRIMARY KEY,
    name TEXT,
    item_vector VECTOR(128)
);
-- Find items matching user preferences
SELECT i.name
FROM items i, user_preferences u
WHERE u.user_id = $current_user
ORDER BY i.item_vector <#> u.preference_vector
LIMIT 50;
4. Multi-Modal Search

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    text_embedding VECTOR(768), -- text embedding
    image_embedding VECTOR(512) -- image embedding
);
-- Search using both modalities
SELECT name,
text_embedding <=> $text_query AS text_score,
image_embedding <-> $image_query AS image_score
FROM products
WHERE (text_embedding <=> $text_query) < 0.5
OR (image_embedding <-> $image_query) < 100
ORDER BY (text_embedding <=> $text_query) +
(image_embedding <-> $image_query)
LIMIT 10;

Performance Tuning

Index Build Time

  • Small datasets (<10K vectors): Instant
  • Medium datasets (10K-100K vectors): Seconds
  • Large datasets (100K-1M vectors): Minutes
  • Very large datasets (>1M vectors): Consider batch insertion

Query Performance

Typical query times on commodity hardware:

Dataset Size   Queries per Second   Latency (p50)   Latency (p99)
10K            50,000+              <1ms            <2ms
100K           20,000+              <2ms            <5ms
1M             10,000+              <5ms            <10ms
10M            5,000+               <10ms           <20ms

Memory Usage

  • Base overhead: ~50 bytes per vector
  • M=16, dim=384: ~6KB per 1000 vectors
  • Expected: 60-80 bytes per vector total
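
A back-of-envelope sizing helper, assuming the 60-80 bytes per vector quoted above is graph/bookkeeping overhead on top of the raw vector data, and that vectors are stored as 4-byte floats (both assumptions, not confirmed by HeliosDB Nano's docs):

```python
def index_memory_bytes(n_vectors, dim, overhead_per_vec=70, float_bytes=4):
    # Raw vector data plus assumed per-vector graph/bookkeeping overhead.
    return n_vectors * (dim * float_bytes + overhead_per_vec)

# e.g. 1M vectors at dim 384 -> roughly 1.6 GB under these assumptions
gb = index_memory_bytes(1_000_000, 384) / 1e9
```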

Best Practices

1. Choose the Right Dimension

-- Common embedding dimensions:
-- 384: all-MiniLM-L6-v2 (text)
-- 768: BERT-base (text)
-- 1536: text-embedding-ada-002 (OpenAI)
-- 2048: ResNet-50 (images)
CREATE TABLE embeddings (
    id INT PRIMARY KEY,
    vector VECTOR(384) -- Match your embedding model
);

2. Normalize Vectors for Cosine Distance

-- For cosine distance, normalize vectors before insertion
-- Most embedding models already produce normalized vectors
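
If your model does not emit unit vectors, normalization is a one-liner client-side. For unit vectors, squared L2 distance equals twice the cosine distance, so either operator yields the same ranking.

```python
import math

def normalize(v):
    # Scale v to unit length; leave the zero vector unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v
```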

3. Use LIMIT for Better Performance

-- Always use LIMIT for k-NN queries
SELECT * FROM items
ORDER BY embedding <-> $query
LIMIT 100; -- Don't retrieve all results

4. Combine with Filters

-- Use WHERE clauses to pre-filter before vector search
SELECT * FROM products
WHERE category = 'electronics'
AND price < 1000
ORDER BY embedding <-> $query
LIMIT 20;

5. Batch Insertions

-- Insert multiple vectors in one transaction for better performance
BEGIN;
INSERT INTO vectors (id, vec) VALUES
    (1, '[...]'),
    (2, '[...]'),
    -- ...
    (1000, '[...]');
COMMIT;
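
When generating such statements from application code, a small helper (hypothetical; all names here are illustrative) keeps the vector-literal formatting consistent:

```python
def batch_insert_sql(table, id_col, vec_col, vectors, start_id=1):
    # Render each vector as a pgvector-style literal: '[v1, v2, ...]'
    rows = []
    for i, vec in enumerate(vectors):
        literal = "[" + ", ".join(str(x) for x in vec) + "]"
        rows.append(f"({start_id + i}, '{literal}')")
    return (
        f"INSERT INTO {table} ({id_col}, {vec_col}) VALUES\n"
        + ",\n".join(rows) + ";"
    )
```

In production, prefer your driver's parameterized queries or bulk-load path over string building; this sketch only mirrors the literal syntax shown above.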

Limitations and Considerations

  1. Approximate Results: HNSW provides approximate nearest neighbors, not exact results
  2. No Updates: Vector values should be immutable; updates require re-indexing
  3. Memory Intensive: Large indexes require significant RAM
  4. Build Time: Initial index creation can be slow for large datasets
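
Because results are approximate, it is worth measuring recall on your own data: run the same queries through an exact brute-force scan and compare against what the index returns. A minimal sketch:

```python
import math

def exact_top_k(vectors, query, k):
    # Brute-force reference: exact k nearest neighbors by L2 distance.
    def dist(v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(v, query)))
    return sorted(range(len(vectors)), key=lambda i: dist(vectors[i]))[:k]

def recall_at_k(approx_ids, exact_ids, k):
    # Fraction of the true top-k that the index actually returned.
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k
```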

Migration from pgvector

HeliosDB Nano is designed to be a drop-in replacement for pgvector:

-- pgvector syntax, including the lowercase type name, works as-is:
CREATE TABLE items (embedding vector(3));
INSERT INTO items VALUES ('[1,2,3]');
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]';

Benchmarks

Run benchmarks to test performance on your hardware:

cargo bench --bench vector_search_bench

This will test:

  • Insert performance (various dataset sizes)
  • Query performance (1K, 10K, 100K vectors)
  • Distance computation speed
  • k-NN accuracy

Examples

See the following files for complete examples:

  • /tests/vector_search_test.rs - Integration tests
  • /examples/vector_search_demo.rs - Complete demo application
  • /benches/vector_search_bench.rs - Performance benchmarks