Vector Search in HeliosDB Nano
HeliosDB Nano provides native vector similarity search capabilities compatible with pgvector, enabling AI and machine learning applications to store and query vector embeddings efficiently.
Features
- VECTOR Data Type: Native support for fixed-dimension vector columns
- HNSW Index: Hierarchical Navigable Small World (HNSW) algorithm for fast approximate nearest neighbor search
- Multiple Distance Metrics: L2 (Euclidean), Cosine, and Inner Product
- pgvector Compatible: Drop-in replacement for pgvector with the same operators and syntax
- High Performance: Optimized for large-scale vector search with millions of vectors
Quick Start
Creating a Table with Vectors
```sql
CREATE TABLE documents (
    id INT PRIMARY KEY,
    content TEXT,
    embedding VECTOR(1536)  -- 1536-dimensional vector
);
```

Inserting Vector Data
```sql
-- Insert using array literal syntax
INSERT INTO documents (id, content, embedding)
VALUES (
    1,
    'Sample document',
    '[0.1, 0.2, 0.3, ...]'  -- 1536 values
);
```

Creating a Vector Index
```sql
-- Create HNSW index for fast similarity search
CREATE INDEX ON documents USING hnsw (embedding);
```

Querying with Similarity Search
```sql
-- Find 10 most similar documents using L2 distance
SELECT id, content, embedding <-> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, ...]'
LIMIT 10;
```

Distance Operators
HeliosDB Nano supports three distance operators, compatible with pgvector:
| Operator | Description | Use Case |
|---|---|---|
| `<->` | L2 distance (Euclidean) | General-purpose similarity |
| `<=>` | Cosine distance | Text embeddings, normalized vectors |
| `<#>` | Inner product | Recommendation systems |
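The semantics of the three operators can be illustrated in plain Python (hypothetical helper names; HeliosDB Nano evaluates the operators natively in SQL):

```python
import math

def l2_distance(a, b):
    # <-> : Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # <=> : 1 - cosine similarity (0 = same direction, 2 = opposite)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def inner_product_distance(a, b):
    # <#> : negative dot product, so ascending ORDER BY ranks best matches first
    return -sum(x * y for x, y in zip(a, b))
```

All three return smaller values for "more similar", which is why every example in this document sorts with a plain ascending `ORDER BY`.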
L2 Distance (<->)
Euclidean distance between vectors. Best for general-purpose similarity search.
```sql
SELECT * FROM items
ORDER BY embedding <-> '[1.0, 2.0, 3.0]'
LIMIT 5;
```

Cosine Distance (<=>)
Measures angular difference (1 - cosine similarity), ignoring vector magnitude. Ideal for text embeddings.
```sql
SELECT * FROM documents
ORDER BY embedding <=> '[0.5, 0.3, 0.8]'
LIMIT 10;
```

Inner Product (<#>)
Negative dot product (negated so that ascending ORDER BY returns the largest inner products first). Useful for recommendation systems.
```sql
SELECT * FROM products
ORDER BY embedding <#> '[1.0, 0.0, 1.0]'
LIMIT 20;
```

HNSW Index
What is HNSW?
HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. It provides:
- Fast queries: Sub-millisecond search on millions of vectors
- Good accuracy: >95% recall for most use cases
- Scalability: Handles millions to billions of vectors
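For intuition about how such a graph is searched, here is a deliberately simplified, single-layer greedy search in Python. The real HNSW index maintains multiple layers and tuned data structures; `greedy_search`, `graph`, and `vectors` are illustrative names, not HeliosDB internals:

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, vectors, query, entry, ef=4):
    """Best-first search over a neighbor graph, keeping a beam of `ef` candidates."""
    d0 = l2(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # frontier of nodes still to expand
    best = [(d0, entry)]        # current `ef` closest nodes found, kept sorted
    while candidates:
        candidates.sort()
        d, node = candidates.pop(0)
        # stop once the closest frontier node is farther than our worst result
        if len(best) >= ef and d > best[-1][0]:
            break
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                dn = l2(vectors[nb], query)
                candidates.append((dn, nb))
                best = sorted(best + [(dn, nb)])[:ef]
    return [n for _, n in best]
```

Starting from an entry point, the search hops to ever-closer neighbors and terminates when no frontier node can improve the result set; this is what makes queries fast but approximate.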
Creating an Index
```sql
CREATE INDEX embedding_idx ON documents USING hnsw (embedding);
```

Index Parameters
The HNSW index uses default parameters optimized for most use cases:
- M = 16: Maximum number of connections per layer
- ef_construction = 200: Search depth during index construction
- Distance metric: Inferred from query operators
Common Use Cases
1. Semantic Text Search
```sql
-- Store document embeddings
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(384)  -- all-MiniLM-L6-v2 embeddings
);
```
```sql
-- Create index
CREATE INDEX ON articles USING hnsw (embedding);
```
```sql
-- Search for similar articles
SELECT title, embedding <=> $query_vector AS distance
FROM articles
ORDER BY embedding <=> $query_vector
LIMIT 10;
```

2. Image Similarity Search
```sql
CREATE TABLE images (
    id SERIAL PRIMARY KEY,
    filename TEXT,
    features VECTOR(2048)  -- ResNet features
);
```
```sql
-- Find visually similar images
SELECT filename FROM images
ORDER BY features <-> $image_vector
LIMIT 20;
```

3. Recommendation Systems
```sql
CREATE TABLE user_preferences (
    user_id INT PRIMARY KEY,
    preference_vector VECTOR(128)
);

CREATE TABLE items (
    item_id INT PRIMARY KEY,
    name TEXT,
    item_vector VECTOR(128)
);
```
```sql
-- Find items matching user preferences
SELECT i.name
FROM items i, user_preferences u
WHERE u.user_id = $current_user
ORDER BY i.item_vector <#> u.preference_vector
LIMIT 50;
```

4. Multi-Modal Search
```sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    text_embedding VECTOR(768),   -- Text embedding
    image_embedding VECTOR(512)   -- Image embedding
);
```
```sql
-- Search using both modalities
SELECT name,
       text_embedding <=> $text_query AS text_score,
       image_embedding <-> $image_query AS image_score
FROM products
WHERE (text_embedding <=> $text_query) < 0.5
   OR (image_embedding <-> $image_query) < 100
ORDER BY (text_embedding <=> $text_query) + (image_embedding <-> $image_query)
LIMIT 10;
```

Performance Tuning
Index Build Time
- Small datasets (<10K vectors): Instant
- Medium datasets (10K-100K vectors): Seconds
- Large datasets (100K-1M vectors): Minutes
- Very large datasets (>1M vectors): Consider batch insertion
Query Performance
Typical query times on commodity hardware:
| Dataset Size | Vectors per Second | Latency (p50) | Latency (p99) |
|---|---|---|---|
| 10K | 50,000+ | <1ms | <2ms |
| 100K | 20,000+ | <2ms | <5ms |
| 1M | 10,000+ | <5ms | <10ms |
| 10M | 5,000+ | <10ms | <20ms |
Memory Usage
- Base overhead: ~50 bytes per vector
- M=16, dim=384: ~6KB per 1000 vectors
- Expected: 60-80 bytes per vector total
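Assuming f32 vector components, 8-byte neighbor ids, and the ~50-byte per-vector overhead quoted above, a back-of-envelope sizing helper might look like the following. The helper names and constants are assumptions for illustration, not HeliosDB internals:

```python
def hnsw_bytes_per_vector(dim, m=16, overhead=50):
    # Rough estimate: raw vector data + base-layer graph links + bookkeeping.
    vector_data = 4 * dim   # f32 components (assumed)
    graph_links = 8 * m     # neighbor ids at the base layer (assumed 8 bytes each)
    return vector_data + graph_links + overhead

def index_size_mb(n_vectors, dim, m=16):
    return n_vectors * hnsw_bytes_per_vector(dim, m) / (1024 * 1024)
```

For example, one million 384-dimensional vectors with M=16 come out to roughly 1.6 GB under these assumptions; raw vector data dominates, so dimension is the biggest lever on memory.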
Best Practices
1. Choose the Right Dimension
```sql
-- Common embedding dimensions:
--   384: all-MiniLM-L6-v2 (text)
--   768: BERT-base (text)
--  1536: text-embedding-ada-002 (OpenAI)
--  2048: ResNet-50 (images)

CREATE TABLE embeddings (
    id INT PRIMARY KEY,
    vector VECTOR(384)  -- Match your embedding model
);
```

2. Normalize Vectors for Cosine Distance
```sql
-- For cosine distance, normalize vectors before insertion.
-- Most embedding models already produce normalized vectors.
```

3. Use LIMIT for Better Performance
```sql
-- Always use LIMIT for k-NN queries
SELECT * FROM items
ORDER BY embedding <-> $query
LIMIT 100;  -- Don't retrieve all results
```

4. Combine with Filters
```sql
-- Use WHERE clauses to pre-filter before vector search
SELECT * FROM products
WHERE category = 'electronics' AND price < 1000
ORDER BY embedding <-> $query
LIMIT 20;
```

5. Batch Insertions
```sql
-- Insert multiple vectors in one transaction for better performance
BEGIN;
INSERT INTO vectors (id, vec) VALUES
    (1, '[...]'),
    (2, '[...]'),
    ...
    (1000, '[...]');
COMMIT;
```

Limitations and Considerations
- Approximate Results: HNSW provides approximate nearest neighbors, not exact results
- No Updates: Vector values should be immutable; updates require re-indexing
- Memory Intensive: Large indexes require significant RAM
- Build Time: Initial index creation can be slow for large datasets
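Because results are approximate, it is worth measuring recall against an exact brute-force search on a sample of queries. A hypothetical helper (not part of HeliosDB) that computes recall@k from two ranked id lists:

```python
def recall_at_k(approx_ids, exact_ids, k):
    # Fraction of the true top-k neighbors that the approximate search returned.
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k
```

If recall on your own data falls below the ~95% figure quoted above, typical remedies are raising the index construction effort or reducing vector dimensionality.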
Migration from pgvector
HeliosDB Nano is designed to be a drop-in replacement for pgvector:
```sql
-- pgvector syntax works as-is:
CREATE TABLE items (embedding vector(3));
INSERT INTO items VALUES ('[1,2,3]');
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]';

-- HeliosDB Nano equivalent (same syntax):
CREATE TABLE items (embedding VECTOR(3));
INSERT INTO items VALUES ('[1,2,3]');
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]';
```

Benchmarks
Run benchmarks to test performance on your hardware:
```shell
cargo bench --bench vector_search_bench
```

This will test:
- Insert performance (various dataset sizes)
- Query performance (1K, 10K, 100K vectors)
- Distance computation speed
- k-NN accuracy
Examples
See the following files for complete examples:
- `/tests/vector_search_test.rs` - Integration tests
- `/examples/vector_search_demo.rs` - Complete demo application
- `/benches/vector_search_bench.rs` - Performance benchmarks