HeliosDB Vector Search - Production Guide
Overview
HeliosDB Vector Search provides production-ready hybrid vector search capabilities combining:
- HNSW (Hierarchical Navigable Small World) - Fast approximate nearest neighbor search
- Multiple Distance Metrics - L2, Cosine, Manhattan, Dot Product, Hamming
- SIMD Optimizations - AVX2 and AVX-512 acceleration (5-10x speedup)
- Hybrid Search - Combine vector similarity with metadata filtering and full-text search
- BM25 Scoring - Industry-standard text ranking algorithm
- Multi-Vector Search - Query with multiple embeddings
- Concurrent Queries - Thread-safe read access for high QPS
Performance Targets
- 10,000+ QPS for 1M vectors (HNSW with ef=50)
- 95%+ Recall@10 with default parameters
- <50ms p95 latency for hybrid search
- 5-10x SIMD speedup for distance calculations
Quick Start
Basic Vector Search
```rust
use heliosdb_vector::{HnswIndex, DistanceMetric, VectorData};
use bytes::Bytes;

// Create HNSW index
let mut index = HnswIndex::new(
    16,                 // M: max connections per node
    200,                // ef_construction: build quality
    DistanceMetric::L2, // distance metric
);

// Insert vectors
let vector = VectorData::new(128, vec![0.1; 128]);
index.insert(Bytes::from("doc1"), vector)?;

// Search
let query = VectorData::new(128, vec![0.2; 128]);
let results = index.search(
    &query,   // query vector
    10,       // k: number of results
    Some(50), // ef: search quality (higher = better recall)
    None,     // filter: optional
)?;

for (key, distance) in results {
    println!("Found: {:?} at distance {}", key, distance);
}
```
Hybrid Search (Vector + Metadata + Text)
```rust
use heliosdb_vector::{
    HybridSearchEngine, HybridQuery, FilterOp, TextQuery,
    FusionStrategy, VectorStorage, StorageConfig,
};
use std::sync::Arc;

// Create storage and index
let storage = Arc::new(VectorStorage::new(StorageConfig::default())?);
let index = Arc::new(HnswIndex::new(16, 200, DistanceMetric::Cosine));
let engine = HybridSearchEngine::new(index, storage);

// Add text content
engine.add_text(1, "Machine learning and neural networks".to_string());

// Hybrid query
let query = HybridQuery::new(10)
    .with_vector(vec![0.1; 128])
    .with_text(TextQuery::new("machine learning").with_bm25(true))
    .with_filter(FilterOp::Equals("category".to_string(), "AI".to_string()))
    .with_fusion(FusionStrategy::Weighted {
        vector_weight: 0.7,
        text_weight: 0.2,
        metadata_weight: 0.1,
    })
    .with_rerank(true);

let results = engine.search(&query)?;

for result in results {
    println!("ID: {}, Score: {:.4}", result.id, result.score);
}
```
Distance Metrics
Choosing the Right Metric
| Metric | Use Case | Range | Normalized? |
|---|---|---|---|
| Cosine | Text embeddings, semantic search | [0, 2] | Yes (recommended) |
| L2 (Euclidean) | Image embeddings, general purpose | [0, ∞) | No |
| Manhattan (L1) | Sparse vectors, high dimensions | [0, ∞) | No |
| Dot Product | Already normalized vectors | (-∞, ∞) | No |
| Hamming | Binary vectors, hashing | [0, n] | No |
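To make the Cosine row of the table concrete: cosine distance (1 − cosine similarity) is bounded in [0, 2] for any non-zero vectors. Here is a minimal scalar sketch of that computation; it is a reference illustration, not HeliosDB's SIMD implementation:

```rust
/// Scalar cosine distance: 1 - (a·b / (|a| |b|)). Always in [0, 2].
fn cosine_distance_scalar(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

fn main() {
    // Same direction -> 0, orthogonal -> 1, opposite -> 2
    assert!(cosine_distance_scalar(&[1.0, 0.0], &[2.0, 0.0]).abs() < 1e-6);
    assert!((cosine_distance_scalar(&[1.0, 0.0], &[0.0, 1.0]) - 1.0).abs() < 1e-6);
    assert!((cosine_distance_scalar(&[1.0, 0.0], &[-1.0, 0.0]) - 2.0).abs() < 1e-6);
}
```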
SIMD Performance
All distance metrics are SIMD-optimized:
```rust
use heliosdb_vector::{euclidean_distance, cosine_distance};

let a = vec![0.1; 512];
let b = vec![0.2; 512];

// Automatically uses AVX-512 if available, else AVX2, else scalar
let dist = euclidean_distance(&a, &b); // ~5-10x faster with SIMD
```
Benchmark Results (512-dimensional vectors):
- Scalar: ~500ns per distance calculation
- AVX2: ~80ns per distance calculation (6x speedup)
- AVX-512: ~45ns per distance calculation (11x speedup)
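For reference, the scalar baseline that the AVX2/AVX-512 kernels accelerate looks like the following sketch (the actual SIMD paths are internal to the crate):

```rust
/// Scalar L2 (Euclidean) distance - the baseline the SIMD kernels replace.
fn euclidean_distance_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

fn main() {
    // 512-dimensional vectors, as in the benchmark above
    let a = vec![0.1_f32; 512];
    let b = vec![0.2_f32; 512];
    // Each component differs by 0.1, so d = sqrt(512 * 0.01) ≈ 2.2627
    let d = euclidean_distance_scalar(&a, &b);
    assert!((d - (512.0_f32 * 0.01).sqrt()).abs() < 1e-3);
    println!("L2 distance: {d:.4}");
}
```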
Normalizing Vectors
For cosine similarity, normalize vectors first:
```rust
use heliosdb_vector::normalize;

let mut vector = vec![1.0, 2.0, 3.0];
normalize(&mut vector); // Now unit length
```
HNSW Parameter Tuning
Key Parameters
- M (max connections): Controls graph connectivity
  - Higher M = better recall, more memory
  - Recommended: 16-32
  - Range: 4-64
- ef_construction: Build-time quality
  - Higher ef_construction = better graph, slower build
  - Recommended: 200-400
  - Range: 100-1000
- ef (search-time): Query recall vs speed
  - Higher ef = better recall, slower search
  - Recommended: 50-200 for 95%+ recall
  - Range: 10-1000
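M also shapes the layer hierarchy. Under the standard HNSW level-assignment rule from the Malkov & Yashunin paper (which the statistics output later in this guide is consistent with for M=16), a node's maximum layer is drawn as floor(-ln(u) · mL) with mL = 1/ln(M), so the expected fraction of nodes at layer ≥ l is M^-l. A quick sketch of that expectation:

```rust
fn main() {
    // Expected fraction of nodes reaching layer >= l is M^-l
    // (level-assignment with mL = 1/ln(M), per the original HNSW paper).
    let m: f64 = 16.0;
    for l in 0..3i32 {
        let frac = m.powi(-l);
        println!("layer >= {l}: {:.4}% of nodes", frac * 100.0);
    }
    // For M = 16: 6.25% of nodes reach layer 1, ~0.39% reach layer 2
    assert!((m.powi(-1) - 0.0625).abs() < 1e-12);
}
```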
Performance Profiles
High Recall (Production)
```rust
let index = HnswIndex::new(32, 400, DistanceMetric::L2);
let results = index.search(&query, k, Some(200), None)?;
// Expected: >98% recall, ~5,000 QPS
```
Balanced (Recommended)
```rust
let index = HnswIndex::new(16, 200, DistanceMetric::L2);
let results = index.search(&query, k, Some(50), None)?;
// Expected: >95% recall, ~10,000 QPS
```
Fast (Low Latency)
```rust
let index = HnswIndex::new(8, 100, DistanceMetric::L2);
let results = index.search(&query, k, Some(20), None)?;
// Expected: >85% recall, ~25,000 QPS
```
Memory Usage
Formula: Memory ≈ N × (D × 4 + M × 8) bytes
Where:
- N = number of vectors
- D = vector dimension
- M = max connections parameter
Examples:
- 1M vectors, 128D, M=16: ~1.6 GB
- 10M vectors, 512D, M=16: ~24 GB
- 100M vectors, 768D, M=32: ~360 GB
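The formula above can be computed directly. Note that it is a lower bound covering raw vector data and adjacency lists only; allocator overhead, higher-layer edges, and index metadata push real usage higher, so treat the figures here as rough targets:

```rust
/// Estimate index memory from the formula above: N * (D*4 + M*8) bytes.
/// Lower bound only - real deployments include additional overhead.
fn estimate_memory_bytes(n: u64, d: u64, m: u64) -> u64 {
    n * (d * 4 + m * 8)
}

fn main() {
    let bytes = estimate_memory_bytes(10_000_000, 512, 16);
    println!("10M x 512D, M=16: {:.2} GB (before overhead)", bytes as f64 / 1e9);
}
```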
Hybrid Search Strategies
1. Pre-filtering (Efficient)
Apply metadata filters before vector search:
```rust
let mut query = HybridQuery::new(10)
    .with_vector(embedding)
    .with_filter(FilterOp::And(vec![
        FilterOp::Equals("category".to_string(), "product".to_string()),
        FilterOp::GreaterThan("price".to_string(), "100".to_string()),
    ]));

query.pre_filter = true; // Default: filter before search
```
When to use: filters that match a large fraction of documents (>10% match rate)
2. Post-filtering (Accurate)
Apply filters after vector search:
```rust
query.pre_filter = false; // Filter after search
```
When to use: filters that match a small fraction of documents (<10% match rate)
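With post-filtering, candidates are discarded after the vector search, so you typically need to fetch more than k candidates to end up with k survivors. A hypothetical helper (not a HeliosDB API) sketches the usual oversampling rule:

```rust
/// Hypothetical helper: when post-filtering, fetch roughly k / pass_rate
/// candidates so that about k of them survive the metadata filter.
fn oversampled_k(k: usize, expected_pass_rate: f64) -> usize {
    assert!(expected_pass_rate > 0.0 && expected_pass_rate <= 1.0);
    ((k as f64) / expected_pass_rate).ceil() as usize
}

fn main() {
    // If only 10% of candidates pass the filter, fetch 100 to keep k=10
    println!("{}", oversampled_k(10, 0.10));
}
```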
3. Score Fusion
Combine vector similarity with text relevance:
```rust
// Weighted average
let fusion = FusionStrategy::Weighted {
    vector_weight: 0.7,   // Emphasize vector similarity
    text_weight: 0.2,     // Some text relevance
    metadata_weight: 0.1, // Minimal metadata boost
};

// Reciprocal Rank Fusion (better for combining different score ranges)
let fusion = FusionStrategy::RRF { k: 60.0 };
```
4. Reranking
Two-stage retrieval for better accuracy:
```rust
let query = HybridQuery::new(10)
    .with_vector(embedding)
    .with_rerank(true); // Fetch 2x candidates, rerank with exact similarity
```
BM25 Text Scoring
HeliosDB implements the BM25 algorithm for text relevance:
```rust
let text_query = TextQuery::new("machine learning neural networks")
    .with_bm25(true) // Enable BM25 scoring
    .with_required(vec!["deep".to_string()])
    .with_excluded(vec!["shallow".to_string()]);

let score = text_query.bm25_score(
    document_text,
    100.0, // average document length
    10000, // total documents in corpus
);
```
BM25 Formula:
```text
BM25(D, Q) = Σ IDF(qi) × (f(qi,D) × (k1 + 1)) / (f(qi,D) + k1 × (1 - b + b × |D|/avgdl))
```
Where:
- k1 = 1.5 (term frequency saturation)
- b = 0.75 (length normalization)
- IDF = inverse document frequency
- f(qi,D) = term frequency in document
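The per-term contribution of this formula is easy to sketch directly; note the IDF variant below (ln((N − df + 0.5)/(df + 0.5) + 1), as used by Lucene) is an assumption, since the guide does not specify which IDF HeliosDB uses:

```rust
/// Per-term BM25 contribution, matching the formula above
/// with k1 = 1.5 and b = 0.75.
fn bm25_term(tf: f64, idf: f64, doc_len: f64, avgdl: f64) -> f64 {
    const K1: f64 = 1.5;
    const B: f64 = 0.75;
    idf * (tf * (K1 + 1.0)) / (tf + K1 * (1.0 - B + B * doc_len / avgdl))
}

fn main() {
    // Assumed IDF variant (Lucene-style): ln((N - df + 0.5)/(df + 0.5) + 1)
    let (n_docs, df) = (10_000.0_f64, 100.0_f64);
    let idf = ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln();

    // k1 saturates term frequency: doubling tf far less than doubles the score
    let s1 = bm25_term(1.0, idf, 100.0, 100.0);
    let s2 = bm25_term(2.0, idf, 100.0, 100.0);
    assert!(s2 > s1 && s2 < 2.0 * s1);
    println!("tf=1: {s1:.3}, tf=2: {s2:.3}");
}
```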
Concurrent Queries
HNSW index supports concurrent reads:
```rust
use std::sync::Arc;
use std::thread;

let index = Arc::new(index);

let mut handles = vec![];
for _ in 0..8 {
    let index_clone = Arc::clone(&index);
    let query = query.clone(); // Each thread needs its own copy of the query
    let handle = thread::spawn(move || {
        let results = index_clone.search(&query, 10, Some(50), None).unwrap();
        // Process results
    });
    handles.push(handle);
}

for handle in handles {
    handle.join().unwrap();
}
```
Performance: Linear scaling up to CPU core count
Persistence
Save Index
```rust
index.save_to_file("/path/to/index.json")?;
```
Load Index
```rust
let index = HnswIndex::load_from_file("/path/to/index.json")?;
```
Format: JSON (human-readable, ~2x larger than binary)
Note: For production, consider binary format with memory-mapped storage
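The "~2x larger than binary" note follows from encoding: a binary f32 is a fixed 4 bytes, while JSON spells each float out as text. A rough std-only sketch of the comparison (a real index also serializes the graph, which adds further text overhead in JSON):

```rust
/// Sketch: compare raw-f32 size to a hand-rolled JSON array of the same data.
fn json_vs_binary_bytes(v: &[f32]) -> (usize, usize) {
    let binary = v.len() * std::mem::size_of::<f32>();
    let json = format!(
        "[{}]",
        v.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(",")
    );
    (binary, json.len())
}

fn main() {
    let v: Vec<f32> = (0..128).map(|i| i as f32 * 0.0173).collect();
    let (bin, json) = json_vs_binary_bytes(&v);
    println!("binary: {bin} B, JSON: {json} B (~{:.1}x)", json as f64 / bin as f64);
}
```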
Index Statistics
```rust
let stats = index.statistics()?;
println!("{}", stats);
```
Output:
```text
=== HNSW Index Statistics ===
Nodes: 1000000
Layers: 6 (max layer: 5)
Total edges: 32000000
Layer 0 degree - avg: 32.00, min: 16, max: 32
Memory usage: 1638.40 MB
Parameters: M=16, M_max_0=32, ef_construction=200
Distance metric: L2
Layer distribution:
  Layer 0: 1000000 nodes (100.0%)
  Layer 1: 62500 nodes (6.3%)
  Layer 2: 3906 nodes (0.4%)
  ...
```
Filtered Search
Combine vector search with metadata filters:
```rust
use std::collections::HashSet;

// Create filter set
let filter: HashSet<usize> = allowed_node_ids.iter().copied().collect();

// Search with filter
let results = index.search(&query, k, Some(50), Some(&filter))?;
```
Performance Impact:
- 10% selectivity: ~2x slower
- 50% selectivity: ~1.3x slower
- 90% selectivity: ~1.1x slower
Multi-Vector Search
Query with multiple embeddings (e.g., multiple text chunks):
```rust
let query = HybridQuery::new(10)
    .with_multi_vectors(vec![
        embedding1, // First chunk
        embedding2, // Second chunk
        embedding3, // Third chunk
    ]);

// Returns documents that match ANY of the vectors (max score)
let results = engine.search(&query)?;
```
Production Checklist
Index Configuration
- M = 16-32 for balanced performance
- ef_construction = 200-400 for good graph quality
- ef = 50-200 for 95%+ recall at query time
Distance Metric
- Cosine for normalized embeddings (text)
- L2 for non-normalized embeddings (images)
- Normalize vectors before indexing (if using Cosine)
Hybrid Search
- Use pre-filtering when the filter matches a large fraction of documents (>10%)
- Use post-filtering when the filter matches a small fraction of documents (<10%)
- Enable BM25 for text relevance
- Tune fusion weights based on use case
Performance
- Benchmark with production data
- Test concurrent query load
- Monitor memory usage (scale with dataset)
- Enable SIMD (AVX2/AVX-512)
Reliability
- Implement index persistence
- Plan for index rebuild strategy
- Monitor recall metrics
- Set up alerting for QPS/latency
Troubleshooting
Low Recall
Problem: Search results missing relevant documents
Solutions:
- Increase ef parameter (50 → 100 → 200)
- Increase ef_construction for new index (200 → 400)
- Increase M parameter (16 → 32)
- Verify vector normalization (for Cosine)
- Check for filtering issues (too restrictive)
Low QPS
Problem: Slow query performance
Solutions:
- Decrease ef parameter (200 → 100 → 50)
- Decrease M parameter (32 → 16 → 8)
- Enable CPU SIMD features (AVX2/AVX-512)
- Use pre-filtering instead of post-filtering
- Reduce reranking overhead
- Scale horizontally (distribute index)
High Memory Usage
Problem: Index consuming too much RAM
Solutions:
- Decrease M parameter (32 → 16 → 8)
- Use lower precision vectors (f32 → f16)
- Implement disk-based storage (mmap)
- Shard index across multiple nodes
- Use IVF-PQ quantization for compression
Filtering Issues
Problem: Filtered search returning too few results
Solutions:
- Use post-filtering instead of pre-filtering
- Increase k to oversample before filtering
- Check filter logic for correctness
- Implement 2-hop traversal (already supported)
- Verify metadata is correctly indexed
Advanced Topics
Custom Distance Functions
```rust
impl DistanceMetric {
    pub fn custom_distance(&self, a: &[f32], b: &[f32]) -> f32 {
        // Implement custom distance logic here.
        // A valid metric must satisfy:
        // 1. d(x,y) >= 0
        // 2. d(x,y) = 0 iff x = y
        // 3. d(x,y) = d(y,x)
        // 4. d(x,z) <= d(x,y) + d(y,z)
        unimplemented!()
    }
}
```
Distributed Indexing
For billion-scale datasets, shard the index:
```rust
use heliosdb_vector::{DistributedCoordinator, ShardingStrategy};

let coordinator = DistributedCoordinator::new(
    8, // num_shards
    ShardingStrategy::Hash,
)?;

// Insert will automatically route to the correct shard
coordinator.insert(key, vector)?;

// Search will query all shards and merge results
let results = coordinator.search(&query, k)?;
```
Benchmarks
Run comprehensive benchmarks:
```bash
# All benchmarks
cargo bench --bench vector_benchmarks

# Specific benchmarks
cargo bench --bench vector_benchmarks -- distance_metrics
cargo bench --bench vector_benchmarks -- hnsw_query
cargo bench --bench vector_benchmarks -- concurrent_queries

# With nightly features
cargo +nightly bench
```
Expected results on modern CPU (3.5 GHz):
- Distance calculation: 45ns (AVX-512), 80ns (AVX2), 500ns (scalar)
- HNSW build: ~1,000 inserts/sec for 100k vectors
- HNSW query: 10,000+ QPS with ef=50, 100k vectors
- Concurrent: Linear scaling to 8+ threads
Testing
Run test suite:
```bash
# All tests
cargo test --package heliosdb-vector

# Specific test file
cargo test --package heliosdb-vector --test vector_search_tests

# With output
cargo test -- --nocapture

# Ignored (slow) tests
cargo test -- --ignored
```
References
- HNSW Paper - Malkov & Yashunin (2018)
- BM25 Algorithm
- SIMD Guide
Support
For issues, questions, or feature requests:
- GitHub Issues: HeliosDB Issues
- Documentation: docs.heliosdb.com
- Community: Discord