HeliosDB Vector Search - Production Guide

Overview

HeliosDB Vector Search provides production-ready hybrid vector search capabilities combining:

  • HNSW (Hierarchical Navigable Small World) - Fast approximate nearest neighbor search
  • Multiple Distance Metrics - L2, Cosine, Manhattan, Dot Product, Hamming
  • SIMD Optimizations - AVX2 and AVX-512 acceleration (5-10x speedup)
  • Hybrid Search - Combine vector similarity with metadata filtering and full-text search
  • BM25 Scoring - Industry-standard text ranking algorithm
  • Multi-Vector Search - Query with multiple embeddings
  • Concurrent Queries - Thread-safe read access for high QPS

Performance Targets

  • 10,000+ QPS for 1M vectors (HNSW with ef=50)
  • 95%+ Recall@10 with default parameters
  • <50ms p95 latency for hybrid search
  • 5-10x SIMD speedup for distance calculations

Quick Start

use heliosdb_vector::{HnswIndex, DistanceMetric, VectorData};
use bytes::Bytes;

// Create an HNSW index
let mut index = HnswIndex::new(
    16,                  // M: max connections per node
    200,                 // ef_construction: build quality
    DistanceMetric::L2,  // distance metric
);

// Insert a vector
let vector = VectorData::new(128, vec![0.1; 128]);
index.insert(Bytes::from("doc1"), vector)?;

// Search
let query = VectorData::new(128, vec![0.2; 128]);
let results = index.search(
    &query,   // query vector
    10,       // k: number of results
    Some(50), // ef: search quality (higher = better recall)
    None,     // filter: optional
)?;

for (key, distance) in results {
    println!("Found: {:?} at distance {}", key, distance);
}

Hybrid Search (Vector + Metadata + Text)

use heliosdb_vector::{
    HnswIndex, DistanceMetric, VectorStorage, StorageConfig,
    HybridSearchEngine, HybridQuery, FilterOp, TextQuery, FusionStrategy,
};
use std::sync::Arc;

// Create storage and index
let storage = Arc::new(VectorStorage::new(StorageConfig::default())?);
let index = Arc::new(HnswIndex::new(16, 200, DistanceMetric::Cosine));
let engine = HybridSearchEngine::new(index, storage);

// Add text content
engine.add_text(1, "Machine learning and neural networks".to_string());

// Hybrid query
let query = HybridQuery::new(10)
    .with_vector(vec![0.1; 128])
    .with_text(TextQuery::new("machine learning").with_bm25(true))
    .with_filter(FilterOp::Equals("category".to_string(), "AI".to_string()))
    .with_fusion(FusionStrategy::Weighted {
        vector_weight: 0.7,
        text_weight: 0.2,
        metadata_weight: 0.1,
    })
    .with_rerank(true);

let results = engine.search(&query)?;
for result in results {
    println!("ID: {}, Score: {:.4}", result.id, result.score);
}

Distance Metrics

Choosing the Right Metric

Metric            Use Case                             Range      Normalized?
Cosine            Text embeddings, semantic search     [0, 2]     Yes (recommended)
L2 (Euclidean)    Image embeddings, general purpose    [0, ∞)     No
Manhattan (L1)    Sparse vectors, high dimensions      [0, ∞)     No
Dot Product       Already-normalized vectors           (-∞, ∞)    No
Hamming           Binary vectors, hashing              [0, n]     No

SIMD Performance

All distance metrics are SIMD-optimized:

use heliosdb_vector::{euclidean_distance, cosine_distance};
let a = vec![0.1; 512];
let b = vec![0.2; 512];
// Automatically uses AVX-512 if available, else AVX2, else scalar
let dist = euclidean_distance(&a, &b); // ~5-10x faster with SIMD

Benchmark Results (512-dimensional vectors):

  • Scalar: ~500ns per distance calculation
  • AVX2: ~80ns per distance calculation (6x speedup)
  • AVX-512: ~45ns per distance calculation (11x speedup)
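To reproduce these numbers on your own hardware, here is a minimal timing harness (a sketch using the crate's euclidean_distance shown above; std::hint::black_box keeps the compiler from optimizing the calls away):

use heliosdb_vector::euclidean_distance;
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let a = vec![0.1_f32; 512];
    let b = vec![0.2_f32; 512];
    let iters = 1_000_000u32;

    let start = Instant::now();
    for _ in 0..iters {
        // black_box forces the call to actually execute each iteration
        black_box(euclidean_distance(black_box(&a), black_box(&b)));
    }

    println!("~{} ns per distance call", start.elapsed().as_nanos() / iters as u128);
}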

Normalizing Vectors

For cosine similarity, normalize vectors first:

use heliosdb_vector::normalize;
let mut vector = vec![1.0, 2.0, 3.0];
normalize(&mut vector); // Now unit length
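Conceptually, normalization divides each component by the vector's L2 norm. A sketch of the operation (not the crate's SIMD-optimized implementation):

fn normalize_sketch(v: &mut [f32]) {
    // L2 norm: square root of the sum of squared components
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm; // each component divided by the norm → unit length
        }
    }
}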

HNSW Parameter Tuning

Key Parameters

  1. M (max connections): Controls graph connectivity

    • Higher M = better recall, more memory
    • Recommended: 16-32
    • Range: 4-64
  2. ef_construction: Build-time quality

    • Higher ef_construction = better graph, slower build
    • Recommended: 200-400
    • Range: 100-1000
  3. ef (search-time): Query recall vs speed

    • Higher ef = better recall, slower search
    • Recommended: 50-200 for 95%+ recall
    • Range: 10-1000

Performance Profiles

High Recall (Production)

let index = HnswIndex::new(32, 400, DistanceMetric::L2);
let results = index.search(&query, k, Some(200), None)?;
// Expected: >98% recall, ~5,000 QPS

Balanced (Default)

let index = HnswIndex::new(16, 200, DistanceMetric::L2);
let results = index.search(&query, k, Some(50), None)?;
// Expected: >95% recall, ~10,000 QPS

Fast (Low Latency)

let index = HnswIndex::new(8, 100, DistanceMetric::L2);
let results = index.search(&query, k, Some(20), None)?;
// Expected: >85% recall, ~25,000 QPS

Memory Usage

Formula: Memory ≈ N × (D × 4 + M × 8) bytes (a lower bound; actual usage also includes higher-layer edges and per-node overhead)

Where:

  • N = number of vectors
  • D = vector dimension
  • M = max connections parameter

Examples:

  • 1M vectors, 128D, M=16: ~1.6 GB
  • 10M vectors, 512D, M=16: ~24 GB
  • 100M vectors, 768D, M=32: ~360 GB
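Applying the formula to the second row: 10,000,000 × (512 × 4 + 16 × 8) bytes ≈ 21.8 GB, with the remainder of the ~24 GB figure being overhead. A tiny helper for capacity planning (a sketch of the lower-bound formula above):

fn estimated_memory_bytes(n: u64, dim: u64, m: u64) -> u64 {
    // Vector data: dim × 4 bytes (f32); graph edges: roughly m × 8 bytes per node
    n * (dim * 4 + m * 8)
}

fn main() {
    let gb = 1e9;
    // 10M vectors, 512D, M=16 → ≈ 21.8 GB before overhead
    println!("{:.1} GB", estimated_memory_bytes(10_000_000, 512, 16) as f64 / gb);
}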

Hybrid Search Strategies

1. Pre-filtering (Efficient)

Apply metadata filters before vector search:

let mut query = HybridQuery::new(10)
    .with_vector(embedding)
    .with_filter(FilterOp::And(vec![
        FilterOp::Equals("category".to_string(), "product".to_string()),
        FilterOp::GreaterThan("price".to_string(), "100".to_string()),
    ]));
query.pre_filter = true; // Default: filter before search

When to use: permissive filters that most documents pass (>10% match rate)

2. Post-filtering (Accurate)

Apply filters after vector search:

query.pre_filter = false; // Filter after search

When to use: restrictive filters that few documents pass (<10% match rate)
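When post-filtering, a common pattern is to oversample and then truncate, since the filter discards candidates after the vector search. A sketch using the API above (the 4x factor is a heuristic, not a crate default):

let desired_k = 10;
let mut query = HybridQuery::new(desired_k * 4) // oversample: fetch 4x candidates
    .with_vector(embedding)
    .with_filter(FilterOp::Equals("status".to_string(), "active".to_string()));
query.pre_filter = false; // filter after the vector search

let results = engine.search(&query)?;
let top_k: Vec<_> = results.into_iter().take(desired_k).collect();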

3. Score Fusion

Combine vector similarity with text relevance:

// Weighted average
let fusion = FusionStrategy::Weighted {
    vector_weight: 0.7,   // Emphasize vector similarity
    text_weight: 0.2,     // Some text relevance
    metadata_weight: 0.1, // Minimal metadata boost
};

// Reciprocal Rank Fusion (better for combining different score ranges)
let fusion = FusionStrategy::RRF { k: 60.0 };
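RRF scores each document by summing 1 / (k + rank) across the result lists, so it needs only ranks, never comparable scores. A self-contained sketch of the computation (illustrative only, not the engine's internal code):

use std::collections::HashMap;

/// Fuse ranked lists of document IDs with Reciprocal Rank Fusion.
fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (rank, id) in list.iter().enumerate() {
            // enumerate() is 0-based; RRF uses 1-based ranks
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

// rrf_fuse(&[vector_ranked_ids, text_ranked_ids], 60.0) yields the fused ranking.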

4. Reranking

Two-stage retrieval for better accuracy:

let query = HybridQuery::new(10)
    .with_vector(embedding)
    .with_rerank(true); // Fetch 2x candidates, rerank with exact similarity

BM25 Text Scoring

HeliosDB implements the BM25 algorithm for text relevance:

let text_query = TextQuery::new("machine learning neural networks")
    .with_bm25(true) // Enable BM25 scoring
    .with_required(vec!["deep".to_string()])
    .with_excluded(vec!["shallow".to_string()]);

let score = text_query.bm25_score(
    document_text,
    100.0, // average document length
    10000, // total documents in corpus
);

BM25 Formula:

BM25(D, Q) = Σ IDF(qi) × (f(qi,D) × (k1 + 1)) / (f(qi,D) + k1 × (1 - b + b × |D|/avgdl))

Where:

  • k1 = 1.5 (term frequency saturation)
  • b = 0.75 (length normalization)
  • IDF = inverse document frequency
  • f(qi,D) = term frequency in document
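Plugging in those defaults, here is a self-contained scorer for a single term/document pair (a sketch of the formula above using the common smoothed-IDF variant, not the crate's bm25_score):

/// BM25 contribution of one query term, with k1 = 1.5 and b = 0.75.
fn bm25_term_score(tf: f64, doc_len: f64, avg_doc_len: f64, total_docs: f64, docs_with_term: f64) -> f64 {
    let (k1, b) = (1.5, 0.75);
    // IDF with the standard +0.5 smoothing to avoid division by zero
    let idf = ((total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1.0).ln();
    idf * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))
}

// Sum bm25_term_score over all query terms to get BM25(D, Q).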

Concurrent Queries

HNSW index supports concurrent reads:

use std::sync::Arc;
use std::thread;

let index = Arc::new(index);
let mut handles = vec![];

for _ in 0..8 {
    let index_clone = Arc::clone(&index);
    let query = query.clone(); // each thread owns its copy (assumes VectorData: Clone)
    let handle = thread::spawn(move || {
        let results = index_clone.search(&query, 10, Some(50), None).unwrap();
        // Process results
    });
    handles.push(handle);
}

for handle in handles {
    handle.join().unwrap();
}

Performance: Linear scaling up to CPU core count

Persistence

Save Index

index.save_to_file("/path/to/index.json")?;

Load Index

let index = HnswIndex::load_from_file("/path/to/index.json")?;

Format: JSON (human-readable, ~2x larger than binary)
Note: For production, consider a binary format with memory-mapped storage.
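For a binary on-disk format, one option is serializing with the bincode crate instead of JSON. This is a hypothetical sketch: it assumes HnswIndex implements serde's Serialize/Deserialize, which the JSON format suggests but the crate may not expose publicly.

use std::fs;

// Hypothetical: requires HnswIndex: serde::Serialize + serde::de::DeserializeOwned
let bytes = bincode::serialize(&index)?;
fs::write("/path/to/index.bin", &bytes)?;

let loaded: HnswIndex = bincode::deserialize(&fs::read("/path/to/index.bin")?)?;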

Index Statistics

let stats = index.statistics()?;
println!("{}", stats);
// Output:
// === HNSW Index Statistics ===
// Nodes: 1000000
// Layers: 6 (max layer: 5)
// Total edges: 32000000
// Layer 0 degree - avg: 32.00, min: 16, max: 32
// Memory usage: 1638.40 MB
// Parameters: M=16, M_max_0=32, ef_construction=200
// Distance metric: L2
// Layer distribution:
// Layer 0: 1000000 nodes (100.0%)
// Layer 1: 62500 nodes (6.3%)
// Layer 2: 3906 nodes (0.4%)
// ...

Filtered Search

Combine vector search with metadata filters:

use std::collections::HashSet;

// Create filter set
let filter: HashSet<usize> = allowed_node_ids.iter().copied().collect();

// Search with filter
let results = index.search(&query, k, Some(50), Some(&filter))?;

Performance Impact:

  • 10% of nodes pass the filter: ~2x slower
  • 50% pass: ~1.3x slower
  • 90% pass: ~1.1x slower

Multi-Vector Search

Query with multiple embeddings (e.g., multiple text chunks):

let query = HybridQuery::new(10)
    .with_multi_vectors(vec![
        embedding1, // First chunk
        embedding2, // Second chunk
        embedding3, // Third chunk
    ]);

// Returns documents that match ANY of the vectors (max score)
let results = engine.search(&query)?;

Production Checklist

Index Configuration

  • M = 16-32 for balanced performance
  • ef_construction = 200-400 for good graph quality
  • ef = 50-200 for 95%+ recall at query time

Distance Metric

  • Cosine for normalized embeddings (text)
  • L2 for non-normalized embeddings (images)
  • Normalize vectors before indexing (if using Cosine)

Hybrid Search

  • Use pre-filtering when filters match many documents (>10%)
  • Use post-filtering when filters match few documents (<10%)
  • Enable BM25 for text relevance
  • Tune fusion weights based on use case

Performance

  • Benchmark with production data
  • Test concurrent query load
  • Monitor memory usage (scale with dataset)
  • Enable SIMD (AVX2/AVX-512)

Reliability

  • Implement index persistence
  • Plan for index rebuild strategy
  • Monitor recall metrics
  • Set up alerting for QPS/latency

Troubleshooting

Low Recall

Problem: Search results missing relevant documents

Solutions:

  1. Increase ef parameter (50 → 100 → 200)
  2. Increase ef_construction for new index (200 → 400)
  3. Increase M parameter (16 → 32)
  4. Verify vector normalization (for Cosine)
  5. Check for filtering issues (too restrictive)
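To verify recall rather than guess, compare ANN results against brute-force ground truth on a sample of queries. A self-contained sketch (keys can be any ID type):

/// Fraction of the true top-k that the ANN search also returned.
fn recall_at_k(ann_keys: &[String], exact_keys: &[String], k: usize) -> f32 {
    let hits = exact_keys
        .iter()
        .take(k)
        .filter(|key| ann_keys.iter().take(k).any(|a| a == *key))
        .count();
    hits as f32 / k as f32
}

// Average recall_at_k over ~100 sample queries; if it falls well below the
// target (e.g. 0.95), tune ef/M as above before suspecting the data.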

Low QPS

Problem: Slow query performance

Solutions:

  1. Decrease ef parameter (200 → 100 → 50)
  2. Decrease M parameter (32 → 16 → 8)
  3. Enable CPU SIMD features (AVX2/AVX-512)
  4. Use pre-filtering instead of post-filtering
  5. Reduce reranking overhead
  6. Scale horizontally (distribute index)

High Memory Usage

Problem: Index consuming too much RAM

Solutions:

  1. Decrease M parameter (32 → 16 → 8)
  2. Use lower precision vectors (f32 → f16)
  3. Implement disk-based storage (mmap)
  4. Shard index across multiple nodes
  5. Use IVF-PQ quantization for compression

Filtering Issues

Problem: Filtered search returning too few results

Solutions:

  1. Use post-filtering instead of pre-filtering
  2. Increase k to oversample before filtering
  3. Check filter logic for correctness
  4. Implement 2-hop traversal (already supported)
  5. Verify metadata is correctly indexed

Advanced Topics

Custom Distance Functions

impl DistanceMetric {
    pub fn custom_distance(&self, a: &[f32], b: &[f32]) -> f32 {
        // Implement custom distance logic here.
        // A valid metric must satisfy:
        //   1. d(x, y) >= 0
        //   2. d(x, y) = 0 iff x = y
        //   3. d(x, y) = d(y, x)
        //   4. d(x, z) <= d(x, y) + d(y, z)
        // Example body: Chebyshev (L-infinity) distance
        a.iter()
            .zip(b.iter())
            .map(|(x, y)| (x - y).abs())
            .fold(0.0_f32, f32::max)
    }
}

Distributed Indexing

For billion-scale datasets, shard the index:

use heliosdb_vector::{DistributedCoordinator, ShardingStrategy};

let coordinator = DistributedCoordinator::new(
    8, // num_shards
    ShardingStrategy::Hash,
)?;

// Inserts automatically route to the correct shard
coordinator.insert(key, vector)?;

// Search queries all shards and merges results
let results = coordinator.search(&query, k)?;

Benchmarks

Run comprehensive benchmarks:

# All benchmarks
cargo bench --bench vector_benchmarks
# Specific benchmarks
cargo bench --bench vector_benchmarks -- distance_metrics
cargo bench --bench vector_benchmarks -- hnsw_query
cargo bench --bench vector_benchmarks -- concurrent_queries
# With nightly features
cargo +nightly bench

Expected results on modern CPU (3.5 GHz):

  • Distance calculation: 45ns (AVX-512), 80ns (AVX2), 500ns (scalar)
  • HNSW build: ~1,000 inserts/sec for 100k vectors
  • HNSW query: 10,000+ QPS with ef=50, 100k vectors
  • Concurrent: Linear scaling to 8+ threads

Testing

Run test suite:

# All tests
cargo test --package heliosdb-vector
# Specific test file
cargo test --package heliosdb-vector --test vector_search_tests
# With output
cargo test -- --nocapture
# Ignored (slow) tests
cargo test -- --ignored

Support

For issues, questions, or feature requests: