HeliosDB Vector Search - Production Guide

Overview

HeliosDB Vector Search provides production-ready hybrid vector search capabilities combining:

  • HNSW (Hierarchical Navigable Small World) - Fast approximate nearest neighbor search
  • Multiple Distance Metrics - L2, Cosine, Manhattan, Dot Product, Hamming
  • SIMD Optimizations - AVX2 and AVX-512 acceleration (5-10x speedup)
  • Hybrid Search - Combine vector similarity with metadata filtering and full-text search
  • BM25 Scoring - Industry-standard text ranking algorithm
  • Multi-Vector Search - Query with multiple embeddings
  • Concurrent Queries - Thread-safe read access for high QPS

Performance Targets

  • 10,000+ QPS for 1M vectors (HNSW with ef=50)
  • 95%+ Recall@10 with default parameters
  • <50ms p95 latency for hybrid search
  • 5-10x SIMD speedup for distance calculations

Quick Start

use heliosdb_vector::{HnswIndex, DistanceMetric, VectorData};
use bytes::Bytes;

// Create an HNSW index
let mut index = HnswIndex::new(
    16,                  // M: max connections per node
    200,                 // ef_construction: build quality
    DistanceMetric::L2,  // distance metric
);

// Insert a vector
let vector = VectorData::new(128, vec![0.1; 128]);
index.insert(Bytes::from("doc1"), vector)?;

// Search
let query = VectorData::new(128, vec![0.2; 128]);
let results = index.search(
    &query,   // query vector
    10,       // k: number of results
    Some(50), // ef: search quality (higher = better recall)
    None,     // filter: optional
)?;

for (key, distance) in results {
    println!("Found: {:?} at distance {}", key, distance);
}

Hybrid Search (Vector + Metadata + Text)

use heliosdb_vector::{
    HnswIndex, DistanceMetric, VectorStorage, StorageConfig,
    HybridSearchEngine, HybridQuery, FilterOp, TextQuery, FusionStrategy,
};
use std::sync::Arc;

// Create storage and index
let storage = Arc::new(VectorStorage::new(StorageConfig::default())?);
let index = Arc::new(HnswIndex::new(16, 200, DistanceMetric::Cosine));
let engine = HybridSearchEngine::new(index, storage);

// Add text content
engine.add_text(1, "Machine learning and neural networks".to_string());

// Hybrid query
let query = HybridQuery::new(10)
    .with_vector(vec![0.1; 128])
    .with_text(TextQuery::new("machine learning").with_bm25(true))
    .with_filter(FilterOp::Equals("category".to_string(), "AI".to_string()))
    .with_fusion(FusionStrategy::Weighted {
        vector_weight: 0.7,
        text_weight: 0.2,
        metadata_weight: 0.1,
    })
    .with_rerank(true);

let results = engine.search(&query)?;
for result in results {
    println!("ID: {}, Score: {:.4}", result.id, result.score);
}

Distance Metrics

Choosing the Right Metric

Metric            Use Case                             Range      Normalized?
Cosine            Text embeddings, semantic search     [0, 2]     Yes (recommended)
L2 (Euclidean)    Image embeddings, general purpose    [0, ∞)     No
Manhattan (L1)    Sparse vectors, high dimensions      [0, ∞)     No
Dot Product       Already-normalized vectors           (-∞, ∞)    No
Hamming           Binary vectors, hashing              [0, n]     No

SIMD Performance

All distance metrics are SIMD-optimized:

use heliosdb_vector::{euclidean_distance, cosine_distance};
let a = vec![0.1; 512];
let b = vec![0.2; 512];
// Automatically uses AVX-512 if available, else AVX2, else scalar
let dist = euclidean_distance(&a, &b); // ~5-10x faster with SIMD

Benchmark Results (512-dimensional vectors):

  • Scalar: ~500ns per distance calculation
  • AVX2: ~80ns per distance calculation (6x speedup)
  • AVX-512: ~45ns per distance calculation (11x speedup)
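To reproduce these numbers on your own hardware, here is a minimal timing harness (a sketch using the crate's euclidean_distance shown above; std::hint::black_box keeps the compiler from optimizing the calls away):

use heliosdb_vector::euclidean_distance;
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let a = vec![0.1_f32; 512];
    let b = vec![0.2_f32; 512];
    let iters = 1_000_000u32;

    let start = Instant::now();
    for _ in 0..iters {
        // black_box forces the call to actually execute each iteration
        black_box(euclidean_distance(black_box(&a), black_box(&b)));
    }

    println!("~{} ns per distance call", start.elapsed().as_nanos() / iters as u128);
}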

Normalizing Vectors

For cosine similarity, normalize vectors first:

use heliosdb_vector::normalize;
let mut vector = vec![1.0, 2.0, 3.0];
normalize(&mut vector); // Now unit length
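Conceptually, normalization divides each component by the vector's L2 norm. A sketch of the operation (not the crate's SIMD-optimized implementation):

fn normalize_sketch(v: &mut [f32]) {
    // L2 norm: square root of the sum of squared components
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm; // each component divided by the norm → unit length
        }
    }
}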

HNSW Parameter Tuning

Key Parameters

  1. M (max connections): Controls graph connectivity

    • Higher M = better recall, more memory
    • Recommended: 16-32
    • Range: 4-64
  2. ef_construction: Build-time quality

    • Higher ef_construction = better graph, slower build
    • Recommended: 200-400
    • Range: 100-1000
  3. ef (search-time): Query recall vs speed

    • Higher ef = better recall, slower search
    • Recommended: 50-200 for 95%+ recall
    • Range: 10-1000

Performance Profiles

High Recall (Production)

let index = HnswIndex::new(32, 400, DistanceMetric::L2);
let results = index.search(&query, k, Some(200), None)?;
// Expected: >98% recall, ~5,000 QPS

Balanced (Default)

let index = HnswIndex::new(16, 200, DistanceMetric::L2);
let results = index.search(&query, k, Some(50), None)?;
// Expected: >95% recall, ~10,000 QPS

Fast (Low Latency)

let index = HnswIndex::new(8, 100, DistanceMetric::L2);
let results = index.search(&query, k, Some(20), None)?;
// Expected: >85% recall, ~25,000 QPS

Memory Usage

Formula: Memory ≈ N × (D × 4 + M × 8) bytes (a lower bound; actual usage also includes higher-layer edges and per-node overhead)

Where:

  • N = number of vectors
  • D = vector dimension
  • M = max connections parameter

Examples:

  • 1M vectors, 128D, M=16: ~1.6 GB
  • 10M vectors, 512D, M=16: ~24 GB
  • 100M vectors, 768D, M=32: ~360 GB
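Applying the formula to the second row: 10,000,000 × (512 × 4 + 16 × 8) bytes ≈ 21.8 GB, with the remainder of the ~24 GB figure being overhead. A tiny helper for capacity planning (a sketch of the lower-bound formula above):

fn estimated_memory_bytes(n: u64, dim: u64, m: u64) -> u64 {
    // Vector data: dim × 4 bytes (f32); graph edges: roughly m × 8 bytes per node
    n * (dim * 4 + m * 8)
}

fn main() {
    let gb = 1e9;
    // 10M vectors, 512D, M=16 → ≈ 21.8 GB before overhead
    println!("{:.1} GB", estimated_memory_bytes(10_000_000, 512, 16) as f64 / gb);
}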

Hybrid Search Strategies

1. Pre-filtering (Efficient)

Apply metadata filters before vector search:

let mut query = HybridQuery::new(10)
    .with_vector(embedding)
    .with_filter(FilterOp::And(vec![
        FilterOp::Equals("category".to_string(), "product".to_string()),
        FilterOp::GreaterThan("price".to_string(), "100".to_string()),
    ]));
query.pre_filter = true; // Default: filter before search

When to use: permissive filters that most documents pass (>10% match rate)

2. Post-filtering (Accurate)

Apply filters after vector search:

query.pre_filter = false; // Filter after search

When to use: restrictive filters that few documents pass (<10% match rate)
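When post-filtering, a common pattern is to oversample and then truncate, since the filter discards candidates after the vector search. A sketch using the API above (the 4x factor is a heuristic, not a crate default):

let desired_k = 10;
let mut query = HybridQuery::new(desired_k * 4) // oversample: fetch 4x candidates
    .with_vector(embedding)
    .with_filter(FilterOp::Equals("status".to_string(), "active".to_string()));
query.pre_filter = false; // filter after the vector search

let results = engine.search(&query)?;
let top_k: Vec<_> = results.into_iter().take(desired_k).collect();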

3. Score Fusion

Combine vector similarity with text relevance:

// Weighted average
let fusion = FusionStrategy::Weighted {
    vector_weight: 0.7,   // Emphasize vector similarity
    text_weight: 0.2,     // Some text relevance
    metadata_weight: 0.1, // Minimal metadata boost
};

// Reciprocal Rank Fusion (better for combining different score ranges)
let fusion = FusionStrategy::RRF { k: 60.0 };
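RRF scores each document by summing 1 / (k + rank) across the result lists, so it needs only ranks, never comparable scores. A self-contained sketch of the computation (illustrative only, not the engine's internal code):

use std::collections::HashMap;

/// Fuse ranked lists of document IDs with Reciprocal Rank Fusion.
fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (rank, id) in list.iter().enumerate() {
            // enumerate() is 0-based; RRF uses 1-based ranks
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

// rrf_fuse(&[vector_ranked_ids, text_ranked_ids], 60.0) yields the fused ranking.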

4. Reranking

Two-stage retrieval for better accuracy:

let query = HybridQuery::new(10)
    .with_vector(embedding)
    .with_rerank(true); // Fetch 2x candidates, rerank with exact similarity

BM25 Text Scoring

HeliosDB implements the BM25 algorithm for text relevance:

let text_query = TextQuery::new("machine learning neural networks")
    .with_bm25(true) // Enable BM25 scoring
    .with_required(vec!["deep".to_string()])
    .with_excluded(vec!["shallow".to_string()]);

let score = text_query.bm25_score(
    document_text,
    100.0, // average document length
    10000, // total documents in corpus
);

BM25 Formula:

BM25(D, Q) = Σ IDF(qi) × (f(qi,D) × (k1 + 1)) / (f(qi,D) + k1 × (1 - b + b × |D|/avgdl))

Where:

  • k1 = 1.5 (term frequency saturation)
  • b = 0.75 (length normalization)
  • IDF = inverse document frequency
  • f(qi,D) = term frequency in document
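Plugging in those defaults, here is a self-contained scorer for a single term/document pair (a sketch of the formula above using the common smoothed-IDF variant, not the crate's bm25_score):

/// BM25 contribution of one query term, with k1 = 1.5 and b = 0.75.
fn bm25_term_score(tf: f64, doc_len: f64, avg_doc_len: f64, total_docs: f64, docs_with_term: f64) -> f64 {
    let (k1, b) = (1.5, 0.75);
    // IDF with the standard +0.5 smoothing to avoid division by zero
    let idf = ((total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1.0).ln();
    idf * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))
}

// Sum bm25_term_score over all query terms to get BM25(D, Q).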

Concurrent Queries

HNSW index supports concurrent reads:

use std::sync::Arc;
use std::thread;

let index = Arc::new(index);
let mut handles = vec![];

for _ in 0..8 {
    let index_clone = Arc::clone(&index);
    let query = query.clone(); // each thread owns its copy (assumes VectorData: Clone)
    let handle = thread::spawn(move || {
        let results = index_clone.search(&query, 10, Some(50), None).unwrap();
        // Process results
    });
    handles.push(handle);
}

for handle in handles {
    handle.join().unwrap();
}

Performance: Linear scaling up to CPU core count

Persistence

Save Index

index.save_to_file("/path/to/index.json")?;

Load Index

let index = HnswIndex::load_from_file("/path/to/index.json")?;

Format: JSON (human-readable, ~2x larger than binary)
Note: For production, consider a binary format with memory-mapped storage.
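For a binary on-disk format, one option is serializing with the bincode crate instead of JSON. This is a hypothetical sketch: it assumes HnswIndex implements serde's Serialize/Deserialize, which the JSON format suggests but the crate may not expose publicly.

use std::fs;

// Hypothetical: requires HnswIndex: serde::Serialize + serde::de::DeserializeOwned
let bytes = bincode::serialize(&index)?;
fs::write("/path/to/index.bin", &bytes)?;

let loaded: HnswIndex = bincode::deserialize(&fs::read("/path/to/index.bin")?)?;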

Index Statistics

let stats = index.statistics()?;
println!("{}", stats);
// Output:
// === HNSW Index Statistics ===
// Nodes: 1000000
// Layers: 6 (max layer: 5)
// Total edges: 32000000
// Layer 0 degree - avg: 32.00, min: 16, max: 32
// Memory usage: 1638.40 MB
// Parameters: M=16, M_max_0=32, ef_construction=200
// Distance metric: L2
// Layer distribution:
// Layer 0: 1000000 nodes (100.0%)
// Layer 1: 62500 nodes (6.3%)
// Layer 2: 3906 nodes (0.4%)
// ...

Filtered Search

Combine vector search with metadata filters:

use std::collections::HashSet;

// Create filter set
let filter: HashSet<usize> = allowed_node_ids.iter().copied().collect();

// Search with filter
let results = index.search(&query, k, Some(50), Some(&filter))?;

Performance Impact:

  • 10% of nodes pass the filter: ~2x slower
  • 50% pass: ~1.3x slower
  • 90% pass: ~1.1x slower

Multi-Vector Search

Query with multiple embeddings (e.g., multiple text chunks):

let query = HybridQuery::new(10)
    .with_multi_vectors(vec![
        embedding1, // First chunk
        embedding2, // Second chunk
        embedding3, // Third chunk
    ]);

// Returns documents that match ANY of the vectors (max score)
let results = engine.search(&query)?;

Production Checklist

Index Configuration

  • M = 16-32 for balanced performance
  • ef_construction = 200-400 for good graph quality
  • ef = 50-200 for 95%+ recall at query time

Distance Metric

  • Cosine for normalized embeddings (text)
  • L2 for non-normalized embeddings (images)
  • Normalize vectors before indexing (if using Cosine)

Hybrid Search

  • Use pre-filtering when filters match many documents (>10%)
  • Use post-filtering when filters match few documents (<10%)
  • Enable BM25 for text relevance
  • Tune fusion weights based on use case

Performance

  • Benchmark with production data
  • Test concurrent query load
  • Monitor memory usage (scale with dataset)
  • Enable SIMD (AVX2/AVX-512)

Reliability

  • Implement index persistence
  • Plan for index rebuild strategy
  • Monitor recall metrics
  • Set up alerting for QPS/latency

Troubleshooting

Low Recall

Problem: Search results missing relevant documents

Solutions:

  1. Increase ef parameter (50 → 100 → 200)
  2. Increase ef_construction for new index (200 → 400)
  3. Increase M parameter (16 → 32)
  4. Verify vector normalization (for Cosine)
  5. Check for filtering issues (too restrictive)
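To verify recall rather than guess, compare ANN results against brute-force ground truth on a sample of queries. A self-contained sketch (keys can be any ID type):

/// Fraction of the true top-k that the ANN search also returned.
fn recall_at_k(ann_keys: &[String], exact_keys: &[String], k: usize) -> f32 {
    let hits = exact_keys
        .iter()
        .take(k)
        .filter(|key| ann_keys.iter().take(k).any(|a| a == *key))
        .count();
    hits as f32 / k as f32
}

// Average recall_at_k over ~100 sample queries; if it falls well below the
// target (e.g. 0.95), tune ef/M as above before suspecting the data.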

Low QPS

Problem: Slow query performance

Solutions:

  1. Decrease ef parameter (200 → 100 → 50)
  2. Decrease M parameter (32 → 16 → 8)
  3. Enable CPU SIMD features (AVX2/AVX-512)
  4. Use pre-filtering instead of post-filtering
  5. Reduce reranking overhead
  6. Scale horizontally (distribute index)

High Memory Usage

Problem: Index consuming too much RAM

Solutions:

  1. Decrease M parameter (32 → 16 → 8)
  2. Use lower precision vectors (f32 → f16)
  3. Implement disk-based storage (mmap)
  4. Shard index across multiple nodes
  5. Use IVF-PQ quantization for compression

Filtering Issues

Problem: Filtered search returning too few results

Solutions:

  1. Use post-filtering instead of pre-filtering
  2. Increase k to oversample before filtering
  3. Check filter logic for correctness
  4. Implement 2-hop traversal (already supported)
  5. Verify metadata is correctly indexed

Advanced Topics

Custom Distance Functions

impl DistanceMetric {
    pub fn custom_distance(&self, a: &[f32], b: &[f32]) -> f32 {
        // Implement custom distance logic here.
        // A valid metric must satisfy:
        //   1. d(x, y) >= 0
        //   2. d(x, y) = 0 iff x = y
        //   3. d(x, y) = d(y, x)
        //   4. d(x, z) <= d(x, y) + d(y, z)
        // Example body: Chebyshev (L-infinity) distance
        a.iter()
            .zip(b.iter())
            .map(|(x, y)| (x - y).abs())
            .fold(0.0_f32, f32::max)
    }
}

Distributed Indexing

For billion-scale datasets, shard the index:

use heliosdb_vector::{DistributedCoordinator, ShardingStrategy};

let coordinator = DistributedCoordinator::new(
    8, // num_shards
    ShardingStrategy::Hash,
)?;

// Inserts automatically route to the correct shard
coordinator.insert(key, vector)?;

// Search queries all shards and merges results
let results = coordinator.search(&query, k)?;

Benchmarks

Run comprehensive benchmarks:

# All benchmarks
cargo bench --bench vector_benchmarks
# Specific benchmarks
cargo bench --bench vector_benchmarks -- distance_metrics
cargo bench --bench vector_benchmarks -- hnsw_query
cargo bench --bench vector_benchmarks -- concurrent_queries
# With nightly features
cargo +nightly bench

Expected results on modern CPU (3.5 GHz):

  • Distance calculation: 45ns (AVX-512), 80ns (AVX2), 500ns (scalar)
  • HNSW build: ~1,000 inserts/sec for 100k vectors
  • HNSW query: 10,000+ QPS with ef=50, 100k vectors
  • Concurrent: Linear scaling to 8+ threads

Testing

Run test suite:

# All tests
cargo test --package heliosdb-vector
# Specific test file
cargo test --package heliosdb-vector --test vector_search_tests
# With output
cargo test -- --nocapture
# Ignored (slow) tests
cargo test -- --ignored

Support

For issues, questions, or feature requests: