HNSW Performance Tuning Guide

Overview

This guide provides detailed performance tuning strategies for achieving production-grade performance with HeliosDB vector search.

Target Metrics

  • QPS: 10,000+ queries per second (1M vectors, ef=50)
  • Recall: 95%+ at Recall@10
  • Latency: p50 < 5ms, p95 < 20ms, p99 < 50ms
  • Build Speed: 1,000+ vectors/sec
  • Memory: ~1.6 GB per 1M vectors (128D, M=16)

Parameter Optimization

M (Max Connections)

Controls graph connectivity and search quality.

| M  | Recall@10 | QPS | Memory | Build Speed | Use Case |
|----|-----------|-----|--------|-------------|----------|
| 4  | 85% | 30k | 0.7 GB | 2,500/s | Memory-constrained |
| 8  | 90% | 20k | 1.0 GB | 2,000/s | Fast queries |
| 16 | 95% | 12k | 1.6 GB | 1,200/s | Production (balanced) |
| 32 | 98% | 8k  | 2.8 GB | 800/s   | High accuracy |
| 64 | 99% | 5k  | 5.2 GB | 500/s   | Maximum recall |

Recommendation: Start with M=16, increase to 32 if recall < 95%

ef_construction (Build Quality)

Controls graph quality during index construction.

| ef_construction | Recall@10 | Build Time | Use Case |
|-----------------|-----------|------------|----------|
| 50  | 88% | 1.0× | Fast development |
| 100 | 92% | 1.5× | Rapid prototyping |
| 200 | 95% | 2.0× | Production (balanced) |
| 400 | 97% | 3.0× | High accuracy |
| 800 | 98% | 5.0× | Maximum quality |

Recommendation: Use 200 for production, 400 for mission-critical applications
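
Both M and ef_construction are fixed when the index is built. A minimal construction sketch, assuming HnswIndex and DistanceMetric are exported from heliosdb_vector like the MmapHnswReader used later in this guide:

```rust
use heliosdb_vector::{DistanceMetric, HnswIndex};

// Balanced production build: M=16, ef_construction=200.
// Changing either value requires rebuilding the index.
let index = HnswIndex::new(16, 200, DistanceMetric::L2);
```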

ef (Search Quality)

Runtime parameter controlling recall vs speed tradeoff.

| ef  | Recall@10 | Latency | QPS | Use Case |
|-----|-----------|---------|-----|----------|
| 10  | 75% | 0.3 ms | 35k | Ultra-fast |
| 20  | 85% | 0.5 ms | 25k | Fast search |
| 50  | 95% | 0.8 ms | 15k | Production (balanced) |
| 100 | 97% | 1.2 ms | 10k | High recall |
| 200 | 98% | 2.0 ms | 6k  | Maximum recall |
| 500 | 99% | 5.0 ms | 3k  | Research/evaluation |

Recommendation: Start with ef=50, adjust based on recall requirements
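
Because ef is a per-query argument, it can be tuned without rebuilding. A minimal sweep, sketched against the search(query, k, Some(ef), filter) signature used throughout this guide:

```rust
use std::time::Instant;

// Sweep ef and log latency; pick the smallest ef that meets your recall target.
for ef in [10, 20, 50, 100, 200] {
    let start = Instant::now();
    let results = index.search(&query, 10, Some(ef), None)?;
    println!("ef={}: {} results in {:?}", ef, results.len(), start.elapsed());
}
```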

SIMD Optimization

CPU Feature Detection

HeliosDB automatically selects the best SIMD implementation:

```rust
// Automatic selection (recommended)
let dist = euclidean_distance(&a, &b);
// AVX-512: 11x speedup over scalar
// AVX2:    6x speedup over scalar
// Scalar:  fallback for all CPUs
```
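
For reference, a minimal sketch of how this kind of runtime dispatch is commonly implemented with std's feature-detection macro; it illustrates the pattern, not HeliosDB's actual internals:

```rust
// Portable scalar kernel; the SIMD kernels themselves are elided here.
fn euclidean_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            // return unsafe { euclidean_avx512(a, b) };
        } else if is_x86_feature_detected!("avx2") {
            // return unsafe { euclidean_avx2(a, b) };
        }
    }
    euclidean_scalar(a, b) // fallback for all CPUs
}
```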

Performance Comparison

512-dimensional vectors, 10,000 distance calculations:

| Implementation | Time | Speedup | Instructions |
|----------------|------|---------|--------------|
| Scalar  | 5.0 ms  | 1.0×  | ~500 per calc |
| AVX2    | 0.8 ms  | 6.2×  | ~80 per calc |
| AVX-512 | 0.45 ms | 11.1× | ~45 per calc |

Enabling SIMD

Linux:

```sh
# Check CPU features
cat /proc/cpuinfo | grep flags | grep avx512

# Build with native CPU features
RUSTFLAGS="-C target-cpu=native" cargo build --release
```

Docker:

```sh
# Use host CPU features
docker run --cpus=4 --cpu-shares=1024 \
  -e RUSTFLAGS="-C target-cpu=native" \
  heliosdb/vector-search
```

Concurrent Query Optimization

Thread Scaling

HNSW provides excellent read concurrency:

| Threads | QPS | Latency (p50) | Latency (p95) | CPU Usage |
|---------|-----|---------------|---------------|-----------|
| 1  | 12,000  | 0.8 ms | 1.5 ms | 100% |
| 2  | 23,000  | 0.9 ms | 1.8 ms | 200% |
| 4  | 44,000  | 1.0 ms | 2.2 ms | 400% |
| 8  | 80,000  | 1.2 ms | 3.0 ms | 800% |
| 16 | 140,000 | 1.5 ms | 4.5 ms | 1600% |

Scaling efficiency: ~90% up to core count

Thread Pool Configuration

```rust
use std::sync::Arc;
use rayon::prelude::*; // needed for par_iter
use rayon::ThreadPoolBuilder;

// Create a custom thread pool
let pool = ThreadPoolBuilder::new()
    .num_threads(8)
    .build()
    .unwrap();

let index = Arc::new(index);
pool.install(|| {
    // Parallel query processing
    queries.par_iter().for_each(|query| {
        let results = index.search(query, 10, Some(50), None).unwrap();
        // Process results
    });
});
```

Memory Optimization

Memory Usage Formula

Total Memory = Base + (N × Vector Size) + (N × Graph Memory)
Where:
- Base: ~100 MB (runtime overhead)
- Vector Size: D × 4 bytes (f32)
- Graph Memory: M × 8 bytes (per connection)
Example (1M vectors, 128D, M=16):
= 100 MB + (1M × 128 × 4) + (1M × 16 × 8)
= 100 MB + 512 MB + 128 MB
= 740 MB (actual: ~800 MB with overhead)
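
As a sanity check, the formula translates directly into code. A small sketch using the guide's rough constants (the 100 MB base is an approximation, not exact accounting):

```rust
// Estimate index memory from the formula above
fn estimate_memory_bytes(n: usize, dim: usize, m: usize) -> usize {
    let base = 100 * 1024 * 1024; // ~100 MB runtime overhead
    let vectors = n * dim * 4;    // f32 payload: D × 4 bytes per vector
    let graph = n * m * 8;        // graph: M connections × 8 bytes per node
    base + vectors + graph
}

// estimate_memory_bytes(1_000_000, 128, 16) ≈ 740 MB, matching the example above
```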

Memory-Constrained Strategies

1. Reduce M parameter:

```rust
// Roughly half the graph memory of M=16
let index = HnswIndex::new(8, 200, DistanceMetric::L2);
```

2. Use lower precision (future):

```rust
// f16 instead of f32 (50% reduction in vector memory)
let vector = VectorData::new_f16(dim, data);
```

3. Disk-based storage (memory-mapped):

```rust
use heliosdb_vector::MmapHnswReader;

let reader = MmapHnswReader::open("index.bin")?;
// Vectors are loaded on demand from disk
```

4. Distributed sharding:

```rust
// Shard across 4 nodes; each handles 25% of the data
let coordinator = DistributedCoordinator::new(4, ShardingStrategy::Hash)?;
```

Filtered Search Optimization

Pre-filtering vs Post-filtering

Pre-filtering (apply the filter before vector search):

  • Pros: fewer distance calculations, faster
  • Cons: may miss results if the filter is too restrictive
  • Use when: filter selectivity > 10%

```rust
let mut query = HybridQuery::new(10)
    .with_vector(embedding)
    .with_filter(filter);
query.pre_filter = true; // Default
```

Post-filtering (apply the filter after vector search):

  • Pros: better recall, no missed results
  • Cons: more distance calculations
  • Use when: filter selectivity < 10%

```rust
query.pre_filter = false; // Oversample, then filter
```

Performance Impact

| Filter Selectivity | Pre-filter Latency | Post-filter Latency | Recommendation |
|--------------------|--------------------|---------------------|----------------|
| 90% (10% filtered out) | 0.9 ms | 1.2 ms | Pre-filter |
| 50% (50% filtered out) | 1.5 ms | 2.0 ms | Pre-filter |
| 10% (90% filtered out) | 8.0 ms | 3.5 ms | Post-filter |
| 1% (99% filtered out)  | 50 ms  | 4.0 ms | Post-filter |
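
A hedged sketch of choosing the mode at query time; estimate_filter_selectivity is a hypothetical helper (for example, a cardinality lookup against a metadata index), not a HeliosDB API:

```rust
// Apply the >10% rule from the table above
let selectivity = estimate_filter_selectivity(&filter); // hypothetical helper
query.pre_filter = selectivity > 0.10; // pre-filter only when enough rows pass
```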

Build Optimization

Batch Insertion

Insert vectors in batches for better cache locality:

```rust
let batch_size = 1000;
for (chunk_idx, chunk) in vectors.chunks(batch_size).enumerate() {
    for (i, vec) in chunk.iter().enumerate() {
        // Offset by the chunk index; a per-chunk counter alone would produce duplicate keys
        let id = chunk_idx * batch_size + i;
        index.insert(Bytes::from(format!("v{}", id)), vec.clone())?;
    }
}
```

Parallel Building (Future)

```rust
// Build multiple indices in parallel, then merge
use rayon::prelude::*;

let shards: Vec<HnswIndex> = data_shards
    .par_iter()
    .map(|shard| {
        let mut index = HnswIndex::new(16, 200, DistanceMetric::L2);
        for (key, vec) in shard {
            index.insert(key.clone(), vec.clone()).unwrap();
        }
        index
    })
    .collect();

// Merge shards (implementation needed)
let merged = merge_indices(shards)?;
```

Build Performance

1M vectors, 128D, M=16, ef_construction=200:

| Strategy | Build Time | Throughput |
|----------|------------|------------|
| Sequential | 850s | ~1,175 vectors/s |
| Batch (1k) | 820s | ~1,220 vectors/s |
| Parallel (4 shards) | 250s | ~4,000 vectors/s |

Query Latency Optimization

Latency Breakdown

For 1M vectors, 128D, M=16, ef=50:

| Phase | Latency | % Total |
|-------|---------|---------|
| Distance calculations | 0.5 ms | 62% |
| Graph traversal | 0.2 ms | 25% |
| Result sorting | 0.1 ms | 13% |
| Total | 0.8 ms | 100% |

Optimization Strategies

1. Reduce distance calculations:

```rust
// Lower the ef parameter
let results = index.search(&query, k, Some(20), None)?; // ~3x faster, -10% recall
```

2. Use SIMD:

```sh
# Ensure AVX2/AVX-512 kernels are compiled in
RUSTFLAGS="-C target-cpu=native" cargo build --release
```

3. Warm up the cache:

```rust
// Pre-warm caches with representative queries
for _ in 0..100 {
    let _ = index.search(&warmup_query, 10, Some(50), None)?;
}
```

4. Reduce k (top-k results):

```rust
// Return fewer results
let results = index.search(&query, 5, Some(50), None)?; // Faster than k=100
```

Recall Optimization

Recall@10 by Configuration

| M  | ef_construction | ef | Recall@10 | QPS |
|----|-----------------|----|-----------|-----|
| 8  | 100 | 20  | 82% | 25k |
| 16 | 200 | 50  | 95% | 12k |
| 32 | 400 | 100 | 98% | 7k |
| 64 | 800 | 200 | 99% | 4k |
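
To reproduce numbers like these on your own data, a minimal recall harness, assuming both index types expose the search signature used in this guide and return (key, distance) pairs:

```rust
use std::collections::HashSet;

// Compare HNSW results against an exact (flat) baseline on sample queries
let k = 10;
let mut recall_sum = 0.0;
for query in &sample_queries {
    let approx: HashSet<_> = hnsw_index
        .search(query, k, Some(50), None)?
        .into_iter()
        .map(|(key, _dist)| key)
        .collect();
    let exact = flat_index.search(query, k, None, None)?; // ground truth
    let found = exact.iter().filter(|(key, _)| approx.contains(key)).count();
    recall_sum += found as f64 / k as f64;
}
println!("Recall@{}: {:.3}", k, recall_sum / sample_queries.len() as f64);
```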

Strategies for >99% Recall

1. Increase build quality:

```rust
let index = HnswIndex::new(64, 800, DistanceMetric::L2);
```

2. Increase search width:

```rust
let results = index.search(&query, k, Some(500), None)?;
```

3. Use exact search for small datasets (<10k vectors):

```rust
let flat_index = FlatVectorIndex::new(); // 100% recall, slower
```

4. Verify vector normalization (for Cosine):

```rust
normalize(&mut vector);
assert!((vector.iter().map(|x| x * x).sum::<f32>().sqrt() - 1.0).abs() < 1e-6);
```
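
The normalize call above is assumed to come from your embedding pipeline; if it does not, a minimal sketch:

```rust
// L2-normalize a vector in place (no-op for the zero vector)
fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```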

Hybrid Search Optimization

Score Fusion Tuning

```rust
// Text-heavy queries (e-commerce, documents)
let fusion = FusionStrategy::Weighted {
    vector_weight: 0.4,
    text_weight: 0.5,
    metadata_weight: 0.1,
};

// Vector-heavy queries (image search, embeddings)
let fusion = FusionStrategy::Weighted {
    vector_weight: 0.8,
    text_weight: 0.1,
    metadata_weight: 0.1,
};

// Balanced (recommendation: A/B test)
let fusion = FusionStrategy::Weighted {
    vector_weight: 0.6,
    text_weight: 0.3,
    metadata_weight: 0.1,
};
```

Reranking Strategy

```rust
// Two-stage retrieval: fast approximate search → accurate reranking
let query = HybridQuery::new(10)
    .with_vector(embedding)
    .with_rerank(true); // Fetch 100, rerank, return top 10

// Performance: ~1.5x slower, +2-3% recall
```

Production Monitoring

Key Metrics

```rust
use std::time::{Duration, Instant};

// QPS tracking
let start = Instant::now();
let results = index.search(&query, k, Some(50), None)?;
let latency = start.elapsed();

// Log metrics
metrics.record_latency(latency);
metrics.increment_queries();

// Alert thresholds
if latency > Duration::from_millis(50) {
    log::warn!("High latency: {:?}", latency);
}
```

Index Statistics

```rust
let stats = index.statistics()?;

// Monitor these metrics
assert!(stats.avg_degree_layer0 >= 10.0);     // Graph is well connected
assert!(stats.total_edges > stats.num_nodes); // Sufficient edges
assert!(stats.num_layers >= 4);               // Multi-layer structure

// Memory alert
if stats.memory_usage_bytes > 10 * 1024 * 1024 * 1024 { // 10 GB
    log::warn!("High memory usage: {} GB", stats.memory_usage_bytes / 1_073_741_824);
}
```

Benchmarking

Run Comprehensive Benchmarks

```sh
# All benchmarks
cargo bench --bench vector_benchmarks

# Specific benchmark groups
cargo bench --bench vector_benchmarks -- distance_metrics
cargo bench --bench vector_benchmarks -- hnsw_query
cargo bench --bench vector_benchmarks -- concurrent_queries

# With visualization
cargo install cargo-criterion
cargo criterion
```

Custom Benchmark

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_custom(c: &mut Criterion) {
    // build_test_index() and test_query() are placeholders for your own setup
    let index = build_test_index();
    let query = test_query();
    c.bench_function("custom_workload", |b| {
        b.iter(|| {
            let results = index
                .search(
                    black_box(&query),
                    black_box(10),
                    black_box(Some(50)),
                    black_box(None),
                )
                .unwrap();
            black_box(results);
        });
    });
}

criterion_group!(benches, benchmark_custom);
criterion_main!(benches);
```

Troubleshooting Performance Issues

Issue: Low QPS

Symptoms: <5,000 QPS on a modern CPU

Checklist:

  • Verify SIMD enabled (RUSTFLAGS="-C target-cpu=native")
  • Check CPU frequency (not throttled)
  • Reduce ef parameter (200 → 50)
  • Reduce M parameter (32 → 16)
  • Verify no disk I/O (index in memory)
  • Check for lock contention (profiling)

Profiling:

```sh
cargo install flamegraph
sudo flamegraph -- cargo bench --bench vector_benchmarks -- hnsw_query
# Open flamegraph.svg
```

Issue: High Latency

Symptoms: p95 > 100ms

Checklist:

  • Reduce ef (100 → 50 → 20)
  • Check index size (>1M vectors may need sharding)
  • Verify no memory swapping
  • Check GC pauses (if using managed language wrapper)
  • Profile hot paths

Issue: Low Recall

Symptoms: Recall@10 < 90%

Checklist:

  • Increase ef (50 → 100 → 200)
  • Increase ef_construction (200 → 400)
  • Increase M (16 → 32)
  • Verify correct distance metric
  • Check vector normalization (for Cosine)
  • Validate test data quality

Issue: High Memory Usage

Symptoms: >3 GB for 1M vectors (128D)

Checklist:

  • Check M parameter (should be ≤32)
  • Verify no memory leaks
  • Check for duplicate indices
  • Consider memory-mapped storage
  • Profile memory allocations

Advanced Tuning

Layer Distribution

A healthy layer distribution decays exponentially from layer 0, which contains 100.0% of nodes:

```rust
let stats = index.statistics()?;

// Expected distribution (M=16):
// Layer 0: 100.0% (all nodes)
// Layer 1: ~6.25%
// Layer 2: ~0.39%
// Layer 3: ~0.024%
// Layer 4: ~0.0015%

// Verify exponential decay
for i in 1..stats.layer_distribution.len() {
    let ratio = stats.layer_distribution[i] as f64 / stats.layer_distribution[i - 1] as f64;
    assert!(ratio > 0.01 && ratio < 0.2, "Unexpected layer distribution");
}
```

Query-Specific ef

Adjust ef based on query difficulty:

```rust
// Easy query (high confidence)
let results = index.search(&query, k, Some(20), None)?;

// Hard query (ambiguous)
let results = index.search(&query, k, Some(200), None)?;

// Adaptive (based on initial results)
let initial = index.search(&query, k, Some(20), None)?;
// A high nearest-neighbor distance signals a harder query, so widen the search
let ef = if initial[0].1 > 0.9 { 100 } else { 20 };
let final_results = index.search(&query, k, Some(ef), None)?;
```

Summary

Production Configuration (1M vectors, 128D):

```rust
let index = HnswIndex::new(
    16,                     // M: balanced
    200,                    // ef_construction: good quality
    DistanceMetric::Cosine, // normalized embeddings
);

let results = index.search(
    &query,
    10,       // k
    Some(50), // ef: 95%+ recall
    None,     // filter
)?;

// Expected: 10-15k QPS, 95%+ recall, <5ms p95 latency
```

Key Takeaways:

  1. Start with defaults (M=16, ef_construction=200, ef=50)
  2. Enable SIMD for 5-10x speedup
  3. Use pre-filtering for high selectivity (>10%)
  4. Monitor recall, QPS, and latency continuously
  5. A/B test parameter changes in production