Skip to content

HeliosDB Adaptive Indexing

HeliosDB Adaptive Indexing

Intelligent adaptive indexing system that automatically recommends and creates indexes based on query patterns using machine learning and cost-benefit analysis.

Features

  • Query Pattern Analysis: Tracks and analyzes SQL query patterns to identify indexing opportunities
  • ML-Based Recommendations: Uses K-means clustering to find similar query patterns and recommend optimal indexes
  • Cost-Benefit Analysis: Evaluates the trade-off between index maintenance cost and query speedup
  • Automatic Index Management: Can automatically create and delete indexes based on usage patterns
  • Index Usage Statistics: Monitors which indexes are used and how effective they are
  • Approval Workflow: Optional manual approval for index creation/deletion
  • Multi-Column Indexes: Recommends composite indexes for queries using multiple columns
  • Partial Indexes: Suggests filtered indexes for queries with common predicates

Installation

Add to your Cargo.toml:

[dependencies]
heliosdb-adaptive = "3.0"

Quick Start

use heliosdb_adaptive::{AdaptiveIndexer, AdaptiveConfig};
use std::time::Duration;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Configure the adaptive indexer
let config = AdaptiveConfig {
auto_create: false, // Manual approval
auto_delete: false,
min_query_count: 10, // Minimum queries before recommendation
cost_threshold: 2.0, // Minimum benefit/cost ratio
analysis_interval: Duration::from_secs(300),
};
// Create indexer
let indexer = AdaptiveIndexer::new(config).await?;
// Record query executions
indexer.record_query(
"SELECT * FROM users WHERE age > 25",
50.0 // latency in ms
).await?;
// Get index recommendations
let recommendations = indexer.recommend_indexes().await?;
for rec in recommendations {
println!("Recommend index on {}.{:?}", rec.table, rec.columns);
println!(" Priority: {:?}", rec.priority);
println!(" Score: {:.2}", rec.score);
println!(" Reason: {}", rec.reason);
}
Ok(())
}

Architecture

The adaptive indexer consists of several components:

1. Pattern Analyzer

Parses SQL queries and extracts patterns:

  • Table names
  • Columns in SELECT, WHERE, ORDER BY, GROUP BY, JOIN
  • Operation types (Filter, Sort, Join, Aggregate)
  • Query frequency and latency

2. ML Recommender

Uses machine learning to cluster similar queries:

  • K-means clustering of query feature vectors
  • Identifies common column access patterns
  • Recommends indexes based on cluster analysis

3. Cost-Benefit Analyzer

Evaluates index recommendations:

  • Estimates index size and maintenance cost
  • Calculates query speedup factor
  • Computes benefit/cost ratio
  • Supports single-column, multi-column, and partial indexes

4. Index Advisor

Combines all components to generate recommendations:

  • Analyzes query patterns
  • Applies ML clustering
  • Performs cost-benefit analysis
  • Ranks recommendations by priority and score

5. Auto Index Manager

Manages index lifecycle:

  • Automatic index creation (optional)
  • Automatic deletion of unused indexes
  • Approval workflow for manual control
  • Index usage tracking

6. Statistics Tracker

Monitors index effectiveness:

  • Usage count per index
  • Average selectivity and speedup
  • Storage utilization
  • Identifies unused indexes

Usage Examples

Basic Pattern Analysis

let indexer = AdaptiveIndexer::new(config).await?;
// Record various queries
for _ in 0..20 {
indexer.record_query(
"SELECT * FROM users WHERE age > 25 AND status = 'active'",
75.0
).await?;
}
// Analyze patterns
let patterns = indexer.analyze_patterns().await?;
for pattern in patterns {
println!("Table: {}, Frequency: {}", pattern.table, pattern.frequency);
println!("Operations: {:?}", pattern.operations);
}

With Automatic Index Creation

use heliosdb_adaptive::auto_index::IndexExecutor;
// Implement IndexExecutor for your database
struct MyDatabase { /* ... */ }
#[async_trait]
impl IndexExecutor for MyDatabase {
async fn create_index(&self, table: &str, columns: &[String],
index_type: &str, index_name: &str) -> Result<(), String> {
// Your database-specific index creation logic
Ok(())
}
// Implement other required methods...
}
// Create indexer with executor
let db = Arc::new(MyDatabase::new());
let indexer = AdaptiveIndexer::with_executor(config, db).await?;
// Process recommendations automatically
let recommendations = indexer.recommend_indexes().await?;
for rec in recommendations {
indexer.create_index(rec).await?;
}

Manual Approval Workflow

// Get pending actions
let pending = indexer.get_pending_actions().await?;
for (i, action) in pending.iter().enumerate() {
match action {
IndexAction::Create(req) => {
println!("Create index: {:?}", req.recommendation.columns);
println!("Score: {:.2}", req.recommendation.score);
}
IndexAction::Delete(req) => {
println!("Delete index: {}", req.index_name);
println!("Reason: {}", req.reason);
}
}
}
// Approve specific action
indexer.approve_action(0).await?;
// Or reject
indexer.reject_action(1).await?;

Cost-Benefit Analysis

use heliosdb_adaptive::cost_benefit::TableStatistics;
// Set table statistics for accurate analysis
let mut stats = TableStatistics::default();
stats.row_count = 100_000;
stats.reads_per_day = 10_000;
stats.writes_per_day = 1_000;
stats.column_cardinality.insert("age".to_string(), 100);
// Recommendations will use these statistics
let recommendations = indexer.recommend_indexes().await?;
for rec in recommendations {
println!("Estimated benefit: ${:.2}/day", rec.estimated_benefit);
println!("Estimated cost: ${:.2}/day", rec.estimated_cost);
println!("ROI: {:.2}x", rec.score);
}

Index Statistics

// Get overall statistics
let stats = indexer.get_statistics().await;
println!("Total indexes: {}", stats.total_indexes);
println!("Active indexes: {}", stats.active_indexes);
println!("Unused indexes: {}", stats.unused_indexes);
println!("Avg speedup: {:.2}x", stats.avg_query_speedup);
// Get details for each index
for idx in stats.indexes {
println!("{}: used {} times, speedup {:.2}x",
idx.name, idx.usage_count, idx.avg_speedup);
}
// Export as JSON
let json = indexer.export_statistics_json().await?;

Cleanup Unused Indexes

// Automatically delete indexes not used in 30+ days
let deleted = indexer.drop_unused_indexes().await?;
for index_name in deleted {
println!("Deleted unused index: {}", index_name);
}

Configuration Options

AdaptiveConfig

  • auto_create: Enable automatic index creation (default: false)
  • auto_delete: Enable automatic deletion of unused indexes (default: false)
  • min_query_count: Minimum query frequency before recommending index (default: 10)
  • cost_threshold: Minimum benefit/cost ratio for recommendations (default: 2.0)
  • analysis_interval: How often to analyze patterns (default: 300s)

Index Types

  • BTree: Standard B-tree index for range queries
  • Hash: Hash index for equality lookups
  • Partial(condition): Filtered index for subset of rows
  • MultiColumn(columns): Composite index for multiple columns

Priority Levels

  • Critical: Score >= 10.0, immediate action recommended
  • High: Score >= 5.0, high priority
  • Medium: Score >= 3.0, moderate priority
  • Low: Score >= threshold, low priority

Performance

The adaptive indexer is designed for production use:

  • Low Overhead: Pattern analysis is ~1-2% query overhead
  • Async: All operations are non-blocking
  • Concurrent: Thread-safe using lock-free data structures (DashMap)
  • Scalable: Handles 10,000+ queries/second
  • Memory Efficient: Configurable history size limits

Testing

Run the test suite:

Terminal window
# Unit tests
cargo test --lib
# Integration tests
cargo test --test integration_test
# All tests
cargo test
# With output
cargo test -- --nocapture

Run the example:

Terminal window
cargo run --example adaptive_index

Benchmarks

Run performance benchmarks:

Terminal window
cargo bench

How It Works

  1. Query Recording: Queries are parsed and patterns are extracted
  2. Pattern Clustering: ML algorithms group similar queries
  3. Cost Analysis: Each potential index is evaluated for ROI
  4. Recommendation: Indexes are ranked by priority and score
  5. Creation: Indexes are created (automatically or with approval)
  6. Monitoring: Usage statistics are tracked continuously
  7. Cleanup: Unused indexes are identified and removed

Best Practices

  1. Start Conservative: Begin with manual approval (auto_create: false)
  2. Set Realistic Thresholds: Adjust min_query_count and cost_threshold based on workload
  3. Provide Table Stats: Set accurate statistics for better recommendations
  4. Monitor Statistics: Regularly check index usage and effectiveness
  5. Review Recommendations: Don’t blindly accept all recommendations
  6. Test Impact: Measure query performance before/after index creation
  7. Cleanup Regularly: Remove unused indexes to save space

Integration with HeliosDB

This module integrates seamlessly with HeliosDB’s query execution:

// In your query executor
impl QueryExecutor {
async fn execute(&self, query: &str) -> Result<QueryResult> {
let start = Instant::now();
let result = self.execute_internal(query).await?;
let latency = start.elapsed().as_secs_f64() * 1000.0;
// Record for adaptive indexing
self.adaptive_indexer.record_query(query, latency).await?;
Ok(result)
}
}

License

MIT License - see LICENSE file for details

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Support

For issues and questions: