HeliosDB Adaptive Indexing

Intelligent adaptive indexing system that automatically recommends and creates indexes based on query patterns using machine learning and cost-benefit analysis.

Features

Query Pattern Analysis: Tracks and analyzes SQL query patterns to identify indexing opportunities
ML-Based Recommendations: Uses K-means clustering to find similar query patterns and recommend optimal indexes
Cost-Benefit Analysis: Evaluates the trade-off between index maintenance cost and query speedup
Automatic Index Management: Can automatically create and delete indexes based on usage patterns
Index Usage Statistics: Monitors which indexes are used and how effective they are
Approval Workflow: Optional manual approval for index creation/deletion
Multi-Column Indexes: Recommends composite indexes for queries using multiple columns
Partial Indexes: Suggests filtered indexes for queries with common predicates

Installation

Add to your Cargo.toml:

[dependencies]
heliosdb-adaptive = "3.0"

Quick Start

use heliosdb_adaptive::{AdaptiveIndexer, AdaptiveConfig};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure the adaptive indexer
    let config = AdaptiveConfig {
        auto_create: false,  // Manual approval
        auto_delete: false,
        min_query_count: 10, // Minimum queries before recommendation
        cost_threshold: 2.0, // Minimum benefit/cost ratio
        analysis_interval: Duration::from_secs(300),
    };

    // Create indexer
    let indexer = AdaptiveIndexer::new(config).await?;

    // Record query executions
    indexer.record_query(
        "SELECT * FROM users WHERE age > 25",
        50.0  // latency in ms
    ).await?;

    // Get index recommendations
    let recommendations = indexer.recommend_indexes().await?;

    for rec in recommendations {
        println!("Recommend index on {}.{:?}", rec.table, rec.columns);
        println!("  Priority: {:?}", rec.priority);
        println!("  Score: {:.2}", rec.score);
        println!("  Reason: {}", rec.reason);
    }

    Ok(())
}

Architecture

The adaptive indexer consists of several components:

1. Pattern Analyzer

Parses SQL queries and extracts patterns:

Table names
Columns in SELECT, WHERE, ORDER BY, GROUP BY, JOIN
Operation types (Filter, Sort, Join, Aggregate)
Query frequency and latency

2. ML Recommender

Uses machine learning to cluster similar queries:

K-means clustering of query feature vectors
Identifies common column access patterns
Recommends indexes based on cluster analysis

3. Cost-Benefit Analyzer

Evaluates index recommendations:

Estimates index size and maintenance cost
Calculates query speedup factor
Computes benefit/cost ratio
Supports single-column, multi-column, and partial indexes

4. Index Advisor

Combines all components to generate recommendations:

Analyzes query patterns
Applies ML clustering
Performs cost-benefit analysis
Ranks recommendations by priority and score

5. Auto Index Manager

Manages index lifecycle:

Automatic index creation (optional)
Automatic deletion of unused indexes
Approval workflow for manual control
Index usage tracking

6. Statistics Tracker

Monitors index effectiveness:

Usage count per index
Average selectivity and speedup
Storage utilization
Identifies unused indexes

Usage Examples

Basic Pattern Analysis

let indexer = AdaptiveIndexer::new(config).await?;

// Record various queries
for _ in 0..20 {
    indexer.record_query(
        "SELECT * FROM users WHERE age > 25 AND status = 'active'",
        75.0
    ).await?;
}

// Analyze patterns
let patterns = indexer.analyze_patterns().await?;

for pattern in patterns {
    println!("Table: {}, Frequency: {}", pattern.table, pattern.frequency);
    println!("Operations: {:?}", pattern.operations);
}

With Automatic Index Creation

use heliosdb_adaptive::auto_index::IndexExecutor;

// Implement IndexExecutor for your database
struct MyDatabase { /* ... */ }

#[async_trait]
impl IndexExecutor for MyDatabase {
    async fn create_index(&self, table: &str, columns: &[String],
                         index_type: &str, index_name: &str) -> Result<(), String> {
        // Your database-specific index creation logic
        Ok(())
    }

    // Implement other required methods...
}

// Create indexer with executor
let db = Arc::new(MyDatabase::new());
let indexer = AdaptiveIndexer::with_executor(config, db).await?;

// Process recommendations automatically
let recommendations = indexer.recommend_indexes().await?;
for rec in recommendations {
    indexer.create_index(rec).await?;
}

Manual Approval Workflow

// Get pending actions
let pending = indexer.get_pending_actions().await?;

for (i, action) in pending.iter().enumerate() {
    match action {
        IndexAction::Create(req) => {
            println!("Create index: {:?}", req.recommendation.columns);
            println!("Score: {:.2}", req.recommendation.score);
        }
        IndexAction::Delete(req) => {
            println!("Delete index: {}", req.index_name);
            println!("Reason: {}", req.reason);
        }
    }
}

// Approve specific action
indexer.approve_action(0).await?;

// Or reject
indexer.reject_action(1).await?;

Cost-Benefit Analysis

use heliosdb_adaptive::cost_benefit::TableStatistics;

// Set table statistics for accurate analysis
let mut stats = TableStatistics::default();
stats.row_count = 100_000;
stats.reads_per_day = 10_000;
stats.writes_per_day = 1_000;
stats.column_cardinality.insert("age".to_string(), 100);

// Recommendations will use these statistics
let recommendations = indexer.recommend_indexes().await?;

for rec in recommendations {
    println!("Estimated benefit: ${:.2}/day", rec.estimated_benefit);
    println!("Estimated cost: ${:.2}/day", rec.estimated_cost);
    println!("ROI: {:.2}x", rec.score);
}

Index Statistics

// Get overall statistics
let stats = indexer.get_statistics().await;

println!("Total indexes: {}", stats.total_indexes);
println!("Active indexes: {}", stats.active_indexes);
println!("Unused indexes: {}", stats.unused_indexes);
println!("Avg speedup: {:.2}x", stats.avg_query_speedup);

// Get details for each index
for idx in stats.indexes {
    println!("{}: used {} times, speedup {:.2}x",
             idx.name, idx.usage_count, idx.avg_speedup);
}

// Export as JSON
let json = indexer.export_statistics_json().await?;

Cleanup Unused Indexes

// Automatically delete indexes not used in 30+ days
let deleted = indexer.drop_unused_indexes().await?;

for index_name in deleted {
    println!("Deleted unused index: {}", index_name);
}

Configuration Options

AdaptiveConfig

auto_create: Enable automatic index creation (default: false)
auto_delete: Enable automatic deletion of unused indexes (default: false)
min_query_count: Minimum query frequency before recommending index (default: 10)
cost_threshold: Minimum benefit/cost ratio for recommendations (default: 2.0)
analysis_interval: How often to analyze patterns (default: 300s)

Index Types

BTree: Standard B-tree index for range queries
Hash: Hash index for equality lookups
Partial(condition): Filtered index for subset of rows
MultiColumn(columns): Composite index for multiple columns

Priority Levels

Critical: Score >= 10.0, immediate action recommended
High: Score >= 5.0, high priority
Medium: Score >= 3.0, moderate priority
Low: Score >= threshold, low priority

Performance

The adaptive indexer is designed for production use:

Low Overhead: Pattern analysis is ~1-2% query overhead
Async: All operations are non-blocking
Concurrent: Thread-safe using lock-free data structures (DashMap)
Scalable: Handles 10,000+ queries/second
Memory Efficient: Configurable history size limits

Testing

Run the test suite:

# Unit tests
cargo test --lib

# Integration tests
cargo test --test integration_test

# All tests
cargo test

# With output
cargo test -- --nocapture

Run the example:

cargo run --example adaptive_index

Benchmarks

Run performance benchmarks:

cargo bench

How It Works

Query Recording: Queries are parsed and patterns are extracted
Pattern Clustering: ML algorithms group similar queries
Cost Analysis: Each potential index is evaluated for ROI
Recommendation: Indexes are ranked by priority and score
Creation: Indexes are created (automatically or with approval)
Monitoring: Usage statistics are tracked continuously
Cleanup: Unused indexes are identified and removed

Best Practices

Start Conservative: Begin with manual approval (auto_create: false)
Set Realistic Thresholds: Adjust min_query_count and cost_threshold based on workload
Provide Table Stats: Set accurate statistics for better recommendations
Monitor Statistics: Regularly check index usage and effectiveness
Review Recommendations: Don’t blindly accept all recommendations
Test Impact: Measure query performance before/after index creation
Cleanup Regularly: Remove unused indexes to save space

Integration with HeliosDB

This module integrates seamlessly with HeliosDB’s query execution:

// In your query executor
impl QueryExecutor {
    async fn execute(&self, query: &str) -> Result<QueryResult> {
        let start = Instant::now();
        let result = self.execute_internal(query).await?;
        let latency = start.elapsed().as_secs_f64() * 1000.0;

        // Record for adaptive indexing
        self.adaptive_indexer.record_query(query, latency).await?;

        Ok(result)
    }
}

License

MIT License - see LICENSE file for details

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Support

For issues and questions:

GitHub Issues: https://github.com/heliosdb/heliosdb/issues
Documentation: https://docs.heliosdb.io
Discord: https://discord.gg/heliosdb