HeliosDB Adaptive Indexing
HeliosDB Adaptive Indexing
Intelligent adaptive indexing system that automatically recommends and creates indexes based on query patterns using machine learning and cost-benefit analysis.
Features
- Query Pattern Analysis: Tracks and analyzes SQL query patterns to identify indexing opportunities
- ML-Based Recommendations: Uses K-means clustering to find similar query patterns and recommend optimal indexes
- Cost-Benefit Analysis: Evaluates the trade-off between index maintenance cost and query speedup
- Automatic Index Management: Can automatically create and delete indexes based on usage patterns
- Index Usage Statistics: Monitors which indexes are used and how effective they are
- Approval Workflow: Optional manual approval for index creation/deletion
- Multi-Column Indexes: Recommends composite indexes for queries using multiple columns
- Partial Indexes: Suggests filtered indexes for queries with common predicates
Installation
Add to your Cargo.toml:
[dependencies]heliosdb-adaptive = "3.0"Quick Start
use heliosdb_adaptive::{AdaptiveIndexer, AdaptiveConfig};use std::time::Duration;
#[tokio::main]async fn main() -> Result<(), Box<dyn std::error::Error>> { // Configure the adaptive indexer let config = AdaptiveConfig { auto_create: false, // Manual approval auto_delete: false, min_query_count: 10, // Minimum queries before recommendation cost_threshold: 2.0, // Minimum benefit/cost ratio analysis_interval: Duration::from_secs(300), };
// Create indexer let indexer = AdaptiveIndexer::new(config).await?;
// Record query executions indexer.record_query( "SELECT * FROM users WHERE age > 25", 50.0 // latency in ms ).await?;
// Get index recommendations let recommendations = indexer.recommend_indexes().await?;
for rec in recommendations { println!("Recommend index on {}.{:?}", rec.table, rec.columns); println!(" Priority: {:?}", rec.priority); println!(" Score: {:.2}", rec.score); println!(" Reason: {}", rec.reason); }
Ok(())}Architecture
The adaptive indexer consists of several components:
1. Pattern Analyzer
Parses SQL queries and extracts patterns:
- Table names
- Columns in SELECT, WHERE, ORDER BY, GROUP BY, JOIN
- Operation types (Filter, Sort, Join, Aggregate)
- Query frequency and latency
2. ML Recommender
Uses machine learning to cluster similar queries:
- K-means clustering of query feature vectors
- Identifies common column access patterns
- Recommends indexes based on cluster analysis
3. Cost-Benefit Analyzer
Evaluates index recommendations:
- Estimates index size and maintenance cost
- Calculates query speedup factor
- Computes benefit/cost ratio
- Supports single-column, multi-column, and partial indexes
4. Index Advisor
Combines all components to generate recommendations:
- Analyzes query patterns
- Applies ML clustering
- Performs cost-benefit analysis
- Ranks recommendations by priority and score
5. Auto Index Manager
Manages index lifecycle:
- Automatic index creation (optional)
- Automatic deletion of unused indexes
- Approval workflow for manual control
- Index usage tracking
6. Statistics Tracker
Monitors index effectiveness:
- Usage count per index
- Average selectivity and speedup
- Storage utilization
- Identifies unused indexes
Usage Examples
Basic Pattern Analysis
let indexer = AdaptiveIndexer::new(config).await?;
// Record various queriesfor _ in 0..20 { indexer.record_query( "SELECT * FROM users WHERE age > 25 AND status = 'active'", 75.0 ).await?;}
// Analyze patternslet patterns = indexer.analyze_patterns().await?;
for pattern in patterns { println!("Table: {}, Frequency: {}", pattern.table, pattern.frequency); println!("Operations: {:?}", pattern.operations);}With Automatic Index Creation
use heliosdb_adaptive::auto_index::IndexExecutor;
// Implement IndexExecutor for your databasestruct MyDatabase { /* ... */ }
#[async_trait]impl IndexExecutor for MyDatabase { async fn create_index(&self, table: &str, columns: &[String], index_type: &str, index_name: &str) -> Result<(), String> { // Your database-specific index creation logic Ok(()) }
// Implement other required methods...}
// Create indexer with executorlet db = Arc::new(MyDatabase::new());let indexer = AdaptiveIndexer::with_executor(config, db).await?;
// Process recommendations automaticallylet recommendations = indexer.recommend_indexes().await?;for rec in recommendations { indexer.create_index(rec).await?;}Manual Approval Workflow
// Get pending actionslet pending = indexer.get_pending_actions().await?;
for (i, action) in pending.iter().enumerate() { match action { IndexAction::Create(req) => { println!("Create index: {:?}", req.recommendation.columns); println!("Score: {:.2}", req.recommendation.score); } IndexAction::Delete(req) => { println!("Delete index: {}", req.index_name); println!("Reason: {}", req.reason); } }}
// Approve specific actionindexer.approve_action(0).await?;
// Or rejectindexer.reject_action(1).await?;Cost-Benefit Analysis
use heliosdb_adaptive::cost_benefit::TableStatistics;
// Set table statistics for accurate analysislet mut stats = TableStatistics::default();stats.row_count = 100_000;stats.reads_per_day = 10_000;stats.writes_per_day = 1_000;stats.column_cardinality.insert("age".to_string(), 100);
// Recommendations will use these statisticslet recommendations = indexer.recommend_indexes().await?;
for rec in recommendations { println!("Estimated benefit: ${:.2}/day", rec.estimated_benefit); println!("Estimated cost: ${:.2}/day", rec.estimated_cost); println!("ROI: {:.2}x", rec.score);}Index Statistics
// Get overall statisticslet stats = indexer.get_statistics().await;
println!("Total indexes: {}", stats.total_indexes);println!("Active indexes: {}", stats.active_indexes);println!("Unused indexes: {}", stats.unused_indexes);println!("Avg speedup: {:.2}x", stats.avg_query_speedup);
// Get details for each indexfor idx in stats.indexes { println!("{}: used {} times, speedup {:.2}x", idx.name, idx.usage_count, idx.avg_speedup);}
// Export as JSONlet json = indexer.export_statistics_json().await?;Cleanup Unused Indexes
// Automatically delete indexes not used in 30+ dayslet deleted = indexer.drop_unused_indexes().await?;
for index_name in deleted { println!("Deleted unused index: {}", index_name);}Configuration Options
AdaptiveConfig
auto_create: Enable automatic index creation (default: false)auto_delete: Enable automatic deletion of unused indexes (default: false)min_query_count: Minimum query frequency before recommending index (default: 10)cost_threshold: Minimum benefit/cost ratio for recommendations (default: 2.0)analysis_interval: How often to analyze patterns (default: 300s)
Index Types
BTree: Standard B-tree index for range queriesHash: Hash index for equality lookupsPartial(condition): Filtered index for subset of rowsMultiColumn(columns): Composite index for multiple columns
Priority Levels
Critical: Score >= 10.0, immediate action recommendedHigh: Score >= 5.0, high priorityMedium: Score >= 3.0, moderate priorityLow: Score >= threshold, low priority
Performance
The adaptive indexer is designed for production use:
- Low Overhead: Pattern analysis is ~1-2% query overhead
- Async: All operations are non-blocking
- Concurrent: Thread-safe using lock-free data structures (DashMap)
- Scalable: Handles 10,000+ queries/second
- Memory Efficient: Configurable history size limits
Testing
Run the test suite:
# Unit testscargo test --lib
# Integration testscargo test --test integration_test
# All testscargo test
# With outputcargo test -- --nocaptureRun the example:
cargo run --example adaptive_indexBenchmarks
Run performance benchmarks:
cargo benchHow It Works
- Query Recording: Queries are parsed and patterns are extracted
- Pattern Clustering: ML algorithms group similar queries
- Cost Analysis: Each potential index is evaluated for ROI
- Recommendation: Indexes are ranked by priority and score
- Creation: Indexes are created (automatically or with approval)
- Monitoring: Usage statistics are tracked continuously
- Cleanup: Unused indexes are identified and removed
Best Practices
- Start Conservative: Begin with manual approval (
auto_create: false) - Set Realistic Thresholds: Adjust
min_query_countandcost_thresholdbased on workload - Provide Table Stats: Set accurate statistics for better recommendations
- Monitor Statistics: Regularly check index usage and effectiveness
- Review Recommendations: Don’t blindly accept all recommendations
- Test Impact: Measure query performance before/after index creation
- Cleanup Regularly: Remove unused indexes to save space
Integration with HeliosDB
This module integrates seamlessly with HeliosDB’s query execution:
// In your query executorimpl QueryExecutor { async fn execute(&self, query: &str) -> Result<QueryResult> { let start = Instant::now(); let result = self.execute_internal(query).await?; let latency = start.elapsed().as_secs_f64() * 1000.0;
// Record for adaptive indexing self.adaptive_indexer.record_query(query, latency).await?;
Ok(result) }}License
MIT License - see LICENSE file for details
Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
Support
For issues and questions:
- GitHub Issues: https://github.com/heliosdb/heliosdb/issues
- Documentation: https://docs.heliosdb.io
- Discord: https://discord.gg/heliosdb