HeliosDB Tier 2/3 AI/ML Features User Guide
Version: v7.1.2 | Last Updated: January 25, 2026 | Status: Production Beta
Overview
HeliosDB includes 9 advanced AI/ML packages (Tier 2/3 features) that provide cutting-edge database intelligence capabilities. These features collectively represent $83M-$151M ARR potential and include 4 world-first innovations.
Package Summary
| Package | Category | Innovation | Completeness | ARR Potential |
|---|---|---|---|---|
| Neural Query Planner | Query Optimization | World-First | 80% | $9M-$20M |
| Schema AI | Data Modeling | World-First | 75% | $15M-$25M |
| RL-Based Cache | Caching | World-First | 70% | $10M-$18M |
| MAB Load Balancer | Load Balancing | World-First | 75% | $8M-$15M |
| Anomaly Detection | Monitoring | Advanced | 75% | $8M-$18M |
| Time-Series Forecasting | Analytics | Advanced | 70% | $7M-$15M |
| AutoML Tuning | Configuration | Advanced | 65% | $12M-$18M |
| Auto-Index | Indexing | Advanced | 70% | $8M-$12M |
| Probabilistic Structures | Data Structures | Advanced | 70% | $6M-$10M |
1. Neural Query Planner
Location: heliosdb-ai/crates/neural-planner
Description
World’s first production deep learning-based query optimizer using Transformer encoder + Graph Neural Network architecture for plan generation.
Key Features
- Deep Learning Query Optimization: Transformer + GNN architecture
- Real-time Inference: Sub-5ms latency plan generation
- Learned Cost Model: Neural network-based cost estimation
- Beam Search Exploration: Guided plan search with neural heuristics
- ONNX Export: Production deployment via tract-onnx
Usage
```rust
use heliosdb_neural_planner::{
    planner::{NeuralPlanner, PlannerConfig},
    featurizer::QueryFeaturizer,
};

// Initialize planner with custom config
let config = PlannerConfig::default()
    .with_inference_batch_size(64)
    .with_beam_width(5)
    .with_model_type("transformer-gnn");

let planner = NeuralPlanner::new(config);

// Train on your workload
let training_queries = vec![
    ("SELECT * FROM users WHERE id = ?", vec!["IndexScan(users.id_idx)"]),
];
for (query, plan) in &training_queries {
    planner.add_training_example(query, plan).await?;
}
planner.train(100, 0.001).await?;

// Generate optimized plans
let features = QueryFeaturizer::new().featurize(&parsed_query).await?;
let plan = planner.generate_plan(&features).await?;

// Export for production
planner.export_onnx("models/query_planner.onnx").await?;
```
Configuration Options
| Option | Default | Description |
|---|---|---|
| inference_batch_size | 64 | Batch size for inference |
| beam_width | 5 | Beam search width |
| model_type | "transformer-gnn" | Model architecture |
| learning_rate | 0.001 | Training learning rate |
| max_plan_depth | 20 | Maximum plan tree depth |
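To build intuition for the beam_width option above, here is a toy beam search over join orders in self-contained Rust. The cost function is a crude stand-in for the learned cost model, and every name in this sketch is illustrative rather than part of the crate's API.

```rust
// Toy beam search over join orders. At each step every partial order is
// extended by one table, candidates are ranked by a stand-in cost
// function, and only the `beam_width` cheapest survive.

fn join_cost(order: &[usize], card: &[f64]) -> f64 {
    // cost of joining in sequence: running intermediate size * next table
    let mut cost = 0.0;
    let mut inter = card[order[0]];
    for &t in &order[1..] {
        cost += inter * card[t];
        inter = (inter * card[t]).sqrt(); // crude intermediate-size estimate
    }
    cost
}

fn beam_search(card: &[f64], beam_width: usize) -> Vec<usize> {
    let n = card.len();
    // start with each single table as a partial plan
    let mut beam: Vec<Vec<usize>> = (0..n).map(|i| vec![i]).collect();
    for _ in 1..n {
        let mut candidates: Vec<Vec<usize>> = Vec::new();
        for plan in &beam {
            for t in 0..n {
                if !plan.contains(&t) {
                    let mut next = plan.clone();
                    next.push(t);
                    candidates.push(next);
                }
            }
        }
        candidates.sort_by(|a, b| {
            join_cost(a, card).partial_cmp(&join_cost(b, card)).unwrap()
        });
        candidates.truncate(beam_width);
        beam = candidates;
    }
    beam.into_iter().next().unwrap()
}

fn main() {
    // three tables with very different cardinalities
    let cardinalities = [1_000_000.0, 10.0, 5_000.0];
    let order = beam_search(&cardinalities, 5);
    println!("chosen join order: {:?}", order);
}
```

The production planner replaces `join_cost` with neural cost estimates and guides the expansion with learned heuristics, but the pruning structure is the same.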
2. Schema AI (Generative Schema Designer)
Location: heliosdb-data/crates/schema-ai
Description
AI-powered schema design from natural language descriptions. Converts NL requirements to optimized database schemas with automatic normalization.
Key Features
- NL-to-ERD: Natural language to Entity-Relationship Diagrams
- Automatic Normalization: 1NF to BCNF normalization
- Index Recommendations: AI-suggested indexes
- Schema Evolution: Intelligent migration generation
- Multi-dialect Support: PostgreSQL, MySQL, SQLite DDL generation
Usage
```rust
use heliosdb_schema_ai::{
    SchemaGenerator, SchemaConfig, NormalizationLevel,
};

let generator = SchemaGenerator::new(SchemaConfig {
    target_normalization: NormalizationLevel::ThirdNormalForm,
    generate_indexes: true,
    target_dialect: "postgresql",
});

// Generate schema from description
let description = "E-commerce system with users who can place orders \
    containing multiple products. Track inventory levels and support \
    product categories.";

let schema = generator.generate_from_description(description).await?;

// Output DDL
println!("{}", schema.to_ddl());

// Get ER diagram (Mermaid format)
println!("{}", schema.to_mermaid());

// Generate migration from existing schema
let migration = generator.generate_migration(
    &current_schema,
    &target_schema,
).await?;
```
Schema Generation Options
| Option | Values | Description |
|---|---|---|
| target_normalization | 1NF, 2NF, 3NF, BCNF | Normalization level |
| generate_indexes | true/false | Auto-generate indexes |
| target_dialect | postgresql, mysql, sqlite | SQL dialect |
| include_constraints | true/false | Generate constraints |
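For a feel of what the normalization pass has to detect, the sketch below checks a functional-dependency list for a transitive dependency, the classic 3NF violation that forces a table split. This is illustrative logic only, not the schema-ai implementation.

```rust
// A transitive dependency exists when the key determines X and X in turn
// determines Y (key -> X -> Y) for some non-key X. A 3NF normalizer
// would move (X, Y) into its own table.
fn has_transitive_dependency(fds: &[(&str, &str)], key: &str) -> bool {
    fds.iter().any(|&(x, _)| {
        // an FD whose determinant is not the key...
        x != key
            // ...but is itself determined by the key
            && fds.iter().any(|&(a, b)| a == key && b == x)
    })
}

fn main() {
    // order_id -> customer_id -> customer_city: violates 3NF
    let fds = [
        ("order_id", "customer_id"),
        ("customer_id", "customer_city"),
    ];
    println!("{}", has_transitive_dependency(&fds, "order_id"));
}
```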
3. RL-Based Intelligent Cache
Location: heliosdb-cache/crates/rl-cache
Description
Reinforcement learning-based cache eviction using Deep Q-Network (DQN). Learns optimal caching policies from access patterns.
Key Features
- DQN-based Eviction: Deep Q-Network for policy learning
- Contextual Features: Uses query patterns, time-of-day, workload type
- Adaptive Learning: Continuous improvement from production traffic
- Multi-tier Support: Coordinates L1/L2/L3 cache hierarchies
- Workload-aware: Different policies for OLTP vs OLAP
Usage
```rust
use heliosdb_rl_cache::{
    RLCache, RLCacheConfig, FeatureExtractor,
};

let config = RLCacheConfig {
    capacity_mb: 1024,
    learning_rate: 0.001,
    discount_factor: 0.99,
    exploration_rate: 0.1,
    replay_buffer_size: 10000,
};

let cache = RLCache::new(config);

// The cache learns automatically from access patterns
cache.get(&key).await;       // Records access
cache.put(key, value).await; // Learns eviction policy

// Force policy update (normally automatic)
cache.update_policy().await?;

// Get cache statistics
let stats = cache.get_stats();
println!("Hit rate: {:.2}%", stats.hit_rate * 100.0);
println!("Eviction quality: {:.2}", stats.eviction_quality_score);
```
Configuration
| Option | Default | Description |
|---|---|---|
| capacity_mb | 1024 | Cache size in MB |
| learning_rate | 0.001 | DQN learning rate |
| discount_factor | 0.99 | Future reward discount |
| exploration_rate | 0.1 | Epsilon for exploration |
| update_frequency | 100 | Steps between updates |
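The DQN itself is beyond a snippet, but the reward-driven eviction idea behind the learning_rate and discount_factor options can be shown with a toy tabular Q-learning scorer in plain Rust. All names here are hypothetical; the shipped crate uses a neural network, a replay buffer, and contextual features instead of one value per key.

```rust
use std::collections::HashMap;

// Toy tabular stand-in for the DQN: one learned value per key,
// updated with the standard Q-learning rule
//   Q <- Q + lr * (reward + discount * max_Q - Q)
struct ToyRlEvictor {
    q: HashMap<String, f64>,
    learning_rate: f64,
    discount: f64,
}

impl ToyRlEvictor {
    fn new(learning_rate: f64, discount: f64) -> Self {
        Self { q: HashMap::new(), learning_rate, discount }
    }

    // reward: +1.0 when keeping the key led to a later hit,
    //         -1.0 when it only wasted a slot
    fn update(&mut self, key: &str, reward: f64) {
        let best_next = self.q.values().cloned().fold(0.0, f64::max);
        let q = self.q.entry(key.to_string()).or_insert(0.0);
        *q += self.learning_rate * (reward + self.discount * best_next - *q);
    }

    // evict the key with the lowest learned value
    fn eviction_candidate(&self) -> Option<&str> {
        self.q.iter()
            .min_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(k, _)| k.as_str())
    }
}

fn main() {
    let mut ev = ToyRlEvictor::new(0.1, 0.99);
    for _ in 0..50 { ev.update("hot_key", 1.0); }   // frequently re-hit
    for _ in 0..50 { ev.update("cold_key", -1.0); } // never re-used
    println!("evict: {:?}", ev.eviction_candidate());
}
```

After training, the key that never earned a hit scores lowest and becomes the eviction candidate, which is the behavior an LRU cannot learn from delayed rewards.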
4. Multi-Armed Bandit Load Balancer
Location: heliosdb-cluster/crates/mab-balancer
Description
LinUCB contextual bandit algorithm for intelligent request routing. Balances load while optimizing for latency and throughput.
Key Features
- Contextual Bandits: LinUCB algorithm with context features
- Latency Optimization: Routes to fastest available node
- Adaptive Exploration: Balances exploration vs exploitation
- Health-aware: Considers node health in routing decisions
- Multi-objective: Optimizes latency, throughput, and fairness
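Before the crate-level API, the LinUCB scoring rule itself is worth seeing concretely: each node keeps per-arm statistics (A, b) and is scored by mean estimate plus an uncertainty bonus, score = θᵀx + α·√(xᵀA⁻¹x). The sketch below is self-contained Rust for two context features; the 2×2 inverse shortcut and all names are illustrative, not the crate's internals.

```rust
// Minimal LinUCB arm for d = 2 context features.
struct Arm {
    a: [[f64; 2]; 2], // A = I + sum of x xᵀ over observed contexts
    b: [f64; 2],      // b = sum of reward * x
}

impl Arm {
    fn new() -> Self {
        Self { a: [[1.0, 0.0], [0.0, 1.0]], b: [0.0, 0.0] }
    }

    // closed-form 2x2 inverse (fine for d = 2; real code factorizes A)
    fn inv_a(&self) -> [[f64; 2]; 2] {
        let [[p, q], [r, s]] = self.a;
        let det = p * s - q * r;
        [[s / det, -q / det], [-r / det, p / det]]
    }

    // UCB score: θᵀx + alpha * sqrt(xᵀ A⁻¹ x), with θ = A⁻¹ b
    fn score(&self, x: [f64; 2], alpha: f64) -> f64 {
        let inv = self.inv_a();
        let theta = [
            inv[0][0] * self.b[0] + inv[0][1] * self.b[1],
            inv[1][0] * self.b[0] + inv[1][1] * self.b[1],
        ];
        let mean = theta[0] * x[0] + theta[1] * x[1];
        let ax = [
            inv[0][0] * x[0] + inv[0][1] * x[1],
            inv[1][0] * x[0] + inv[1][1] * x[1],
        ];
        mean + alpha * (x[0] * ax[0] + x[1] * ax[1]).sqrt()
    }

    fn update(&mut self, x: [f64; 2], reward: f64) {
        for i in 0..2 {
            for j in 0..2 {
                self.a[i][j] += x[i] * x[j];
            }
            self.b[i] += reward * x[i];
        }
    }
}

fn main() {
    let ctx = [1.0, 0.5]; // e.g. [is_read, normalized_data_size]
    let mut fast = Arm::new();
    let mut slow = Arm::new();
    fast.update(ctx, 1.0); // this node answered the context quickly
    slow.update(ctx, 0.1); // this node was slow
    let pick = if fast.score(ctx, 0.5) >= slow.score(ctx, 0.5) { 0 } else { 1 };
    println!("route to node {}", pick);
}
```

The alpha parameter in BalancerConfig scales the uncertainty bonus: larger values make rarely-tried nodes look more attractive, which is the exploration side of the trade-off.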
Usage
```rust
use heliosdb_mab_balancer::{
    MABBalancer, BalancerConfig, RequestContext,
};

let config = BalancerConfig {
    alpha: 0.5, // Exploration parameter
    context_features: vec!["query_type", "data_size", "time_of_day"],
    update_batch_size: 100,
};

let balancer = MABBalancer::new(config, nodes);

// Route request with context
let context = RequestContext {
    query_type: QueryType::Read,
    estimated_data_size: 1024,
    priority: Priority::Normal,
};

let selected_node = balancer.select_node(&context).await?;

// Record outcome for learning
balancer.record_outcome(
    selected_node,
    &context,
    latency_ms,
    success,
).await?;

// Get routing statistics
let stats = balancer.get_stats();
for (node, node_stats) in stats.per_node {
    println!("{}: avg_latency={:.2}ms, selection_rate={:.2}%",
        node, node_stats.avg_latency, node_stats.selection_rate * 100.0);
}
```
5. Anomaly Detection
Location: heliosdb-ai/crates/anomaly-detection
Description
Multi-algorithm anomaly detection for database metrics, query patterns, and data quality monitoring.
Supported Algorithms
| Algorithm | Use Case | Strengths |
|---|---|---|
| Isolation Forest | General | Fast, handles high dimensions |
| Local Outlier Factor | Density-based | Good for clusters |
| DBSCAN | Clustering | Finds arbitrary shapes |
| One-Class SVM | Novelty detection | Works with limited data |
| LSTM Autoencoder | Time-series | Captures temporal patterns |
| Statistical (Z-score) | Simple metrics | Interpretable, fast |
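The simplest row in the table, the rolling z-score, fits in a short self-contained sketch: a point is flagged when it sits more than a threshold number of standard deviations from the window mean. Names here are illustrative, not the crate's API.

```rust
// Rolling z-score detector over a sliding window of recent values.
struct ZScoreDetector {
    window: Vec<f64>,
    window_size: usize,
    threshold: f64, // number of standard deviations
}

impl ZScoreDetector {
    fn new(window_size: usize, threshold: f64) -> Self {
        Self { window: Vec::new(), window_size, threshold }
    }

    // returns true when `value` is anomalous relative to the window
    fn observe(&mut self, value: f64) -> bool {
        let anomalous = if self.window.len() >= 2 {
            let n = self.window.len() as f64;
            let mean = self.window.iter().sum::<f64>() / n;
            let var = self.window.iter()
                .map(|v| (v - mean).powi(2)).sum::<f64>() / n;
            let std = var.sqrt().max(1e-9); // guard against zero variance
            ((value - mean) / std).abs() > self.threshold
        } else {
            false // not enough history yet
        };
        self.window.push(value);
        if self.window.len() > self.window_size {
            self.window.remove(0);
        }
        anomalous
    }
}

fn main() {
    let mut d = ZScoreDetector::new(100, 3.0);
    for v in [10.0, 11.0, 9.5, 10.2, 10.8, 9.9] {
        d.observe(v); // normal latencies, none flagged
    }
    println!("spike flagged: {}", d.observe(95.0));
}
```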
Usage
```rust
use heliosdb_anomaly_detection::{
    AnomalyDetector, DetectorConfig, Algorithm, MetricStream, AnomalyAlert,
};

let config = DetectorConfig {
    algorithm: Algorithm::IsolationForest,
    contamination: 0.01, // Expected anomaly rate
    sensitivity: 0.8,
    window_size: 100,
};

let detector = AnomalyDetector::new(config);

// Train on historical data
detector.fit(&historical_metrics).await?;

// Real-time detection
let mut stream = MetricStream::new();
while let Some(metric) = stream.next().await {
    if let Some(alert) = detector.detect(&metric).await? {
        println!("ANOMALY: {} (score: {:.2}, confidence: {:.2}%)",
            alert.description, alert.anomaly_score, alert.confidence * 100.0);
    }
}

// Batch detection
let anomalies = detector.detect_batch(&metrics).await?;
for anomaly in anomalies {
    println!("Anomaly at index {}: {}", anomaly.index, anomaly.description);
}
```
Algorithm Selection Guide
- Query latency monitoring → Isolation Forest or LSTM
- Connection patterns → DBSCAN
- Data quality checks → Statistical (Z-score)
- Novel query detection → One-Class SVM
- General monitoring → Ensemble (multiple algorithms)
6. Time-Series Forecasting
Location: heliosdb-ai/crates/forecasting
Description
Comprehensive time-series forecasting for capacity planning, workload prediction, and trend analysis.
Supported Algorithms
| Algorithm | Best For | Accuracy |
|---|---|---|
| ARIMA | Stationary data | High |
| Prophet | Seasonal + holidays | High |
| LSTM | Complex patterns | Very High |
| Exponential Smoothing | Simple trends | Medium |
| Ensemble | General | Highest |
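As a concrete anchor, the "Exponential Smoothing" row reduces to a one-line recurrence, level_t = α·y_t + (1 − α)·level_{t−1}, with the h-step forecast being simply the last level. A self-contained sketch (names are illustrative, and it assumes a non-empty series):

```rust
// Simple (single) exponential smoothing: each new observation pulls the
// level toward it by a factor alpha; the forecast is the final level.
fn exp_smooth_forecast(series: &[f64], alpha: f64) -> f64 {
    let mut level = series[0];
    for &y in &series[1..] {
        level = alpha * y + (1.0 - alpha) * level;
    }
    level
}

fn main() {
    let qps = [100.0, 102.0, 98.0, 101.0, 150.0]; // recent spike
    // high alpha tracks the spike, low alpha stays near the baseline
    println!("alpha=0.8 -> {:.1}", exp_smooth_forecast(&qps, 0.8));
    println!("alpha=0.1 -> {:.1}", exp_smooth_forecast(&qps, 0.1));
}
```

This is why the table rates it "Medium": it reacts to level changes but has no notion of trend or seasonality, which is what the ARIMA, Prophet, and LSTM rows add.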
Usage
```rust
use heliosdb_forecasting::{
    Forecaster, ForecastConfig, Algorithm, TimeSeries, Forecast,
};

let config = ForecastConfig {
    algorithm: Algorithm::AutoSelect, // Automatically choose best
    forecast_horizon: 24, // Hours ahead
    confidence_level: 0.95,
    seasonality: Some(Seasonality::Daily),
};

let forecaster = Forecaster::new(config);

// Fit on historical data
let history = TimeSeries::from_vec(historical_values, timestamps);
forecaster.fit(&history).await?;

// Generate forecast
let forecast = forecaster.predict(24).await?;

println!("Forecast for next 24 hours:");
for point in forecast.points {
    println!("{}: {:.2} (CI: {:.2} - {:.2})",
        point.timestamp, point.value, point.lower_bound, point.upper_bound);
}

// Capacity planning
let capacity_forecast = forecaster.forecast_capacity(
    current_usage,
    growth_rate,
    30, // days
).await?;

println!("Estimated time to 80% capacity: {} days",
    capacity_forecast.days_to_threshold(0.8));
```
7. AutoML Tuning
Location: heliosdb-ai/crates/automl-tuning
Description
Automatic database configuration tuning using Bayesian Optimization and Genetic Algorithms.
Key Features
- Bayesian Optimization: Efficient hyperparameter search
- Genetic Algorithms: Evolves optimal configurations
- Safe Exploration: Constraints to prevent bad configs
- A/B Testing: Validates improvements before rollout
- Workload-aware: Different configs for different workloads
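Bayesian optimization is hard to show in a few lines, so the sketch below substitutes a constraint-aware grid search over two knobs against a synthetic benchmark. The closed-form benchmark function and all names are invented for illustration; only the constraint values mirror the MaxLatency(100)/MinThroughput(1000) pair used in the usage example.

```rust
// Synthetic benchmark: (throughput qps, p99 latency ms). Toy model:
// more memory helps throughput, too many connections hurts latency.
fn benchmark(work_mem_mb: u64, max_conn: u64) -> (f64, f64) {
    let qps = 500.0 + 8.0 * (work_mem_mb as f64).sqrt() + 0.8 * max_conn as f64;
    let latency = 20.0 + 0.15 * max_conn as f64 - 0.02 * work_mem_mb as f64;
    (qps, latency)
}

// Exhaustive search over the space, skipping configs that violate the
// safety constraints, keeping the feasible config with the best qps.
fn tune() -> Option<(u64, u64)> {
    let mut best: Option<((u64, u64), f64)> = None;
    for work_mem_mb in (4u64..=256).step_by(4) {
        for max_conn in (100u64..=1000).step_by(50) {
            let (qps, p99) = benchmark(work_mem_mb, max_conn);
            // safety constraints: p99 <= 100 ms, qps >= 1000
            if p99 > 100.0 || qps < 1000.0 {
                continue;
            }
            if best.map_or(true, |(_, b)| qps > b) {
                best = Some(((work_mem_mb, max_conn), qps));
            }
        }
    }
    best.map(|(cfg, _)| cfg)
}

fn main() {
    match tune() {
        Some((wm, mc)) => println!("work_mem={}MB max_connections={}", wm, mc),
        None => println!("no feasible config"),
    }
}
```

Bayesian optimization reaches a comparable answer in far fewer benchmark runs by modeling the objective surface, which matters when each evaluation means replaying a real workload.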
Usage
```rust
use heliosdb_automl_tuning::{
    AutoTuner, TunerConfig, SearchSpace, Parameter, ParameterType,
};

let search_space = SearchSpace::new()
    .add(Parameter::new("shared_buffers", ParameterType::Memory)
        .range("256MB", "8GB"))
    .add(Parameter::new("work_mem", ParameterType::Memory)
        .range("4MB", "256MB"))
    .add(Parameter::new("max_connections", ParameterType::Integer)
        .range(100, 1000))
    .add(Parameter::new("effective_cache_size", ParameterType::Memory)
        .range("1GB", "32GB"));

let config = TunerConfig {
    algorithm: TuningAlgorithm::BayesianOptimization,
    max_iterations: 50,
    target_metric: "throughput",
    constraints: vec![
        Constraint::MaxLatency(100),     // ms
        Constraint::MinThroughput(1000), // qps
    ],
};

let tuner = AutoTuner::new(config, search_space);

// Run tuning
let result = tuner.tune(&workload_benchmark).await?;

println!("Optimal configuration found:");
for (param, value) in &result.best_config {
    println!("  {} = {}", param, value);
}
println!("Improvement: {:.1}%", result.improvement_percent);

// Apply configuration (with rollback support)
tuner.apply_config(&result.best_config, RollbackPolicy::OnRegression).await?;
```
8. Auto-Index
Location: heliosdb-autonomous/crates/auto-index
Description
ML-based automatic index recommendation and management based on workload analysis.
Key Features
- Workload Analysis: Learns from query patterns
- Index Recommendations: Suggests optimal indexes
- Impact Prediction: Estimates performance improvement
- Automatic Creation: Creates indexes during low-traffic periods
- Index Consolidation: Removes redundant indexes
Usage
```rust
use heliosdb_auto_index::{
    AutoIndexer, IndexerConfig, WorkloadSample,
};

let config = IndexerConfig {
    analysis_window: Duration::from_hours(24),
    min_improvement_threshold: 0.10, // 10% improvement required
    max_indexes_per_table: 10,
    auto_create: true,
    maintenance_window: "02:00-05:00",
};

let indexer = AutoIndexer::new(config);

// Analyze workload
indexer.record_query(&query, execution_stats).await?;

// Get recommendations
let recommendations = indexer.get_recommendations().await?;
for rec in recommendations {
    println!("Table: {}", rec.table);
    println!("Suggested index: {}", rec.index_definition);
    println!("Estimated improvement: {:.1}%", rec.estimated_improvement * 100.0);
    println!("Affected queries: {}", rec.affected_query_count);
}

// Apply recommendations
indexer.apply_recommendations(ApplyPolicy::RequireApproval).await?;

// Get index health report
let health = indexer.get_index_health().await?;
for (table, stats) in health.per_table {
    println!("{}: {} indexes, {:.1}% usage efficiency",
        table, stats.index_count, stats.usage_efficiency * 100.0);
}
```
9. Probabilistic Data Structures
Location: heliosdb-models/crates/probabilistic
Description
Memory-efficient probabilistic data structures for approximate queries.
Supported Structures
| Structure | Use Case | Space | Error Rate |
|---|---|---|---|
| Bloom Filter | Membership testing | O(n) bits | Configurable FP |
| Count-Min Sketch | Frequency estimation | O(1) | Configurable |
| HyperLogLog | Cardinality estimation | ~1.5KB | ~2% |
| T-Digest | Percentile estimation | O(compression) | ~1% |
| Cuckoo Filter | Membership + delete | O(n) bits | Configurable |
| MinHash | Similarity estimation | O(k) | 1/√k |
| SimHash | Near-duplicate detection | O(1) | Configurable |
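The "Configurable FP" entry for Bloom filters follows from the standard sizing formulas: m = −n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions. A quick calculator in self-contained Rust (helper names are ours, not the crate's):

```rust
// Optimal Bloom filter parameters for n items at false-positive rate p:
//   bits  m = -n * ln(p) / (ln 2)^2
//   hashes k = (m / n) * ln 2
fn bloom_params(n_items: u64, fp_rate: f64) -> (u64, u32) {
    let ln2 = std::f64::consts::LN_2;
    let m = (-(n_items as f64) * fp_rate.ln() / (ln2 * ln2)).ceil();
    let k = ((m / n_items as f64) * ln2).round() as u32;
    (m as u64, k.max(1))
}

fn main() {
    // the 1M items / 1% FP configuration from the usage example below
    let (bits, hashes) = bloom_params(1_000_000, 0.01);
    println!("{} bits (~{:.1} MB), {} hash functions",
             bits, bits as f64 / 8.0 / 1e6, hashes);
}
```

At 1M items and 1% false positives this works out to roughly 9.6 million bits (about 1.2 MB) and 7 hash functions, which is the kind of memory budget the table's "O(n) bits" column implies.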
Usage
```rust
use heliosdb_probabilistic::{
    BloomFilter, CountMinSketch, HyperLogLog, TDigest, CuckooFilter, MinHash,
};

// Bloom Filter - membership testing
let mut bloom = BloomFilter::new(1_000_000, 0.01); // 1M items, 1% FP rate
bloom.insert(&"user123");
assert!(bloom.contains(&"user123"));  // True (definitely present or FP)
assert!(!bloom.contains(&"user456")); // False (definitely not present)

// Count-Min Sketch - frequency estimation
let mut cms = CountMinSketch::new(0.01, 0.001); // 1% error, 99.9% confidence
cms.add(&"query_type_a", 1);
cms.add(&"query_type_a", 1);
println!("Estimated count: {}", cms.estimate(&"query_type_a"));

// HyperLogLog - cardinality estimation
let mut hll = HyperLogLog::new(14); // 2^14 registers
for user_id in user_ids {
    hll.add(&user_id);
}
println!("Estimated unique users: {}", hll.cardinality());

// T-Digest - percentile estimation
let mut tdigest = TDigest::new(100); // compression factor
for latency in latencies {
    tdigest.add(latency);
}
println!("p99 latency: {:.2}ms", tdigest.percentile(0.99));

// MinHash - similarity estimation
let minhash1 = MinHash::new(128).add_all(&set1);
let minhash2 = MinHash::new(128).add_all(&set2);
println!("Jaccard similarity: {:.2}", minhash1.similarity(&minhash2));
```
Integration with HeliosDB
Enabling Features
```toml
[dependencies]
heliosdb = { version = "7.1", features = [
    "neural-planner",
    "schema-ai",
    "rl-cache",
    "mab-balancer",
    "anomaly-detection",
    "forecasting",
    "automl-tuning",
    "auto-index",
    "probabilistic",
] }
```
Configuration

```toml
[ai.neural_planner]
enabled = true
model_path = "models/query_planner.onnx"
inference_timeout_ms = 5
fallback_to_traditional = true

[ai.schema_ai]
enabled = true
default_normalization = "3NF"

[cache.rl]
enabled = true
capacity_mb = 2048
learning_enabled = true

[cluster.mab_balancer]
enabled = true
alpha = 0.5
update_frequency = 100

[monitoring.anomaly_detection]
enabled = true
algorithm = "ensemble"
alert_threshold = 0.8

[monitoring.forecasting]
enabled = true
algorithm = "auto"
horizon_hours = 24

[tuning.automl]
enabled = true
maintenance_window = "02:00-05:00"
require_approval = true

[indexing.auto_index]
enabled = true
min_improvement_threshold = 0.10
max_indexes_per_table = 10
```
Best Practices
1. Start with Defaults
All packages have sensible defaults. Start with defaults and tune based on your workload.
2. Monitor Before Enabling
Monitor your workload characteristics before enabling AI features:
- Query patterns and frequency
- Data distribution
- Peak vs off-peak traffic
3. Use Gradual Rollout
Enable features gradually:
- Start with read-only features (anomaly detection, forecasting)
- Enable learning features (neural planner, RL cache)
- Enable write features (auto-index, automl tuning)
4. Set Safety Constraints
Always configure safety constraints for features that modify behavior:
- Max latency thresholds
- Rollback policies
- Approval requirements
5. Review Recommendations
AI recommendations should be reviewed before automatic application, especially for:
- Index creation
- Configuration changes
- Schema modifications
Troubleshooting
Common Issues
Neural Planner slow inference
- Check model file is loaded (not re-loading per query)
- Reduce beam width if latency exceeds 5ms
- Use ONNX runtime optimizations
RL Cache low hit rate
- Allow more training time (10,000+ accesses)
- Check exploration rate isn’t too high
- Verify feature extraction includes relevant context
Anomaly Detection false positives
- Lower the contamination parameter (it sets the expected anomaly rate, so a smaller value flags fewer points)
- Use ensemble mode for higher precision
- Train on longer historical period
AutoML Tuning not improving
- Expand search space ranges
- Increase iteration count
- Check constraint feasibility
Support
- Documentation: https://heliosdb.io/docs/ai-ml
- Issues: https://github.com/heliosdb/heliosdb/issues
- Community: https://discord.gg/heliosdb
This guide covers HeliosDB v7.1.2 Tier 2/3 AI/ML features.