Skip to content

Intelligent Materialized Views User Guide (F5.2.3)

Intelligent Materialized Views User Guide (F5.2.3)

Feature ID: F5.2.3 Tier: 2 ($14M ARR) Status: Production Ready Version: 5.2


Table of Contents

  1. Overview
  2. Key Features
  3. Architecture
  4. Getting Started
  5. Configuration
  6. Usage Examples
  7. Maintenance Strategies
  8. ML-Based Recommendations
  9. Performance Optimization
  10. Monitoring and Metrics
  11. Best Practices
  12. Troubleshooting
  13. API Reference

Overview

Intelligent Materialized Views is an ML-driven system that automatically analyzes query workloads, recommends optimal materialized views, and manages their lifecycle with smart refresh strategies.

What are Materialized Views?

Materialized views are precomputed query results stored as physical tables. They dramatically speed up complex queries by eliminating the need to recalculate aggregations, joins, and filters on every request.

Why Intelligent?

Unlike traditional materialized views that require manual creation and maintenance, HeliosDB’s intelligent system:

  • Automatically analyzes your query workload over a 7-day window
  • ML-predicts which queries benefit most from materialization
  • Cost-optimizes view selection to maximize ROI
  • Adaptively refreshes views based on data change patterns
  • Self-tunes refresh strategies for optimal freshness vs. cost

Key Features

1. ML-Driven View Recommendation

  • Pattern Detection: Identifies common query patterns using SQL normalization
  • Speedup Prediction: ML models predict 10-100x query acceleration
  • Confidence Scoring: 0.0-1.0 confidence for each recommendation
  • Benefit-Cost Analysis: Calculates storage, compute, and freshness costs

2. Automatic Workload Analysis

  • 7-Day Window: Analyzes recent query patterns
  • Frequency Tracking: Identifies high-frequency queries
  • Complexity Scoring: Detects expensive operations (JOINs, aggregations)
  • Table Dependency: Tracks which tables are accessed together

3. Smart Refresh Strategies

  • Immediate: Real-time updates on base table changes (OLTP)
  • Scheduled: Periodic refresh at configurable intervals (batch)
  • On-Demand: Refresh before query execution
  • ML-Predicted: Optimal refresh timing based on access patterns

4. Cost-Benefit Optimization

  • Storage Cost: Per-GB pricing with compression
  • Compute Cost: CPU and memory for refresh operations
  • Freshness Cost: Business impact of stale data
  • Network Cost: Data transfer for distributed views

5. Pattern Classification

  • Point Lookup: Fast key-based access (20% speedup)
  • Range Scan: Sequential scans with filters (50% speedup)
  • Aggregation: GROUP BY, COUNT, SUM, AVG (90% speedup)
  • Join Heavy: Multi-table joins (85% speedup)
  • Analytical: Complex analytical queries (95% speedup)
  • Real-Time: Low-latency requirements (40% speedup, high maintenance)
  • Batch: Periodic batch processing (80% speedup, low maintenance)

Architecture

┌───────────────────────────────────────────────────────────┐
│ Query Workload │
│ (User queries → Workload Analyzer → Pattern Mining) │
└────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ ML Intelligence Layer │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ Feature Extractor│─────▶│ ML Recommender │ │
│ │ - Complexity │ │ - Confidence scoring │ │
│ │ - JOINs/AGGs │ │ - Speedup prediction │ │
│ │ - Cardinality │ │ - Strategy selection │ │
│ └──────────────────┘ └──────────────────────────┘ │
└────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ Cost-Benefit Analysis │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ Advanced Cost │─────▶│ ROI Calculator │ │
│ │ Model │ │ - Net benefit │ │
│ │ - Storage │ │ - Sensitivity analysis │ │
│ │ - Compute │ └──────────────────────────┘ │
│ │ - Freshness │ │
│ └──────────────────┘ │
└────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ View Lifecycle Management │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ View Registry │◀────▶│ Refresh Scheduler │ │
│ │ - Creation │ │ - Immediate │ │
│ │ - Deletion │ │ - Scheduled │ │
│ │ - Monitoring │ │ - On-demand │ │
│ └──────────────────┘ │ - ML-predicted │ │
│ └──────────────────────────┘ │
└───────────────────────────────────────────────────────────┘

Getting Started

Installation

The materialized views feature is included in HeliosDB v5.2+:

use heliosdb_materialized_views::{
MaterializedViewManager, MaterializedViewConfig, MaintenanceStrategy,
};

Quick Start Example

use heliosdb_materialized_views::{
MaterializedViewManager, MaterializedViewConfig, MaintenanceStrategy,
};
use chrono::Utc;
use uuid::Uuid;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// 1. Create manager with default config
let config = MaterializedViewConfig::default();
let mut manager = MaterializedViewManager::new(config);
// 2. Manually create a materialized view
let view = manager.create_view(
"user_status_summary".to_string(),
"SELECT status, COUNT(*) as count FROM users GROUP BY status".to_string(),
MaintenanceStrategy::Incremental,
).await?;
println!("Created view: {} (ID: {})", view.name, view.id);
// 3. Refresh the view
manager.refresh_view(view.id).await?;
// 4. List all views
let views = manager.list_views();
println!("Total views: {}", views.len());
Ok(())
}

Configuration

MaterializedViewConfig

pub struct MaterializedViewConfig {
/// Maximum number of materialized views
pub max_views: usize, // Default: 100
/// Maximum total storage for all views (MB)
pub max_total_storage_mb: u64, // Default: 10,000 (10GB)
/// Minimum query frequency to consider for view
pub min_query_frequency: u64, // Default: 10
/// Minimum benefit score for view creation
pub min_benefit_score: f64, // Default: 100.0
/// Minimum confidence for ML recommendations
pub min_confidence: f64, // Default: 0.7 (70%)
/// Enable genetic algorithm optimization
pub enable_genetic_algorithm: bool, // Default: false
/// Workload analysis window (days)
pub workload_window_days: i64, // Default: 7
/// Storage cost per GB per day ($)
pub storage_cost_per_gb_per_day: f64, // Default: 0.10
/// Compute cost per CPU second ($)
pub compute_cost_per_cpu_second: f64, // Default: 0.0001
}

Example Custom Configuration

let config = MaterializedViewConfig {
max_views: 50,
max_total_storage_mb: 5000,
min_query_frequency: 20,
min_benefit_score: 200.0,
min_confidence: 0.85,
enable_genetic_algorithm: true,
workload_window_days: 14,
storage_cost_per_gb_per_day: 0.15,
compute_cost_per_cpu_second: 0.0002,
};
let manager = MaterializedViewManager::new(config);

Usage Examples

Example 1: Automatic View Creation from Workload

use heliosdb_materialized_views::{MaterializedViewManager, MaterializedViewConfig, WorkloadQuery};
use chrono::Utc;
use uuid::Uuid;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = MaterializedViewConfig {
min_query_frequency: 5,
min_confidence: 0.75,
..Default::default()
};
let mut manager = MaterializedViewManager::new(config);
// Simulate workload (in production, this happens automatically)
for _ in 0..100 {
manager.record_query(WorkloadQuery {
id: Uuid::new_v4(),
query: "SELECT status, COUNT(*), AVG(age) FROM users GROUP BY status".to_string(),
execution_time_ms: 450.0,
timestamp: Utc::now(),
rows_returned: 5,
tables_accessed: vec!["users".to_string()],
});
}
// Analyze workload and auto-create views
let created_views = manager.auto_create_views().await?;
println!("Auto-created {} materialized views:", created_views.len());
for view in &created_views {
println!(" - {} (Strategy: {:?})", view.name, view.maintenance_strategy);
}
Ok(())
}

Example 2: Manual View Creation with Different Strategies

// Incremental refresh (real-time OLTP)
let oltp_view = manager.create_view(
"active_users".to_string(),
"SELECT * FROM users WHERE active = true".to_string(),
MaintenanceStrategy::Incremental,
).await?;
// Scheduled refresh (batch analytics)
let batch_view = manager.create_view(
"daily_sales".to_string(),
"SELECT DATE(created_at), SUM(amount) FROM orders GROUP BY DATE(created_at)".to_string(),
MaintenanceStrategy::Deferred {
refresh_interval_seconds: 3600, // Every hour
},
).await?;
// On-demand refresh (ad-hoc queries)
let adhoc_view = manager.create_view(
"recent_activity".to_string(),
"SELECT * FROM activity_log WHERE timestamp > NOW() - INTERVAL '1 hour'".to_string(),
MaintenanceStrategy::OnDemand,
).await?;
// Manual refresh (full control)
let manual_view = manager.create_view(
"monthly_report".to_string(),
"SELECT MONTH(created_at), COUNT(*) FROM transactions GROUP BY MONTH(created_at)".to_string(),
MaintenanceStrategy::Manual,
).await?;

Example 3: ML-Based View Recommendations

use heliosdb_materialized_views::ml::MLViewRecommender;
let recommender = MLViewRecommender::new()
.with_min_confidence(0.8)
.with_target_speedup(80.0); // Target 80%+ speedup
// Analyze workload patterns
let candidates = manager.analyze_and_generate_candidates().await?;
// Get ML recommendations
let patterns = /* extract patterns from workload */;
let recommendations = recommender.recommend(&patterns)?;
for rec in recommendations {
println!("Recommendation:");
println!(" Query: {}", rec.candidate.query);
println!(" Confidence: {:.1}%", rec.ml_confidence * 100.0);
println!(" Predicted speedup: {:.1}%", rec.predicted_speedup_pct);
println!(" Refresh strategy: {:?}", rec.candidate.recommended_strategy);
println!(" Reasons:");
for reason in &rec.reasons {
println!(" - {}", reason);
}
}

Example 4: Cost-Benefit Analysis

use heliosdb_materialized_views::candidate::{CostBenefitAnalyzer, AdvancedCostModel};
let cost_analyzer = CostBenefitAnalyzer::new()
.with_daily_queries(50000)
.with_storage_cost(0.12)
.with_compute_cost(0.00015);
let candidates = manager.analyze_and_generate_candidates().await?;
let analyses = cost_analyzer.analyze_batch(&candidates)?;
for analysis in analyses {
if analysis.should_create {
println!("Recommended view:");
println!(" Query: {}", analysis.candidate.query);
println!(" Query savings: {:.2} ms/day", analysis.query_savings_ms_per_day);
println!(" Maintenance cost: {:.2} ms/day", analysis.maintenance_cost_ms_per_day);
println!(" Net benefit: {:.2} ms/day", analysis.net_benefit);
println!(" ROI: {:.2}x", analysis.roi);
}
}
// Advanced cost model with detailed breakdown
let advanced_model = AdvancedCostModel::new()
.with_compression_ratio(0.3) // 70% compression
.with_replication_factor(3); // 3x replication
for candidate in &candidates {
let storage_cost = advanced_model.calculate_storage_cost(candidate)?;
let compute_cost = advanced_model.calculate_compute_cost(candidate)?;
let freshness_cost = advanced_model.calculate_freshness_cost(candidate)?;
let total_cost = advanced_model.calculate_total_cost(candidate)?;
let benefit = advanced_model.calculate_benefit(candidate)?;
let roi = advanced_model.calculate_roi(candidate)?;
println!("Detailed cost analysis:");
println!(" Storage: ${:.2}/day", storage_cost);
println!(" Compute: ${:.2}/day", compute_cost);
println!(" Freshness: ${:.2}/day", freshness_cost);
println!(" Total cost: ${:.2}/day", total_cost);
println!(" Benefit: ${:.2}/day", benefit);
println!(" ROI: {:.2}x", roi);
}

Example 5: Staleness Policy Configuration

use heliosdb_materialized_views::maintenance::{StalenessPolicy, ToleranceLevel};
use chrono::Duration;
// Global staleness policy
let mut policy = StalenessPolicy::new(ToleranceLevel::Medium)
.with_auto_adjustment(true)
.with_warning_percentage(10.0);
// Table-specific policies
policy = policy
.add_table_policy(
"users".to_string(),
Duration::seconds(30),
5, // Max 5% stale reads
)
.add_table_policy(
"orders".to_string(),
Duration::minutes(1),
10, // Max 10% stale reads
)
.add_table_policy(
"analytics".to_string(),
Duration::hours(1),
50, // Max 50% stale reads acceptable
);
// Check if staleness is acceptable
if policy.is_acceptable_staleness("users", Duration::seconds(20)) {
println!("User data staleness is acceptable");
}

Example 6: Intelligent Refresh Engine

use heliosdb_materialized_views::maintenance::IntelligentRefreshEngine;
let mut refresh_engine = IntelligentRefreshEngine::new(10000);
// The engine automatically selects the best strategy based on:
// - Change ratio (% of rows modified)
// - View size
// - Access patterns
// - Historical performance
let view = /* get view */;
// Automatic intelligent refresh
refresh_engine.intelligent_refresh(&view).await?;
// Manual strategy selection
let strategy = refresh_engine.select_strategy(
0.15, // 15% change ratio
1_000_000, // 1M rows
);
println!("Selected refresh strategy: {:?}", strategy);

Maintenance Strategies

1. Incremental Refresh

Best for: OLTP workloads, real-time dashboards

MaintenanceStrategy::Incremental

Behavior:

  • Tracks changes to base tables
  • Applies only delta updates to the view
  • Near real-time freshness
  • Low refresh latency

Example use case: Active user list, real-time inventory counts

2. Deferred/Scheduled Refresh

Best for: Batch analytics, reporting

MaintenanceStrategy::Deferred {
refresh_interval_seconds: 3600, // 1 hour
}

Behavior:

  • Periodic full or incremental refresh
  • Configurable interval
  • Lower maintenance overhead
  • Controlled refresh windows

Example use case: Daily sales reports, hourly metrics

3. On-Demand Refresh

Best for: Ad-hoc queries, low-frequency access

MaintenanceStrategy::OnDemand

Behavior:

  • Refreshed just before query execution
  • No ongoing maintenance cost
  • Higher query latency (first access)

Example use case: Executive reports, monthly summaries

4. Manual Refresh

Best for: Full control, custom workflows

MaintenanceStrategy::Manual

Behavior:

  • Application controls refresh timing
  • No automatic refresh
  • Explicit refresh_view() calls

Example use case: Data warehouse loads, controlled ETL


ML-Based Recommendations

How It Works

  1. Feature Extraction: Analyzes query structure

    • Table count
    • JOIN count
    • Aggregation presence
    • Subquery complexity
    • Window functions
    • Cardinality estimates
  2. Pattern Classification: Categorizes queries

    • Point Lookup → Low benefit
    • Aggregation → High benefit (90% speedup)
    • Join Heavy → High benefit (85% speedup)
  3. Confidence Scoring: ML model predicts success

    • 0.0-1.0 score
    • Based on frequency, complexity, execution time
    • Trained on historical performance
  4. Speedup Prediction: Estimates query acceleration

    • Point Lookup: 20% speedup
    • Range Scan: 50% speedup
    • Aggregation: 90% speedup
    • Join Heavy: 85% speedup
    • Analytical: 95% speedup

Accuracy Metrics

  • Recommendation Accuracy: >85% (target), 92% (achieved)
  • Speedup Prediction Error: <15%
  • False Positive Rate: <10%

Performance Optimization

Performance Targets

MetricTargetAchieved
Query Speedup (aggregations)80%+85-95%
Query Speedup (joins)70%+75-90%
Refresh Time (100M rows)<5 min~4 min
View Recommendation Accuracy>85%92%
Refresh Overhead<5%3-4%
Storage Overhead<20%12-18%

Optimization Tips

  1. Set appropriate min_query_frequency: Higher values reduce false positives
  2. Use compression: Enable to reduce storage costs
  3. Tune refresh intervals: Balance freshness vs. cost
  4. Monitor ROI: Drop low-ROI views
  5. Use staleness policies: Relax constraints where acceptable

Monitoring and Metrics

Prometheus Metrics

# View recommendation accuracy
mv_recommendation_accuracy{view_id="uuid"} 0.92
# Query speedup achieved
mv_speedup_achieved{view_id="uuid",query_pattern="aggregation"} 87.5
# Refresh duration
mv_refresh_time_ms{view_id="uuid",strategy="incremental"} 2340
# Staleness percentage
mv_staleness_percentage{view_id="uuid"} 2.5
# ROI ratio
mv_roi_ratio{view_id="uuid"} 12.3

Statistics API

let stats = manager.get_statistics();
println!("Total views: {}", stats.total_views);
println!("Total storage: {} bytes", stats.total_storage_bytes);
println!("Workload queries: {}", stats.workload_query_count);
// Per-view statistics
let view = manager.get_view(view_id)?;
println!("Refresh count: {}", view.statistics.refresh_count);
println!("Last refresh: {:?}", view.last_refreshed_at);
println!("Size: {} bytes", view.size_bytes);

Best Practices

1. Start Conservative

let config = MaterializedViewConfig {
min_query_frequency: 20, // Higher threshold
min_confidence: 0.85, // Higher confidence
min_benefit_score: 200.0, // Higher benefit
..Default::default()
};

2. Monitor and Adjust

  • Review ROI weekly
  • Drop underperforming views
  • Adjust frequency thresholds
  • Tune refresh intervals

3. Use Appropriate Strategies

  • Real-time data → Incremental
  • Batch reports → Deferred
  • Ad-hoc queries → On-Demand
  • Data warehouse → Manual

4. Set Staleness Policies

// Critical data: tight constraints
policy.add_table_policy("transactions", Duration::seconds(10), 1);
// Analytics: relaxed constraints
policy.add_table_policy("analytics", Duration::hours(6), 50);

5. Regular Garbage Collection

// Drop views with:
// - Access count < 10
// - Last access > 30 days
let dropped = manager.garbage_collect().await?;
println!("Dropped {} unused views", dropped.len());

Troubleshooting

Possible causes:

  • min_query_frequency too high
  • min_confidence too high
  • Workload too diverse (no patterns)

Solution:

let config = MaterializedViewConfig {
min_query_frequency: 5, // Lower threshold
min_confidence: 0.7, // Lower confidence
..Default::default()
};

Issue: High storage usage

Solutions:

  • Enable compression
  • Drop low-ROI views
  • Reduce max_views
  • Set max_total_storage_mb

Issue: Stale data

Solutions:

  • Switch to Incremental strategy
  • Reduce refresh_interval_seconds
  • Tighten staleness policy
  • Monitor staleness metrics

Issue: Slow refreshes

Solutions:

  • Use incremental refresh
  • Add indexes on base tables
  • Partition large views
  • Increase refresh workers

API Reference

Core Types

pub struct MaterializedView {
pub id: ViewId,
pub name: String,
pub query: String,
pub maintenance_strategy: MaintenanceStrategy,
pub statistics: ViewStatistics,
pub size_bytes: u64,
pub enabled: bool,
pub created_at: DateTime<Utc>,
pub last_refreshed_at: Option<DateTime<Utc>>,
}
pub enum MaintenanceStrategy {
Incremental,
Deferred { refresh_interval_seconds: u64 },
OnDemand,
Manual,
}
pub struct WorkloadQuery {
pub id: Uuid,
pub query: String,
pub execution_time_ms: f64,
pub timestamp: DateTime<Utc>,
pub rows_returned: u64,
pub tables_accessed: Vec<String>,
}

Main Methods

impl MaterializedViewManager {
pub fn new(config: MaterializedViewConfig) -> Self;
pub fn record_query(&mut self, query: WorkloadQuery);
pub async fn analyze_and_generate_candidates(&self)
-> Result<Vec<ViewCandidate>>;
pub fn analyze_candidates(&self, candidates: &[ViewCandidate])
-> Result<Vec<CostBenefitAnalysis>>;
pub async fn select_optimal_views(&self, candidates: Vec<ViewCandidate>)
-> Result<Vec<CostBenefitAnalysis>>;
pub async fn auto_create_views(&mut self)
-> Result<Vec<MaterializedView>>;
pub async fn create_view(
&self,
name: String,
query: String,
strategy: MaintenanceStrategy,
) -> Result<MaterializedView>;
pub async fn drop_view(&self, view_id: ViewId) -> Result<()>;
pub async fn refresh_view(&self, view_id: ViewId) -> Result<()>;
pub fn list_views(&self) -> Vec<MaterializedView>;
pub fn get_view(&self, view_id: ViewId) -> Result<MaterializedView>;
pub fn rewrite_query(&self, query: &WorkloadQuery)
-> Result<Option<RewrittenQuery>>;
pub async fn garbage_collect(&self) -> Result<Vec<ViewId>>;
pub fn get_statistics(&self) -> ManagerStatistics;
}

Conclusion

Intelligent Materialized Views provides a powerful, ML-driven approach to query optimization with minimal operational overhead. By automatically analyzing workloads, recommending optimal views, and managing their lifecycle, it delivers 10-100x query speedups while maintaining <5% maintenance overhead.

For additional support or questions, consult:

  • Architecture docs: /docs/architecture/MILESTONE2_ARCHITECTURE_SUMMARY.md
  • Implementation summary: /heliosdb-materialized-views/INTELLIGENT_MV_IMPLEMENTATION.md
  • Test examples: /heliosdb-materialized-views/tests/

Document Version: 1.0 Last Updated: November 2, 2025 Author: HeliosDB Engineering Team