Materialized Views Implementation Guide
Materialized Views Implementation Guide
Overview
HeliosDB’s materialized views provide automatic query optimization through pre-computed result caching with intelligent refresh strategies and transparent query rewriting.
Features
1. Incremental Refresh
Efficiently update materialized views by processing only changes (deltas) rather than full recomputation.
use heliosdb_compute::materialized_views::*;
// Create incremental refresh enginelet engine = IncrementalRefreshEngine::new();
// Record changes to source tableslet change = incremental::RowChange::insert( table_id, key, value, transaction_id);engine.record_change(change).await?;
// Compute deltalet delta = engine.compute_delta(view_id, &source_tables, view_size).await?;
// Apply delta to viewif let Some(delta) = delta { engine.apply_delta(&delta).await?;}Benefits:
- Reduces refresh time by orders of magnitude
- Minimizes resource consumption
- Supports inserts, updates, and deletes
- Automatic change coalescing
2. Flexible Refresh Strategies
Manual Refresh
let view = MaterializedView::new( "my_view".to_string(), "SELECT * FROM orders WHERE status = 'completed'".to_string(), RefreshStrategy::Manual,);
manager.create_view(view).await?;manager.refresh_view(view_id).await?; // Explicit refreshOn-Commit Refresh
let view = MaterializedView::new( "realtime_view".to_string(), "SELECT user_id, COUNT(*) FROM events GROUP BY user_id".to_string(), RefreshStrategy::OnCommit,);Interval-Based Refresh
let schedule = RefreshSchedule { schedule_type: ScheduleType::Interval { seconds: 300 }, // Every 5 minutes enabled: true, start_time: None, end_time: None,};
let view = MaterializedView::new( "periodic_view".to_string(), "SELECT * FROM analytics_summary".to_string(), RefreshStrategy::Scheduled(schedule),);Cron-Based Refresh
let schedule = RefreshSchedule { schedule_type: ScheduleType::Cron { expression: "0 */6 * * *".to_string(), // Every 6 hours }, enabled: true, start_time: None, end_time: None,};
let view = MaterializedView::new( "scheduled_view".to_string(), "SELECT DATE(created_at), COUNT(*) FROM orders GROUP BY DATE(created_at)".to_string(), RefreshStrategy::Scheduled(schedule),);Cron Expression Examples:
"0 * * * *"- Every hour at minute 0"0 0 * * *"- Daily at midnight"0 0 * * 0"- Weekly on Sunday at midnight"*/15 * * * *"- Every 15 minutes"0 9-17 * * 1-5"- Weekdays 9 AM to 5 PM
3. Intelligent Query Rewriting
The query rewriter automatically uses materialized views to accelerate queries.
let rewriter = QueryRewriter::new();
// Register materialized viewsrewriter.register_view(view).await?;
// Attempt to rewrite a querylet query = Query { sql: "SELECT * FROM users WHERE active = true".to_string(), components: QueryComponents { /* ... */ },};
let result = rewriter.rewrite_query(&query).await?;
if result.success { println!("Query rewritten! Cost savings: {:.2}%", (result.cost_savings / result.original_cost) * 100.0); println!("Rewritten query: {}", result.rewritten_query.unwrap());}Matching Strategies
The rewriter uses multiple strategies to find optimal views:
- Exact Match - Query exactly matches view definition
- Join Elimination - View pre-computes expensive joins
- Aggregation Pushdown - View pre-computes aggregations
- Superset Match - View contains all query data plus more
- Filter Pushdown - View has beneficial filters
- Column Projection - Query needs subset of view columns
- Custom Rules - User-defined pattern matching
Example: Join Elimination
// Original query requires joinlet query = "SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id WHERE o.status = 'completed'";
// Materialized view pre-computes the joinlet view_query = "SELECT u.name, o.total, o.status FROM users u JOIN orders o ON u.id = o.user_id";
// Rewriter transforms to:// "SELECT name, total FROM mv_user_orders WHERE status = 'completed'"Example: Aggregation Pushdown
// Original query computes aggregationlet query = "SELECT user_id, COUNT(*), SUM(amount) FROM orders GROUP BY user_id";
// Materialized view pre-computes aggregationslet view_query = "SELECT user_id, COUNT(*) as order_count, SUM(amount) as total_amount FROM orders GROUP BY user_id";
// Rewriter uses view directly// "SELECT user_id, order_count, total_amount FROM mv_user_summary"4. Superset Matching
Comprehensive logic to determine if a view can answer a query:
// View: All user datalet view_query = "SELECT id, name, email, created_at, status FROM users";
// Query: Subset of columns with filterlet query = "SELECT id, name FROM users WHERE status = 'active'";
// Rewriter recognizes superset match and rewrites to:// "SELECT id, name FROM mv_all_users WHERE status = 'active'"Validation Steps:
- ✓ Table coverage - View includes all query tables
- ✓ Column coverage - View provides all needed columns
- ✓ Join compatibility - View joins match query needs
- ✓ Filter compatibility - View doesn’t exclude query data
- ✓ Grouping compatibility - View grouping matches query
5. Storage Backends
Materialized views support multiple storage backends:
let view = MaterializedView::new(name, query, strategy) .with_storage_backend(StorageBackend::Lsm); // LSM tree (default) // .with_storage_backend(StorageBackend::Memory); // In-memory (fast, volatile) // .with_storage_backend(StorageBackend::Columnar); // Columnar (analytics) // .with_storage_backend(StorageBackend::Hybrid); // Hot/cold tiering6. Statistics and Monitoring
// View statisticslet stats = view.statistics;println!("Row count: {}", stats.row_count);println!("Size: {} bytes", stats.size_bytes);println!("Last refresh: {}", stats.last_refresh_time);println!("Avg refresh duration: {} ms", stats.avg_refresh_duration_ms);println!("Query hits: {}", stats.query_hit_count);
// Refresh manager statisticslet refresh_stats = refresh_manager.get_statistics(view_id).await?;
// Rewriter statisticslet rewrite_stats = rewriter.get_statistics().await;println!("Total queries: {}", rewrite_stats.total_queries);println!("Successful rewrites: {}", rewrite_stats.successful_rewrites);println!("Success rate: {:.2}%", rewrite_stats.success_rate);println!("Total cost savings: {}", rewrite_stats.total_cost_savings);Complete Example
use heliosdb_compute::materialized_views::*;
#[tokio::main]async fn main() -> Result<()> { // Create manager let manager = MaterializedViewManager::new();
// Create materialized view with cron refresh let schedule = RefreshSchedule { schedule_type: ScheduleType::Cron { expression: "0 */4 * * *".to_string(), // Every 4 hours }, enabled: true, start_time: None, end_time: None, };
let mut view = MaterializedView::new( "user_order_summary".to_string(), r#" SELECT u.id as user_id, u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent, MAX(o.created_at) as last_order_date FROM users u LEFT JOIN orders o ON u.id = o.user_id GROUP BY u.id, u.name "#.to_string(), RefreshStrategy::Scheduled(schedule), );
view.add_source_table(1); // users table view.add_source_table(2); // orders table
// Create view let view_id = manager.create_view(view).await?; println!("Created view with ID: {}", view_id);
// Get query rewriter let rewriter = manager.query_rewriter();
// Try to rewrite a query let query = Query { sql: "SELECT name, order_count FROM user_order_summary WHERE order_count > 10".to_string(), components: QueryComponents { select: vec!["name".to_string(), "order_count".to_string()], from: vec!["user_order_summary".to_string()], where_clause: Some("order_count > 10".to_string()), joins: Vec::new(), group_by: Vec::new(), having: None, order_by: Vec::new(), limit: None, offset: None, }, };
let result = rewriter.rewrite_query(&query).await?;
if result.success { println!("Query optimized!"); println!("Original cost: {}", result.original_cost); println!("Rewritten cost: {}", result.rewritten_cost); println!("Savings: {:.2}%", (result.cost_savings / result.original_cost) * 100.0); }
Ok(())}Performance Considerations
When to Use Materialized Views
Good candidates:
- Frequently executed expensive queries
- Queries with complex joins
- Aggregations over large datasets
- Queries with stable predicates
- OLAP/analytics workloads
❌ Poor candidates:
- Highly volatile data
- One-time queries
- Simple primary key lookups
- Real-time data requirements (use on-commit refresh instead)
Refresh Strategy Selection
| Data Change Rate | Recommended Strategy | Notes |
|---|---|---|
| Milliseconds | OnCommit | Real-time requirements |
| Seconds | Interval (short) | High freshness, higher cost |
| Minutes | Interval or Cron | Balanced approach |
| Hours | Cron | Standard for daily patterns |
| Days | Cron | Low-frequency updates |
| On-demand | Manual | User-triggered |
Configuration Tuning
// Incremental refresh configlet config = IncrementalConfig { max_batch_size: 10000, enable_compression: true, enable_coalescing: true, full_refresh_threshold_percent: 25.0, // Switch to full if >25% changes};
// Refresh manager configlet refresh_config = RefreshConfig { max_concurrent_refreshes: 4, max_retries: 3, retry_delay_seconds: 60, task_timeout_seconds: 3600, enable_throttling: true, min_refresh_interval_seconds: 10,};
// Rewriter configlet rewriter_config = RewriterConfig { enable_rewriting: true, min_cost_savings_percent: 10.0, // Only rewrite if >10% savings max_views_to_consider: 20, enable_partial_matching: true, enable_view_composition: false,};Best Practices
- Start with Manual Refresh - Test view creation and queries before enabling automatic refresh
- Monitor Statistics - Track refresh duration and query hit rates
- Use Incremental Refresh - Significantly faster for views with many source table updates
- Choose Appropriate Storage - Use columnar for analytics, LSM for general purpose
- Set Realistic Schedules - Match refresh frequency to data volatility
- Create Indexes on Views - Improve query performance on materialized results
- Monitor Cost Savings - Verify views are being used and providing value
- Clean Up Unused Views - Drop views with low query hit rates
Troubleshooting
View Not Being Used
Check if rewriter is enabled and view is fresh:
let view = manager.get_view(view_id).await?;println!("State: {:?}", view.state);println!("Is fresh: {}", view.is_fresh());
let stats = rewriter.get_statistics().await;println!("Total queries: {}", stats.total_queries);println!("No match count: {}", stats.no_match_count);Refresh Failures
Check failure count and error messages:
if view.statistics.failure_count > 0 { println!("Failures: {}", view.statistics.failure_count); // Check logs for error details}High Refresh Duration
Consider switching to incremental refresh or optimizing view query:
if view.statistics.avg_refresh_duration_ms > 60000 { println!("Average refresh taking > 1 minute, consider optimization"); // Review view query complexity // Check if incremental refresh is enabled}API Reference
See module documentation:
heliosdb_compute::materialized_views- Main moduleheliosdb_compute::materialized_views::incremental- Incremental refreshheliosdb_compute::materialized_views::refresh- Refresh managementheliosdb_compute::materialized_views::rewrite- Query rewritingheliosdb_compute::materialized_views::dependencies- Dependency tracking