Skip to content

Materialized Views Implementation Guide

Materialized Views Implementation Guide

Overview

HeliosDB’s materialized views provide automatic query optimization through pre-computed result caching with intelligent refresh strategies and transparent query rewriting.

Features

1. Incremental Refresh

Efficiently update materialized views by processing only changes (deltas) rather than full recomputation.

use heliosdb_compute::materialized_views::*;
// Create incremental refresh engine
let engine = IncrementalRefreshEngine::new();
// Record changes to source tables
let change = incremental::RowChange::insert(
table_id,
key,
value,
transaction_id
);
engine.record_change(change).await?;
// Compute delta
let delta = engine.compute_delta(view_id, &source_tables, view_size).await?;
// Apply delta to view
if let Some(delta) = delta {
engine.apply_delta(&delta).await?;
}

Benefits:

  • Reduces refresh time by orders of magnitude
  • Minimizes resource consumption
  • Supports inserts, updates, and deletes
  • Automatic change coalescing

2. Flexible Refresh Strategies

Manual Refresh

let view = MaterializedView::new(
"my_view".to_string(),
"SELECT * FROM orders WHERE status = 'completed'".to_string(),
RefreshStrategy::Manual,
);
manager.create_view(view).await?;
manager.refresh_view(view_id).await?; // Explicit refresh

On-Commit Refresh

let view = MaterializedView::new(
"realtime_view".to_string(),
"SELECT user_id, COUNT(*) FROM events GROUP BY user_id".to_string(),
RefreshStrategy::OnCommit,
);

Interval-Based Refresh

let schedule = RefreshSchedule {
schedule_type: ScheduleType::Interval { seconds: 300 }, // Every 5 minutes
enabled: true,
start_time: None,
end_time: None,
};
let view = MaterializedView::new(
"periodic_view".to_string(),
"SELECT * FROM analytics_summary".to_string(),
RefreshStrategy::Scheduled(schedule),
);

Cron-Based Refresh

let schedule = RefreshSchedule {
schedule_type: ScheduleType::Cron {
expression: "0 */6 * * *".to_string(), // Every 6 hours
},
enabled: true,
start_time: None,
end_time: None,
};
let view = MaterializedView::new(
"scheduled_view".to_string(),
"SELECT DATE(created_at), COUNT(*) FROM orders GROUP BY DATE(created_at)".to_string(),
RefreshStrategy::Scheduled(schedule),
);

Cron Expression Examples:

  • "0 * * * *" - Every hour at minute 0
  • "0 0 * * *" - Daily at midnight
  • "0 0 * * 0" - Weekly on Sunday at midnight
  • "*/15 * * * *" - Every 15 minutes
  • "0 9-17 * * 1-5" - Weekdays 9 AM to 5 PM

3. Intelligent Query Rewriting

The query rewriter automatically uses materialized views to accelerate queries.

let rewriter = QueryRewriter::new();
// Register materialized views
rewriter.register_view(view).await?;
// Attempt to rewrite a query
let query = Query {
sql: "SELECT * FROM users WHERE active = true".to_string(),
components: QueryComponents { /* ... */ },
};
let result = rewriter.rewrite_query(&query).await?;
if result.success {
println!("Query rewritten! Cost savings: {:.2}%",
(result.cost_savings / result.original_cost) * 100.0);
println!("Rewritten query: {}", result.rewritten_query.unwrap());
}

Matching Strategies

The rewriter uses multiple strategies to find optimal views:

  1. Exact Match - Query exactly matches view definition
  2. Join Elimination - View pre-computes expensive joins
  3. Aggregation Pushdown - View pre-computes aggregations
  4. Superset Match - View contains all query data plus more
  5. Filter Pushdown - View has beneficial filters
  6. Column Projection - Query needs subset of view columns
  7. Custom Rules - User-defined pattern matching

Example: Join Elimination

// Original query requires join
let query = "SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.status = 'completed'";
// Materialized view pre-computes the join
let view_query = "SELECT u.name, o.total, o.status
FROM users u
JOIN orders o ON u.id = o.user_id";
// Rewriter transforms to:
// "SELECT name, total FROM mv_user_orders WHERE status = 'completed'"

Example: Aggregation Pushdown

// Original query computes aggregation
let query = "SELECT user_id, COUNT(*), SUM(amount)
FROM orders
GROUP BY user_id";
// Materialized view pre-computes aggregations
let view_query = "SELECT user_id, COUNT(*) as order_count, SUM(amount) as total_amount
FROM orders
GROUP BY user_id";
// Rewriter uses view directly
// "SELECT user_id, order_count, total_amount FROM mv_user_summary"

4. Superset Matching

Comprehensive logic to determine if a view can answer a query:

// View: All user data
let view_query = "SELECT id, name, email, created_at, status FROM users";
// Query: Subset of columns with filter
let query = "SELECT id, name FROM users WHERE status = 'active'";
// Rewriter recognizes superset match and rewrites to:
// "SELECT id, name FROM mv_all_users WHERE status = 'active'"

Validation Steps:

  1. ✓ Table coverage - View includes all query tables
  2. ✓ Column coverage - View provides all needed columns
  3. ✓ Join compatibility - View joins match query needs
  4. ✓ Filter compatibility - View doesn’t exclude query data
  5. ✓ Grouping compatibility - View grouping matches query

5. Storage Backends

Materialized views support multiple storage backends:

let view = MaterializedView::new(name, query, strategy)
.with_storage_backend(StorageBackend::Lsm); // LSM tree (default)
// .with_storage_backend(StorageBackend::Memory); // In-memory (fast, volatile)
// .with_storage_backend(StorageBackend::Columnar); // Columnar (analytics)
// .with_storage_backend(StorageBackend::Hybrid); // Hot/cold tiering

6. Statistics and Monitoring

// View statistics
let stats = view.statistics;
println!("Row count: {}", stats.row_count);
println!("Size: {} bytes", stats.size_bytes);
println!("Last refresh: {}", stats.last_refresh_time);
println!("Avg refresh duration: {} ms", stats.avg_refresh_duration_ms);
println!("Query hits: {}", stats.query_hit_count);
// Refresh manager statistics
let refresh_stats = refresh_manager.get_statistics(view_id).await?;
// Rewriter statistics
let rewrite_stats = rewriter.get_statistics().await;
println!("Total queries: {}", rewrite_stats.total_queries);
println!("Successful rewrites: {}", rewrite_stats.successful_rewrites);
println!("Success rate: {:.2}%", rewrite_stats.success_rate);
println!("Total cost savings: {}", rewrite_stats.total_cost_savings);

Complete Example

use heliosdb_compute::materialized_views::*;
#[tokio::main]
async fn main() -> Result<()> {
// Create manager
let manager = MaterializedViewManager::new();
// Create materialized view with cron refresh
let schedule = RefreshSchedule {
schedule_type: ScheduleType::Cron {
expression: "0 */4 * * *".to_string(), // Every 4 hours
},
enabled: true,
start_time: None,
end_time: None,
};
let mut view = MaterializedView::new(
"user_order_summary".to_string(),
r#"
SELECT
u.id as user_id,
u.name,
COUNT(o.id) as order_count,
SUM(o.total) as total_spent,
MAX(o.created_at) as last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name
"#.to_string(),
RefreshStrategy::Scheduled(schedule),
);
view.add_source_table(1); // users table
view.add_source_table(2); // orders table
// Create view
let view_id = manager.create_view(view).await?;
println!("Created view with ID: {}", view_id);
// Get query rewriter
let rewriter = manager.query_rewriter();
// Try to rewrite a query
let query = Query {
sql: "SELECT name, order_count FROM user_order_summary WHERE order_count > 10".to_string(),
components: QueryComponents {
select: vec!["name".to_string(), "order_count".to_string()],
from: vec!["user_order_summary".to_string()],
where_clause: Some("order_count > 10".to_string()),
joins: Vec::new(),
group_by: Vec::new(),
having: None,
order_by: Vec::new(),
limit: None,
offset: None,
},
};
let result = rewriter.rewrite_query(&query).await?;
if result.success {
println!("Query optimized!");
println!("Original cost: {}", result.original_cost);
println!("Rewritten cost: {}", result.rewritten_cost);
println!("Savings: {:.2}%", (result.cost_savings / result.original_cost) * 100.0);
}
Ok(())
}

Performance Considerations

When to Use Materialized Views

Good candidates:

  • Frequently executed expensive queries
  • Queries with complex joins
  • Aggregations over large datasets
  • Queries with stable predicates
  • OLAP/analytics workloads

Poor candidates:

  • Highly volatile data
  • One-time queries
  • Simple primary key lookups
  • Real-time data requirements (use on-commit refresh instead)

Refresh Strategy Selection

Data Change RateRecommended StrategyNotes
MillisecondsOnCommitReal-time requirements
SecondsInterval (short)High freshness, higher cost
MinutesInterval or CronBalanced approach
HoursCronStandard for daily patterns
DaysCronLow-frequency updates
On-demandManualUser-triggered

Configuration Tuning

// Incremental refresh config
let config = IncrementalConfig {
max_batch_size: 10000,
enable_compression: true,
enable_coalescing: true,
full_refresh_threshold_percent: 25.0, // Switch to full if >25% changes
};
// Refresh manager config
let refresh_config = RefreshConfig {
max_concurrent_refreshes: 4,
max_retries: 3,
retry_delay_seconds: 60,
task_timeout_seconds: 3600,
enable_throttling: true,
min_refresh_interval_seconds: 10,
};
// Rewriter config
let rewriter_config = RewriterConfig {
enable_rewriting: true,
min_cost_savings_percent: 10.0, // Only rewrite if >10% savings
max_views_to_consider: 20,
enable_partial_matching: true,
enable_view_composition: false,
};

Best Practices

  1. Start with Manual Refresh - Test view creation and queries before enabling automatic refresh
  2. Monitor Statistics - Track refresh duration and query hit rates
  3. Use Incremental Refresh - Significantly faster for views with many source table updates
  4. Choose Appropriate Storage - Use columnar for analytics, LSM for general purpose
  5. Set Realistic Schedules - Match refresh frequency to data volatility
  6. Create Indexes on Views - Improve query performance on materialized results
  7. Monitor Cost Savings - Verify views are being used and providing value
  8. Clean Up Unused Views - Drop views with low query hit rates

Troubleshooting

View Not Being Used

Check if rewriter is enabled and view is fresh:

let view = manager.get_view(view_id).await?;
println!("State: {:?}", view.state);
println!("Is fresh: {}", view.is_fresh());
let stats = rewriter.get_statistics().await;
println!("Total queries: {}", stats.total_queries);
println!("No match count: {}", stats.no_match_count);

Refresh Failures

Check failure count and error messages:

if view.statistics.failure_count > 0 {
println!("Failures: {}", view.statistics.failure_count);
// Check logs for error details
}

High Refresh Duration

Consider switching to incremental refresh or optimizing view query:

if view.statistics.avg_refresh_duration_ms > 60000 {
println!("Average refresh taking > 1 minute, consider optimization");
// Review view query complexity
// Check if incremental refresh is enabled
}

API Reference

See module documentation:

  • heliosdb_compute::materialized_views - Main module
  • heliosdb_compute::materialized_views::incremental - Incremental refresh
  • heliosdb_compute::materialized_views::refresh - Refresh management
  • heliosdb_compute::materialized_views::rewrite - Query rewriting
  • heliosdb_compute::materialized_views::dependencies - Dependency tracking

See Also