Materialized View Auto-Refresh Integration Guide
Materialized View Auto-Refresh Integration Guide
Version: 3.4.0 Last Updated: 2025-12-25
Overview
HeliosDB Nano Phase 3.4 integrates the auto-refresh worker with the materialized view system to provide end-to-end automatic refresh functionality. This document describes the complete architecture, usage, and monitoring capabilities.
New in v3.4.0
- ALTER MATERIALIZED VIEW SET: Configure MV options after creation
- REFRESH INCREMENTALLY: Explicit incremental refresh keyword
- CPU-aware scheduling: Integrated with AutoRefreshWorker
- Priority-based queue: Critical MVs refreshed first
Architecture
Components
The auto-refresh system consists of several integrated components:
- AutoRefreshWorker: Background task that monitors MV staleness
- MVScheduler: CPU-aware scheduler for prioritized refresh execution
- IncrementalRefresher: Efficient delta-based refresh operations
- DeltaTracker: Change tracking for incremental updates
- MvSystemViews: PostgreSQL-compatible monitoring functions
Data Flow
Base Table Modifications ↓ DeltaTracker ↓AutoRefreshWorker (staleness check) ↓ MVScheduler (priority queue) ↓IncrementalRefresher (refresh execution) ↓ Updated Materialized ViewUsage
Creating Auto-Refresh Enabled MVs
-- Create MV with auto-refreshCREATE MATERIALIZED VIEW sales_summary ASSELECT product_id, DATE_TRUNC('day', sale_date) as sale_day, SUM(amount) as total_sales, COUNT(*) as transaction_countFROM salesGROUP BY product_id, sale_dayWITH ( auto_refresh = true);Configuration
Global Configuration
use heliosdb_nano::storage::{AutoRefreshConfig, AutoRefreshWorker};
let config = AutoRefreshConfig::default() .with_enabled(true) .with_interval_seconds(60) // Check every 60 seconds .with_staleness_threshold(300) // Refresh if stale > 5 minutes .with_max_concurrent(2) // Max 2 concurrent refreshes .with_max_cpu_percent(50.0); // Throttle if CPU > 50%
let worker = AutoRefreshWorker::new(config, storage, scheduler);worker.start().await?;Per-MV Configuration
use heliosdb_nano::storage::{AutoRefreshPolicy, Priority};
let policy = AutoRefreshPolicy { enabled: true, refresh_interval_seconds: Some(300), // Override global threshold priority: Priority::High, // High priority refresh concurrent: true, // Use CONCURRENT refresh};
worker.register_mv("sales_summary", policy)?;Runtime Configuration Updates
// Update global configurationlet new_config = AutoRefreshConfig::default() .with_enabled(true) .with_staleness_threshold(600);
worker.update_config(new_config);
// Update per-MV policylet updated_policy = AutoRefreshPolicy { enabled: true, refresh_interval_seconds: Some(900), priority: Priority::Normal, concurrent: true,};
worker.update_policy("sales_summary", updated_policy)?;Monitoring
System Views
pg_mv_auto_refresh_status()
Returns current status of all materialized views with auto-refresh configuration.
SELECT * FROM pg_mv_auto_refresh_status();Columns:
mv_name: Materialized view nameauto_refresh_enabled: Whether auto-refresh is enabledlast_refresh: Timestamp of last refresh (ISO 8601)staleness_seconds: Seconds since last refreshthreshold_seconds: Configured staleness thresholdis_refreshing: Whether currently being refreshedrefresh_strategy: Strategy (manual, auto, incremental)row_count: Number of rows in the MVbase_table_count: Number of dependent base tables
Example Output:
mv_name | auto_refresh_enabled | last_refresh | staleness_seconds | threshold_seconds | is_refreshing | refresh_strategy | row_count | base_table_count----------------|----------------------|---------------------|-------------------|-------------------|---------------|------------------|-----------|------------------sales_summary | true | 2025-11-24 10:30:00 | 120 | 300 | false | incremental | 15000 | 1user_stats | true | 2025-11-24 10:25:00 | 420 | 300 | true | auto | 5000 | 2pg_mv_refresh_history(limit?)
Returns historical refresh operations.
-- Get last 50 refreshesSELECT * FROM pg_mv_refresh_history(50);Columns:
mv_name: Materialized view namestart_time: Refresh start timestampend_time: Refresh completion timestampsuccess: Whether refresh succeedederror_message: Error details if failedrows_affected: Number of rows modifiedstrategy: Refresh strategy used (Full, Incremental)trigger: What triggered the refresh (Auto, Manual, BaseTableChange)
Example Output:
mv_name | start_time | end_time | success | error_message | rows_affected | strategy | trigger--------------|---------------------|---------------------|---------|---------------|---------------|-------------|-------------sales_summary | 2025-11-24 10:30:00 | 2025-11-24 10:30:02 | true | | 150 | Incremental | Autouser_stats | 2025-11-24 10:29:00 | 2025-11-24 10:29:05 | true | | 50 | Full | Manualpg_mv_cpu_usage()
Returns current CPU usage and throttling status.
SELECT * FROM pg_mv_cpu_usage();Columns:
current_cpu_percent: Current CPU utilizationmax_cpu_percent: Configured CPU thresholdis_throttled: Whether refresh is being throttledactive_tasks: Number of active refresh tasksqueued_tasks: Number of queued refresh tasks
Example Output:
current_cpu_percent | max_cpu_percent | is_throttled | active_tasks | queued_tasks--------------------|-----------------|--------------|--------------|-------------45.2 | 50.0 | false | 2 | 3pg_mv_scheduler_stats()
Returns detailed scheduler statistics.
SELECT * FROM pg_mv_scheduler_stats();Columns:
queue_size: Number of tasks in priority queuerunning_tasks: Number of currently executing taskscpu_usage: Current CPU usage percentage
Integration Points
SQL Executor Integration
The auto-refresh system integrates with the SQL executor through:
- CREATE MATERIALIZED VIEW: Parses
auto_refreshoption and registers with worker - ALTER MATERIALIZED VIEW SET: Updates refresh policy at runtime
- REFRESH MATERIALIZED VIEW INCREMENTALLY: Triggers delta-based refresh
- Base Table Modifications: Triggers delta tracking for incremental refresh
ALTER MATERIALIZED VIEW SET
Configure MV options after creation using SQL:
-- Set staleness threshold (in seconds)ALTER MATERIALIZED VIEW sales_summary SET (staleness_threshold = 1800);
-- Set refresh strategyALTER MATERIALIZED VIEW sales_summary SET (refresh_strategy = 'incremental');
-- Set priority (0=low, 1=medium, 2=high, 3=critical)ALTER MATERIALIZED VIEW sales_summary SET (priority = 2);
-- Configure multiple options at onceALTER MATERIALIZED VIEW sales_summary SET ( staleness_threshold = 3600, max_cpu_percent = 25, incremental_enabled = true, priority = 2);Available Options:
| Option | Type | Description |
|---|---|---|
staleness_threshold | numeric | Seconds before MV considered stale |
max_cpu_percent | numeric | Maximum CPU usage for refresh |
priority | numeric | 0=low, 1=medium, 2=high, 3=critical |
refresh_strategy | string | ’manual’, ‘auto’, ‘incremental’ |
incremental_enabled | boolean | Enable/disable incremental refresh |
REFRESH MATERIALIZED VIEW INCREMENTALLY
Explicitly request incremental (delta-based) refresh:
-- Standard refresh (full)REFRESH MATERIALIZED VIEW sales_summary;
-- Incremental refresh (delta-based)REFRESH MATERIALIZED VIEW sales_summary INCREMENTALLY;
-- Concurrent refresh (non-blocking)REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;Example Code Integration
use heliosdb_nano::sql::phase3::MaterializedViewParser;
// Parse CREATE MATERIALIZED VIEW with auto_refreshlet options_str = "auto_refresh=true, max_cpu_percent=75";let options = MaterializedViewParser::parse_mv_options(options_str)?;
// Check if auto_refresh is enabledlet auto_refresh_enabled = options.iter().any(|opt| { matches!(opt, MaterializedViewOption::AutoRefresh(true))});
if auto_refresh_enabled { // Register with worker let policy = AutoRefreshPolicy::default(); worker.register_mv(&mv_name, policy)?;}Storage Engine Integration
Startup Sequence
use heliosdb_nano::storage::{ StorageEngine, MVScheduler, AutoRefreshWorker, SchedulerConfig, AutoRefreshConfig,};use std::sync::Arc;
async fn initialize_auto_refresh(storage: Arc<StorageEngine>) -> Result<()> { // 1. Create scheduler let scheduler_config = SchedulerConfig::default() .with_max_cpu_percent(70.0) .with_max_concurrent(4);
let scheduler = Arc::new(MVScheduler::new(scheduler_config, Arc::clone(&storage)));
// 2. Start scheduler background loop let scheduler_clone = Arc::clone(&scheduler); tokio::spawn(async move { if let Err(e) = scheduler_clone.run().await { eprintln!("Scheduler error: {}", e); } });
// 3. Create and start auto-refresh worker let worker_config = AutoRefreshConfig::default() .with_enabled(true);
let mut worker = AutoRefreshWorker::new( worker_config, Arc::clone(&storage), scheduler, );
worker.start().await?;
// 4. Register existing MVs with auto_refresh=true let catalog = MaterializedViewCatalog::new(&storage); let all_mvs = catalog.list_views()?;
for mv_name in all_mvs { let metadata = catalog.get_view(&mv_name)?;
if let Some(value) = metadata.metadata.get("auto_refresh") { if value.parse::<bool>().unwrap_or(false) { let policy = AutoRefreshPolicy::default(); worker.register_mv(&mv_name, policy)?; } } }
Ok(())}Shutdown Sequence
async fn shutdown_auto_refresh(worker: &mut AutoRefreshWorker) -> Result<()> { // Stop worker (waits for in-flight refreshes) worker.stop().await?;
// Note: MVScheduler will be stopped automatically when dropped
Ok(())}Delta Tracking Integration
The auto-refresh system uses DeltaTracker to monitor base table changes:
use heliosdb_nano::storage::MvDeltaTracker;
// DeltaTracker is automatically integrated with INSERT/UPDATE/DELETE operations// No manual integration required for basic usage
// For custom integration:let delta_tracker = Arc::new(MvDeltaTracker::new(Arc::clone(&storage.db))?);
// Record changes during transaction commitdelta_tracker.record_insert("sales", row_id, tuple)?;delta_tracker.record_update("sales", row_id, old_tuple, new_tuple)?;delta_tracker.record_delete("sales", row_id, tuple)?;Performance Considerations
CPU Throttling
The auto-refresh system respects CPU limits to avoid impacting foreground queries:
- Monitors CPU usage via
sysinfocrate - Uses exponential moving average for smoothing
- Skips refresh cycles if CPU exceeds threshold
- Adaptive batch sizing based on load
Priority-Based Scheduling
Refresh tasks are prioritized to ensure critical views are updated first:
- Critical: User-triggered refreshes
- High: Frequently queried views
- Normal: Regular MVs with auto-refresh
- Low: Rarely queried views
Incremental vs Full Refresh
The system automatically chooses the optimal strategy:
let cost = incremental_refresher.estimate_refresh_cost(&metadata)?;
// Uses incremental if cost < 50% of full refreshif cost.incremental_cost < cost.full_cost * 0.5 { // Use incremental refresh incremental_refresher.refresh_incremental(&mv_name)?;} else { // Fall back to full refresh full_refresh(&mv_name)?;}Testing
Unit Tests
# Run auto-refresh specific testscargo test --lib mv_auto_refresh
# Run system views testscargo test --lib mv_system_viewsIntegration Tests
# Run all MV integration testscargo test --test mv_auto_refresh_integration_test
# Run specific testcargo test --test mv_auto_refresh_integration_test test_staleness_based_refresh_triggerManual Testing
#[tokio::test]async fn manual_auto_refresh_test() { // 1. Create storage and components let storage = Arc::new(StorageEngine::open_in_memory(&Config::in_memory()).unwrap()); let scheduler = Arc::new(MVScheduler::new(SchedulerConfig::default(), Arc::clone(&storage)));
// 2. Create worker with short intervals let config = AutoRefreshConfig::default() .with_enabled(true) .with_interval_seconds(5) .with_staleness_threshold(10);
let mut worker = AutoRefreshWorker::new(config, Arc::clone(&storage), scheduler);
// 3. Create and register MV // ... create MV ... worker.register_mv("test_mv", AutoRefreshPolicy::default()).unwrap();
// 4. Start worker worker.start().await.unwrap();
// 5. Wait and observe tokio::time::sleep(Duration::from_secs(30)).await;
// 6. Check results let system_views = MvSystemViews::new(storage, scheduler); let statuses = system_views.pg_mv_auto_refresh_status().unwrap();
for status in statuses { println!("MV: {}, Staleness: {:?}s", status.mv_name, status.staleness_seconds); }
// 7. Stop worker worker.stop().await.unwrap();}Troubleshooting
Common Issues
1. Auto-Refresh Not Triggering
Symptoms: MVs remain stale despite auto_refresh=true
Checks:
-- Verify auto-refresh is enabledSELECT * FROM pg_mv_auto_refresh_status() WHERE mv_name = 'your_mv';
-- Check CPU throttlingSELECT * FROM pg_mv_cpu_usage();
-- Check scheduler queueSELECT * FROM pg_mv_scheduler_stats();Solutions:
- Verify worker is started:
worker.is_running() - Check CPU threshold: may need to increase
max_cpu_percent - Verify MV is registered:
worker.list_registered_mvs() - Check staleness threshold: may be too high
2. High CPU Usage
Symptoms: CPU constantly at 100%, queries slow
Checks:
SELECT * FROM pg_mv_cpu_usage();Solutions:
- Reduce
max_concurrent_refreshes - Increase
max_cpu_percentthreshold - Increase
staleness_thresholdto reduce refresh frequency - Use incremental refresh strategy
3. Refresh Failures
Symptoms: Refreshes failing in history
Checks:
SELECT * FROM pg_mv_refresh_history(100)WHERE success = falseORDER BY start_time DESC;Solutions:
- Check error messages in history
- Verify base tables still exist
- Check for schema changes
- Review query plan compatibility
Best Practices
1. Staleness Threshold Selection
// Short intervals for real-time dashboards (1-5 minutes).with_staleness_threshold(60)
// Medium intervals for analytics (5-30 minutes).with_staleness_threshold(300)
// Long intervals for reports (hours).with_staleness_threshold(3600)2. Priority Assignment
// Critical: User-facing, real-timePriority::Critical
// High: Frequently queriedPriority::High
// Normal: Standard auto-refresh (default)Priority::Normal
// Low: Batch jobs, reportsPriority::Low3. Concurrent Refresh
// Enable for large MVs to avoid downtimeAutoRefreshPolicy { concurrent: true, // Zero-downtime refresh ..Default::default()}4. Monitoring Strategy
-- Set up regular monitoring queries
-- Alert if any MV is stale beyond thresholdSELECT * FROM pg_mv_auto_refresh_status()WHERE staleness_seconds > threshold_seconds * 2 AND auto_refresh_enabled = true;
-- Check for recent failuresSELECT * FROM pg_mv_refresh_history(50)WHERE success = false AND start_time > NOW() - INTERVAL '1 hour';
-- Monitor CPU usageSELECT * FROM pg_mv_cpu_usage()WHERE is_throttled = true;API Reference
AutoRefreshConfig
pub struct AutoRefreshConfig { pub enabled: bool, pub interval_seconds: u64, pub staleness_threshold_seconds: i64, pub max_concurrent_refreshes: usize, pub max_cpu_percent: f64,}AutoRefreshPolicy
pub struct AutoRefreshPolicy { pub enabled: bool, pub refresh_interval_seconds: Option<i64>, pub priority: Priority, pub concurrent: bool,}AutoRefreshWorker Methods
impl AutoRefreshWorker { // Lifecycle pub async fn start(&mut self) -> Result<()>; pub async fn stop(&mut self) -> Result<()>; pub fn is_running(&self) -> bool;
// Configuration pub fn config(&self) -> AutoRefreshConfig; pub fn update_config(&self, config: AutoRefreshConfig);
// MV Management pub fn register_mv(&self, mv_name: &str, policy: AutoRefreshPolicy) -> Result<()>; pub fn unregister_mv(&self, mv_name: &str) -> Result<()>; pub fn update_policy(&self, mv_name: &str, policy: AutoRefreshPolicy) -> Result<()>; pub fn get_policy(&self, mv_name: &str) -> Option<AutoRefreshPolicy>; pub fn list_registered_mvs(&self) -> Vec<String>;
// Monitoring pub fn active_refresh_count(&self) -> usize; pub fn check_now(&self) -> Result<()>; pub fn get_refresh_history(&self, limit: Option<usize>) -> Vec<RefreshHistoryEntry>;}Future Enhancements
v2.4 (Planned)
- Predictive refresh scheduling based on query patterns
- Multi-node distributed auto-refresh coordination
- Custom refresh strategies via user-defined functions
- Webhook notifications for refresh events
v2.5 (Planned)
- Machine learning-based staleness prediction
- Automatic priority adjustment based on query frequency
- Cost-based refresh scheduling
- Refresh dependency graphs
Conclusion
The auto-refresh integration provides a robust, production-ready solution for automatic materialized view maintenance. By combining intelligent scheduling, CPU monitoring, and incremental refresh strategies, it ensures that views remain fresh without impacting system performance.
For additional information, see: