Skip to content

Materialized View Auto-Refresh Integration Guide

Materialized View Auto-Refresh Integration Guide

Version: 3.4.0 Last Updated: 2025-12-25

Overview

HeliosDB Nano Phase 3.4 integrates the auto-refresh worker with the materialized view system to provide end-to-end automatic refresh functionality. This document describes the complete architecture, usage, and monitoring capabilities.

New in v3.4.0

  • ALTER MATERIALIZED VIEW SET: Configure MV options after creation
  • REFRESH INCREMENTALLY: Explicit incremental refresh keyword
  • CPU-aware scheduling: Integrated with AutoRefreshWorker
  • Priority-based queue: Critical MVs refreshed first

Architecture

Components

The auto-refresh system consists of several integrated components:

  1. AutoRefreshWorker: Background task that monitors MV staleness
  2. MVScheduler: CPU-aware scheduler for prioritized refresh execution
  3. IncrementalRefresher: Efficient delta-based refresh operations
  4. DeltaTracker: Change tracking for incremental updates
  5. MvSystemViews: PostgreSQL-compatible monitoring functions

Data Flow

Base Table Modifications
DeltaTracker
AutoRefreshWorker (staleness check)
MVScheduler (priority queue)
IncrementalRefresher (refresh execution)
Updated Materialized View

Usage

Creating Auto-Refresh Enabled MVs

-- Create MV with auto-refresh
CREATE MATERIALIZED VIEW sales_summary AS
SELECT
product_id,
DATE_TRUNC('day', sale_date) as sale_day,
SUM(amount) as total_sales,
COUNT(*) as transaction_count
FROM sales
GROUP BY product_id, sale_day
WITH (
auto_refresh = true
);

Configuration

Global Configuration

use heliosdb_nano::storage::{AutoRefreshConfig, AutoRefreshWorker};
let config = AutoRefreshConfig::default()
.with_enabled(true)
.with_interval_seconds(60) // Check every 60 seconds
.with_staleness_threshold(300) // Refresh if stale > 5 minutes
.with_max_concurrent(2) // Max 2 concurrent refreshes
.with_max_cpu_percent(50.0); // Throttle if CPU > 50%
let worker = AutoRefreshWorker::new(config, storage, scheduler);
worker.start().await?;

Per-MV Configuration

use heliosdb_nano::storage::{AutoRefreshPolicy, Priority};
let policy = AutoRefreshPolicy {
enabled: true,
refresh_interval_seconds: Some(300), // Override global threshold
priority: Priority::High, // High priority refresh
concurrent: true, // Use CONCURRENT refresh
};
worker.register_mv("sales_summary", policy)?;

Runtime Configuration Updates

// Update global configuration
let new_config = AutoRefreshConfig::default()
.with_enabled(true)
.with_staleness_threshold(600);
worker.update_config(new_config);
// Update per-MV policy
let updated_policy = AutoRefreshPolicy {
enabled: true,
refresh_interval_seconds: Some(900),
priority: Priority::Normal,
concurrent: true,
};
worker.update_policy("sales_summary", updated_policy)?;

Monitoring

System Views

pg_mv_auto_refresh_status()

Returns current status of all materialized views with auto-refresh configuration.

SELECT * FROM pg_mv_auto_refresh_status();

Columns:

  • mv_name: Materialized view name
  • auto_refresh_enabled: Whether auto-refresh is enabled
  • last_refresh: Timestamp of last refresh (ISO 8601)
  • staleness_seconds: Seconds since last refresh
  • threshold_seconds: Configured staleness threshold
  • is_refreshing: Whether currently being refreshed
  • refresh_strategy: Strategy (manual, auto, incremental)
  • row_count: Number of rows in the MV
  • base_table_count: Number of dependent base tables

Example Output:

mv_name | auto_refresh_enabled | last_refresh | staleness_seconds | threshold_seconds | is_refreshing | refresh_strategy | row_count | base_table_count
----------------|----------------------|---------------------|-------------------|-------------------|---------------|------------------|-----------|------------------
sales_summary | true | 2025-11-24 10:30:00 | 120 | 300 | false | incremental | 15000 | 1
user_stats | true | 2025-11-24 10:25:00 | 420 | 300 | true | auto | 5000 | 2

pg_mv_refresh_history(limit?)

Returns historical refresh operations.

-- Get last 50 refreshes
SELECT * FROM pg_mv_refresh_history(50);

Columns:

  • mv_name: Materialized view name
  • start_time: Refresh start timestamp
  • end_time: Refresh completion timestamp
  • success: Whether refresh succeeded
  • error_message: Error details if failed
  • rows_affected: Number of rows modified
  • strategy: Refresh strategy used (Full, Incremental)
  • trigger: What triggered the refresh (Auto, Manual, BaseTableChange)

Example Output:

mv_name | start_time | end_time | success | error_message | rows_affected | strategy | trigger
--------------|---------------------|---------------------|---------|---------------|---------------|-------------|-------------
sales_summary | 2025-11-24 10:30:00 | 2025-11-24 10:30:02 | true | | 150 | Incremental | Auto
user_stats | 2025-11-24 10:29:00 | 2025-11-24 10:29:05 | true | | 50 | Full | Manual

pg_mv_cpu_usage()

Returns current CPU usage and throttling status.

SELECT * FROM pg_mv_cpu_usage();

Columns:

  • current_cpu_percent: Current CPU utilization
  • max_cpu_percent: Configured CPU threshold
  • is_throttled: Whether refresh is being throttled
  • active_tasks: Number of active refresh tasks
  • queued_tasks: Number of queued refresh tasks

Example Output:

current_cpu_percent | max_cpu_percent | is_throttled | active_tasks | queued_tasks
--------------------|-----------------|--------------|--------------|-------------
45.2 | 50.0 | false | 2 | 3

pg_mv_scheduler_stats()

Returns detailed scheduler statistics.

SELECT * FROM pg_mv_scheduler_stats();

Columns:

  • queue_size: Number of tasks in priority queue
  • running_tasks: Number of currently executing tasks
  • cpu_usage: Current CPU usage percentage

Integration Points

SQL Executor Integration

The auto-refresh system integrates with the SQL executor through:

  1. CREATE MATERIALIZED VIEW: Parses auto_refresh option and registers with worker
  2. ALTER MATERIALIZED VIEW SET: Updates refresh policy at runtime
  3. REFRESH MATERIALIZED VIEW INCREMENTALLY: Triggers delta-based refresh
  4. Base Table Modifications: Triggers delta tracking for incremental refresh

ALTER MATERIALIZED VIEW SET

Configure MV options after creation using SQL:

-- Set staleness threshold (in seconds)
ALTER MATERIALIZED VIEW sales_summary SET (staleness_threshold = 1800);
-- Set refresh strategy
ALTER MATERIALIZED VIEW sales_summary SET (refresh_strategy = 'incremental');
-- Set priority (0=low, 1=medium, 2=high, 3=critical)
ALTER MATERIALIZED VIEW sales_summary SET (priority = 2);
-- Configure multiple options at once
ALTER MATERIALIZED VIEW sales_summary SET (
staleness_threshold = 3600,
max_cpu_percent = 25,
incremental_enabled = true,
priority = 2
);

Available Options:

OptionTypeDescription
staleness_thresholdnumericSeconds before MV considered stale
max_cpu_percentnumericMaximum CPU usage for refresh
prioritynumeric0=low, 1=medium, 2=high, 3=critical
refresh_strategystring’manual’, ‘auto’, ‘incremental’
incremental_enabledbooleanEnable/disable incremental refresh

REFRESH MATERIALIZED VIEW INCREMENTALLY

Explicitly request incremental (delta-based) refresh:

-- Standard refresh (full)
REFRESH MATERIALIZED VIEW sales_summary;
-- Incremental refresh (delta-based)
REFRESH MATERIALIZED VIEW sales_summary INCREMENTALLY;
-- Concurrent refresh (non-blocking)
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;

Example Code Integration

use heliosdb_nano::sql::phase3::MaterializedViewParser;
// Parse CREATE MATERIALIZED VIEW with auto_refresh
let options_str = "auto_refresh=true, max_cpu_percent=75";
let options = MaterializedViewParser::parse_mv_options(options_str)?;
// Check if auto_refresh is enabled
let auto_refresh_enabled = options.iter().any(|opt| {
matches!(opt, MaterializedViewOption::AutoRefresh(true))
});
if auto_refresh_enabled {
// Register with worker
let policy = AutoRefreshPolicy::default();
worker.register_mv(&mv_name, policy)?;
}

Storage Engine Integration

Startup Sequence

use heliosdb_nano::storage::{
StorageEngine, MVScheduler, AutoRefreshWorker,
SchedulerConfig, AutoRefreshConfig,
};
use std::sync::Arc;
async fn initialize_auto_refresh(storage: Arc<StorageEngine>) -> Result<()> {
// 1. Create scheduler
let scheduler_config = SchedulerConfig::default()
.with_max_cpu_percent(70.0)
.with_max_concurrent(4);
let scheduler = Arc::new(MVScheduler::new(scheduler_config, Arc::clone(&storage)));
// 2. Start scheduler background loop
let scheduler_clone = Arc::clone(&scheduler);
tokio::spawn(async move {
if let Err(e) = scheduler_clone.run().await {
eprintln!("Scheduler error: {}", e);
}
});
// 3. Create and start auto-refresh worker
let worker_config = AutoRefreshConfig::default()
.with_enabled(true);
let mut worker = AutoRefreshWorker::new(
worker_config,
Arc::clone(&storage),
scheduler,
);
worker.start().await?;
// 4. Register existing MVs with auto_refresh=true
let catalog = MaterializedViewCatalog::new(&storage);
let all_mvs = catalog.list_views()?;
for mv_name in all_mvs {
let metadata = catalog.get_view(&mv_name)?;
if let Some(value) = metadata.metadata.get("auto_refresh") {
if value.parse::<bool>().unwrap_or(false) {
let policy = AutoRefreshPolicy::default();
worker.register_mv(&mv_name, policy)?;
}
}
}
Ok(())
}

Shutdown Sequence

async fn shutdown_auto_refresh(worker: &mut AutoRefreshWorker) -> Result<()> {
// Stop worker (waits for in-flight refreshes)
worker.stop().await?;
// Note: MVScheduler will be stopped automatically when dropped
Ok(())
}

Delta Tracking Integration

The auto-refresh system uses DeltaTracker to monitor base table changes:

use heliosdb_nano::storage::MvDeltaTracker;
// DeltaTracker is automatically integrated with INSERT/UPDATE/DELETE operations
// No manual integration required for basic usage
// For custom integration:
let delta_tracker = Arc::new(MvDeltaTracker::new(Arc::clone(&storage.db))?);
// Record changes during transaction commit
delta_tracker.record_insert("sales", row_id, tuple)?;
delta_tracker.record_update("sales", row_id, old_tuple, new_tuple)?;
delta_tracker.record_delete("sales", row_id, tuple)?;

Performance Considerations

CPU Throttling

The auto-refresh system respects CPU limits to avoid impacting foreground queries:

  • Monitors CPU usage via sysinfo crate
  • Uses exponential moving average for smoothing
  • Skips refresh cycles if CPU exceeds threshold
  • Adaptive batch sizing based on load

Priority-Based Scheduling

Refresh tasks are prioritized to ensure critical views are updated first:

  • Critical: User-triggered refreshes
  • High: Frequently queried views
  • Normal: Regular MVs with auto-refresh
  • Low: Rarely queried views

Incremental vs Full Refresh

The system automatically chooses the optimal strategy:

let cost = incremental_refresher.estimate_refresh_cost(&metadata)?;
// Uses incremental if cost < 50% of full refresh
if cost.incremental_cost < cost.full_cost * 0.5 {
// Use incremental refresh
incremental_refresher.refresh_incremental(&mv_name)?;
} else {
// Fall back to full refresh
full_refresh(&mv_name)?;
}

Testing

Unit Tests

Terminal window
# Run auto-refresh specific tests
cargo test --lib mv_auto_refresh
# Run system views tests
cargo test --lib mv_system_views

Integration Tests

Terminal window
# Run all MV integration tests
cargo test --test mv_auto_refresh_integration_test
# Run specific test
cargo test --test mv_auto_refresh_integration_test test_staleness_based_refresh_trigger

Manual Testing

#[tokio::test]
async fn manual_auto_refresh_test() {
// 1. Create storage and components
let storage = Arc::new(StorageEngine::open_in_memory(&Config::in_memory()).unwrap());
let scheduler = Arc::new(MVScheduler::new(SchedulerConfig::default(), Arc::clone(&storage)));
// 2. Create worker with short intervals
let config = AutoRefreshConfig::default()
.with_enabled(true)
.with_interval_seconds(5)
.with_staleness_threshold(10);
let mut worker = AutoRefreshWorker::new(config, Arc::clone(&storage), scheduler);
// 3. Create and register MV
// ... create MV ...
worker.register_mv("test_mv", AutoRefreshPolicy::default()).unwrap();
// 4. Start worker
worker.start().await.unwrap();
// 5. Wait and observe
tokio::time::sleep(Duration::from_secs(30)).await;
// 6. Check results
let system_views = MvSystemViews::new(storage, scheduler);
let statuses = system_views.pg_mv_auto_refresh_status().unwrap();
for status in statuses {
println!("MV: {}, Staleness: {:?}s", status.mv_name, status.staleness_seconds);
}
// 7. Stop worker
worker.stop().await.unwrap();
}

Troubleshooting

Common Issues

1. Auto-Refresh Not Triggering

Symptoms: MVs remain stale despite auto_refresh=true

Checks:

-- Verify auto-refresh is enabled
SELECT * FROM pg_mv_auto_refresh_status() WHERE mv_name = 'your_mv';
-- Check CPU throttling
SELECT * FROM pg_mv_cpu_usage();
-- Check scheduler queue
SELECT * FROM pg_mv_scheduler_stats();

Solutions:

  • Verify worker is started: worker.is_running()
  • Check CPU threshold: may need to increase max_cpu_percent
  • Verify MV is registered: worker.list_registered_mvs()
  • Check staleness threshold: may be too high

2. High CPU Usage

Symptoms: CPU constantly at 100%, queries slow

Checks:

SELECT * FROM pg_mv_cpu_usage();

Solutions:

  • Reduce max_concurrent_refreshes
  • Increase max_cpu_percent threshold
  • Increase staleness_threshold to reduce refresh frequency
  • Use incremental refresh strategy

3. Refresh Failures

Symptoms: Refreshes failing in history

Checks:

SELECT * FROM pg_mv_refresh_history(100)
WHERE success = false
ORDER BY start_time DESC;

Solutions:

  • Check error messages in history
  • Verify base tables still exist
  • Check for schema changes
  • Review query plan compatibility

Best Practices

1. Staleness Threshold Selection

// Short intervals for real-time dashboards (1-5 minutes)
.with_staleness_threshold(60)
// Medium intervals for analytics (5-30 minutes)
.with_staleness_threshold(300)
// Long intervals for reports (hours)
.with_staleness_threshold(3600)

2. Priority Assignment

// Critical: User-facing, real-time
Priority::Critical
// High: Frequently queried
Priority::High
// Normal: Standard auto-refresh (default)
Priority::Normal
// Low: Batch jobs, reports
Priority::Low

3. Concurrent Refresh

// Enable for large MVs to avoid downtime
AutoRefreshPolicy {
concurrent: true, // Zero-downtime refresh
..Default::default()
}

4. Monitoring Strategy

-- Set up regular monitoring queries
-- Alert if any MV is stale beyond threshold
SELECT * FROM pg_mv_auto_refresh_status()
WHERE staleness_seconds > threshold_seconds * 2
AND auto_refresh_enabled = true;
-- Check for recent failures
SELECT * FROM pg_mv_refresh_history(50)
WHERE success = false
AND start_time > NOW() - INTERVAL '1 hour';
-- Monitor CPU usage
SELECT * FROM pg_mv_cpu_usage()
WHERE is_throttled = true;

API Reference

AutoRefreshConfig

pub struct AutoRefreshConfig {
pub enabled: bool,
pub interval_seconds: u64,
pub staleness_threshold_seconds: i64,
pub max_concurrent_refreshes: usize,
pub max_cpu_percent: f64,
}

AutoRefreshPolicy

pub struct AutoRefreshPolicy {
pub enabled: bool,
pub refresh_interval_seconds: Option<i64>,
pub priority: Priority,
pub concurrent: bool,
}

AutoRefreshWorker Methods

impl AutoRefreshWorker {
// Lifecycle
pub async fn start(&mut self) -> Result<()>;
pub async fn stop(&mut self) -> Result<()>;
pub fn is_running(&self) -> bool;
// Configuration
pub fn config(&self) -> AutoRefreshConfig;
pub fn update_config(&self, config: AutoRefreshConfig);
// MV Management
pub fn register_mv(&self, mv_name: &str, policy: AutoRefreshPolicy) -> Result<()>;
pub fn unregister_mv(&self, mv_name: &str) -> Result<()>;
pub fn update_policy(&self, mv_name: &str, policy: AutoRefreshPolicy) -> Result<()>;
pub fn get_policy(&self, mv_name: &str) -> Option<AutoRefreshPolicy>;
pub fn list_registered_mvs(&self) -> Vec<String>;
// Monitoring
pub fn active_refresh_count(&self) -> usize;
pub fn check_now(&self) -> Result<()>;
pub fn get_refresh_history(&self, limit: Option<usize>) -> Vec<RefreshHistoryEntry>;
}

Future Enhancements

v2.4 (Planned)

  • Predictive refresh scheduling based on query patterns
  • Multi-node distributed auto-refresh coordination
  • Custom refresh strategies via user-defined functions
  • Webhook notifications for refresh events

v2.5 (Planned)

  • Machine learning-based staleness prediction
  • Automatic priority adjustment based on query frequency
  • Cost-based refresh scheduling
  • Refresh dependency graphs

Conclusion

The auto-refresh integration provides a robust, production-ready solution for automatic materialized view maintenance. By combining intelligent scheduling, CPU monitoring, and incremental refresh strategies, it ensures that views remain fresh without impacting system performance.

For additional information, see: