Materialized View Auto-Refresh Integration Guide

Version: 3.4.0 Last Updated: 2025-12-25

Overview

HeliosDB Nano Phase 3.4 integrates the auto-refresh worker with the materialized view system to provide end-to-end automatic refresh functionality. This document describes the complete architecture, usage, and monitoring capabilities.

New in v3.4.0

ALTER MATERIALIZED VIEW SET: Configure MV options after creation
REFRESH INCREMENTALLY: Explicit incremental refresh keyword
CPU-aware scheduling: Integrated with AutoRefreshWorker
Priority-based queue: Critical MVs refreshed first

Architecture

Components

The auto-refresh system consists of several integrated components:

AutoRefreshWorker: Background task that monitors MV staleness
MVScheduler: CPU-aware scheduler for prioritized refresh execution
IncrementalRefresher: Efficient delta-based refresh operations
DeltaTracker: Change tracking for incremental updates
MvSystemViews: PostgreSQL-compatible monitoring functions

Data Flow

Base Table Modifications
         ↓
   DeltaTracker
         ↓
AutoRefreshWorker (staleness check)
         ↓
   MVScheduler (priority queue)
         ↓
IncrementalRefresher (refresh execution)
         ↓
  Updated Materialized View

Usage

Creating Auto-Refresh Enabled MVs

-- Create MV with auto-refresh
CREATE MATERIALIZED VIEW sales_summary AS
SELECT
    product_id,
    DATE_TRUNC('day', sale_date) as sale_day,
    SUM(amount) as total_sales,
    COUNT(*) as transaction_count
FROM sales
GROUP BY product_id, sale_day
WITH (
    auto_refresh = true
);

Configuration

Global Configuration

use heliosdb_nano::storage::{AutoRefreshConfig, AutoRefreshWorker};

let config = AutoRefreshConfig::default()
    .with_enabled(true)
    .with_interval_seconds(60)              // Check every 60 seconds
    .with_staleness_threshold(300)          // Refresh if stale > 5 minutes
    .with_max_concurrent(2)                 // Max 2 concurrent refreshes
    .with_max_cpu_percent(50.0);            // Throttle if CPU > 50%

let worker = AutoRefreshWorker::new(config, storage, scheduler);
worker.start().await?;

Per-MV Configuration

use heliosdb_nano::storage::{AutoRefreshPolicy, Priority};

let policy = AutoRefreshPolicy {
    enabled: true,
    refresh_interval_seconds: Some(300),    // Override global threshold
    priority: Priority::High,                // High priority refresh
    concurrent: true,                        // Use CONCURRENT refresh
};

worker.register_mv("sales_summary", policy)?;

Runtime Configuration Updates

// Update global configuration
let new_config = AutoRefreshConfig::default()
    .with_enabled(true)
    .with_staleness_threshold(600);

worker.update_config(new_config);

// Update per-MV policy
let updated_policy = AutoRefreshPolicy {
    enabled: true,
    refresh_interval_seconds: Some(900),
    priority: Priority::Normal,
    concurrent: true,
};

worker.update_policy("sales_summary", updated_policy)?;

Monitoring

System Views

pg_mv_auto_refresh_status()

Returns current status of all materialized views with auto-refresh configuration.

SELECT * FROM pg_mv_auto_refresh_status();

Columns:

mv_name: Materialized view name
auto_refresh_enabled: Whether auto-refresh is enabled
last_refresh: Timestamp of last refresh (ISO 8601)
staleness_seconds: Seconds since last refresh
threshold_seconds: Configured staleness threshold
is_refreshing: Whether currently being refreshed
refresh_strategy: Strategy (manual, auto, incremental)
row_count: Number of rows in the MV
base_table_count: Number of dependent base tables

Example Output:

mv_name         | auto_refresh_enabled | last_refresh        | staleness_seconds | threshold_seconds | is_refreshing | refresh_strategy | row_count | base_table_count
----------------|----------------------|---------------------|-------------------|-------------------|---------------|------------------|-----------|------------------
sales_summary   | true                 | 2025-11-24 10:30:00 | 120               | 300               | false         | incremental      | 15000     | 1
user_stats      | true                 | 2025-11-24 10:25:00 | 420               | 300               | true          | auto             | 5000      | 2

pg_mv_refresh_history(limit?)

Returns historical refresh operations.

-- Get last 50 refreshes
SELECT * FROM pg_mv_refresh_history(50);

Columns:

mv_name: Materialized view name
start_time: Refresh start timestamp
end_time: Refresh completion timestamp
success: Whether refresh succeeded
error_message: Error details if failed
rows_affected: Number of rows modified
strategy: Refresh strategy used (Full, Incremental)
trigger: What triggered the refresh (Auto, Manual, BaseTableChange)

Example Output:

mv_name       | start_time          | end_time            | success | error_message | rows_affected | strategy    | trigger
--------------|---------------------|---------------------|---------|---------------|---------------|-------------|-------------
sales_summary | 2025-11-24 10:30:00 | 2025-11-24 10:30:02 | true    |               | 150           | Incremental | Auto
user_stats    | 2025-11-24 10:29:00 | 2025-11-24 10:29:05 | true    |               | 50            | Full        | Manual

pg_mv_cpu_usage()

Returns current CPU usage and throttling status.

SELECT * FROM pg_mv_cpu_usage();

Columns:

current_cpu_percent: Current CPU utilization
max_cpu_percent: Configured CPU threshold
is_throttled: Whether refresh is being throttled
active_tasks: Number of active refresh tasks
queued_tasks: Number of queued refresh tasks

Example Output:

current_cpu_percent | max_cpu_percent | is_throttled | active_tasks | queued_tasks
--------------------|-----------------|--------------|--------------|-------------
45.2                | 50.0            | false        | 2            | 3

pg_mv_scheduler_stats()

Returns detailed scheduler statistics.

SELECT * FROM pg_mv_scheduler_stats();

Columns:

queue_size: Number of tasks in priority queue
running_tasks: Number of currently executing tasks
cpu_usage: Current CPU usage percentage

Integration Points

SQL Executor Integration

The auto-refresh system integrates with the SQL executor through:

CREATE MATERIALIZED VIEW: Parses auto_refresh option and registers with worker
ALTER MATERIALIZED VIEW SET: Updates refresh policy at runtime
REFRESH MATERIALIZED VIEW INCREMENTALLY: Triggers delta-based refresh
Base Table Modifications: Triggers delta tracking for incremental refresh

ALTER MATERIALIZED VIEW SET

Configure MV options after creation using SQL:

-- Set staleness threshold (in seconds)
ALTER MATERIALIZED VIEW sales_summary SET (staleness_threshold = 1800);

-- Set refresh strategy
ALTER MATERIALIZED VIEW sales_summary SET (refresh_strategy = 'incremental');

-- Set priority (0=low, 1=medium, 2=high, 3=critical)
ALTER MATERIALIZED VIEW sales_summary SET (priority = 2);

-- Configure multiple options at once
ALTER MATERIALIZED VIEW sales_summary SET (
    staleness_threshold = 3600,
    max_cpu_percent = 25,
    incremental_enabled = true,
    priority = 2
);

Available Options:

Option	Type	Description
`staleness_threshold`	numeric	Seconds before MV considered stale
`max_cpu_percent`	numeric	Maximum CPU usage for refresh
`priority`	numeric	0=low, 1=medium, 2=high, 3=critical
`refresh_strategy`	string	’manual’, ‘auto’, ‘incremental’
`incremental_enabled`	boolean	Enable/disable incremental refresh

REFRESH MATERIALIZED VIEW INCREMENTALLY

Explicitly request incremental (delta-based) refresh:

-- Standard refresh (full)
REFRESH MATERIALIZED VIEW sales_summary;

-- Incremental refresh (delta-based)
REFRESH MATERIALIZED VIEW sales_summary INCREMENTALLY;

-- Concurrent refresh (non-blocking)
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;

Example Code Integration

use heliosdb_nano::sql::phase3::MaterializedViewParser;

// Parse CREATE MATERIALIZED VIEW with auto_refresh
let options_str = "auto_refresh=true, max_cpu_percent=75";
let options = MaterializedViewParser::parse_mv_options(options_str)?;

// Check if auto_refresh is enabled
let auto_refresh_enabled = options.iter().any(|opt| {
    matches!(opt, MaterializedViewOption::AutoRefresh(true))
});

if auto_refresh_enabled {
    // Register with worker
    let policy = AutoRefreshPolicy::default();
    worker.register_mv(&mv_name, policy)?;
}

Storage Engine Integration

Startup Sequence

use heliosdb_nano::storage::{
    StorageEngine, MVScheduler, AutoRefreshWorker,
    SchedulerConfig, AutoRefreshConfig,
};
use std::sync::Arc;

async fn initialize_auto_refresh(storage: Arc<StorageEngine>) -> Result<()> {
    // 1. Create scheduler
    let scheduler_config = SchedulerConfig::default()
        .with_max_cpu_percent(70.0)
        .with_max_concurrent(4);

    let scheduler = Arc::new(MVScheduler::new(scheduler_config, Arc::clone(&storage)));

    // 2. Start scheduler background loop
    let scheduler_clone = Arc::clone(&scheduler);
    tokio::spawn(async move {
        if let Err(e) = scheduler_clone.run().await {
            eprintln!("Scheduler error: {}", e);
        }
    });

    // 3. Create and start auto-refresh worker
    let worker_config = AutoRefreshConfig::default()
        .with_enabled(true);

    let mut worker = AutoRefreshWorker::new(
        worker_config,
        Arc::clone(&storage),
        scheduler,
    );

    worker.start().await?;

    // 4. Register existing MVs with auto_refresh=true
    let catalog = MaterializedViewCatalog::new(&storage);
    let all_mvs = catalog.list_views()?;

    for mv_name in all_mvs {
        let metadata = catalog.get_view(&mv_name)?;

        if let Some(value) = metadata.metadata.get("auto_refresh") {
            if value.parse::<bool>().unwrap_or(false) {
                let policy = AutoRefreshPolicy::default();
                worker.register_mv(&mv_name, policy)?;
            }
        }
    }

    Ok(())
}

Shutdown Sequence

async fn shutdown_auto_refresh(worker: &mut AutoRefreshWorker) -> Result<()> {
    // Stop worker (waits for in-flight refreshes)
    worker.stop().await?;

    // Note: MVScheduler will be stopped automatically when dropped

    Ok(())
}

Delta Tracking Integration

The auto-refresh system uses DeltaTracker to monitor base table changes:

use heliosdb_nano::storage::MvDeltaTracker;

// DeltaTracker is automatically integrated with INSERT/UPDATE/DELETE operations
// No manual integration required for basic usage

// For custom integration:
let delta_tracker = Arc::new(MvDeltaTracker::new(Arc::clone(&storage.db))?);

// Record changes during transaction commit
delta_tracker.record_insert("sales", row_id, tuple)?;
delta_tracker.record_update("sales", row_id, old_tuple, new_tuple)?;
delta_tracker.record_delete("sales", row_id, tuple)?;

Performance Considerations

CPU Throttling

The auto-refresh system respects CPU limits to avoid impacting foreground queries:

Monitors CPU usage via sysinfo crate
Uses exponential moving average for smoothing
Skips refresh cycles if CPU exceeds threshold
Adaptive batch sizing based on load

Priority-Based Scheduling

Refresh tasks are prioritized to ensure critical views are updated first:

Critical: User-triggered refreshes
High: Frequently queried views
Normal: Regular MVs with auto-refresh
Low: Rarely queried views

Incremental vs Full Refresh

The system automatically chooses the optimal strategy:

let cost = incremental_refresher.estimate_refresh_cost(&metadata)?;

// Uses incremental if cost < 50% of full refresh
if cost.incremental_cost < cost.full_cost * 0.5 {
    // Use incremental refresh
    incremental_refresher.refresh_incremental(&mv_name)?;
} else {
    // Fall back to full refresh
    full_refresh(&mv_name)?;
}

Testing

Unit Tests

# Run auto-refresh specific tests
cargo test --lib mv_auto_refresh

# Run system views tests
cargo test --lib mv_system_views

Integration Tests

# Run all MV integration tests
cargo test --test mv_auto_refresh_integration_test

# Run specific test
cargo test --test mv_auto_refresh_integration_test test_staleness_based_refresh_trigger

Manual Testing

#[tokio::test]
async fn manual_auto_refresh_test() {
    // 1. Create storage and components
    let storage = Arc::new(StorageEngine::open_in_memory(&Config::in_memory()).unwrap());
    let scheduler = Arc::new(MVScheduler::new(SchedulerConfig::default(), Arc::clone(&storage)));

    // 2. Create worker with short intervals
    let config = AutoRefreshConfig::default()
        .with_enabled(true)
        .with_interval_seconds(5)
        .with_staleness_threshold(10);

    let mut worker = AutoRefreshWorker::new(config, Arc::clone(&storage), scheduler);

    // 3. Create and register MV
    // ... create MV ...
    worker.register_mv("test_mv", AutoRefreshPolicy::default()).unwrap();

    // 4. Start worker
    worker.start().await.unwrap();

    // 5. Wait and observe
    tokio::time::sleep(Duration::from_secs(30)).await;

    // 6. Check results
    let system_views = MvSystemViews::new(storage, scheduler);
    let statuses = system_views.pg_mv_auto_refresh_status().unwrap();

    for status in statuses {
        println!("MV: {}, Staleness: {:?}s", status.mv_name, status.staleness_seconds);
    }

    // 7. Stop worker
    worker.stop().await.unwrap();
}

Troubleshooting

Common Issues

1. Auto-Refresh Not Triggering

Symptoms: MVs remain stale despite auto_refresh=true

Checks:

-- Verify auto-refresh is enabled
SELECT * FROM pg_mv_auto_refresh_status() WHERE mv_name = 'your_mv';

-- Check CPU throttling
SELECT * FROM pg_mv_cpu_usage();

-- Check scheduler queue
SELECT * FROM pg_mv_scheduler_stats();

Solutions:

Verify worker is started: worker.is_running()
Check CPU threshold: may need to increase max_cpu_percent
Verify MV is registered: worker.list_registered_mvs()
Check staleness threshold: may be too high

2. High CPU Usage

Symptoms: CPU constantly at 100%, queries slow

Checks:

SELECT * FROM pg_mv_cpu_usage();

Solutions:

Reduce max_concurrent_refreshes
Increase max_cpu_percent threshold
Increase staleness_threshold to reduce refresh frequency
Use incremental refresh strategy

3. Refresh Failures

Symptoms: Refreshes failing in history

Checks:

SELECT * FROM pg_mv_refresh_history(100)
WHERE success = false
ORDER BY start_time DESC;

Solutions:

Check error messages in history
Verify base tables still exist
Check for schema changes
Review query plan compatibility

Best Practices

1. Staleness Threshold Selection

// Short intervals for real-time dashboards (1-5 minutes)
.with_staleness_threshold(60)

// Medium intervals for analytics (5-30 minutes)
.with_staleness_threshold(300)

// Long intervals for reports (hours)
.with_staleness_threshold(3600)

2. Priority Assignment

// Critical: User-facing, real-time
Priority::Critical

// High: Frequently queried
Priority::High

// Normal: Standard auto-refresh (default)
Priority::Normal

// Low: Batch jobs, reports
Priority::Low

3. Concurrent Refresh

// Enable for large MVs to avoid downtime
AutoRefreshPolicy {
    concurrent: true,  // Zero-downtime refresh
    ..Default::default()
}

4. Monitoring Strategy

-- Set up regular monitoring queries

-- Alert if any MV is stale beyond threshold
SELECT * FROM pg_mv_auto_refresh_status()
WHERE staleness_seconds > threshold_seconds * 2
  AND auto_refresh_enabled = true;

-- Check for recent failures
SELECT * FROM pg_mv_refresh_history(50)
WHERE success = false
  AND start_time > NOW() - INTERVAL '1 hour';

-- Monitor CPU usage
SELECT * FROM pg_mv_cpu_usage()
WHERE is_throttled = true;

API Reference

AutoRefreshConfig

pub struct AutoRefreshConfig {
    pub enabled: bool,
    pub interval_seconds: u64,
    pub staleness_threshold_seconds: i64,
    pub max_concurrent_refreshes: usize,
    pub max_cpu_percent: f64,
}

AutoRefreshPolicy

pub struct AutoRefreshPolicy {
    pub enabled: bool,
    pub refresh_interval_seconds: Option<i64>,
    pub priority: Priority,
    pub concurrent: bool,
}

AutoRefreshWorker Methods

impl AutoRefreshWorker {
    // Lifecycle
    pub async fn start(&mut self) -> Result<()>;
    pub async fn stop(&mut self) -> Result<()>;
    pub fn is_running(&self) -> bool;

    // Configuration
    pub fn config(&self) -> AutoRefreshConfig;
    pub fn update_config(&self, config: AutoRefreshConfig);

    // MV Management
    pub fn register_mv(&self, mv_name: &str, policy: AutoRefreshPolicy) -> Result<()>;
    pub fn unregister_mv(&self, mv_name: &str) -> Result<()>;
    pub fn update_policy(&self, mv_name: &str, policy: AutoRefreshPolicy) -> Result<()>;
    pub fn get_policy(&self, mv_name: &str) -> Option<AutoRefreshPolicy>;
    pub fn list_registered_mvs(&self) -> Vec<String>;

    // Monitoring
    pub fn active_refresh_count(&self) -> usize;
    pub fn check_now(&self) -> Result<()>;
    pub fn get_refresh_history(&self, limit: Option<usize>) -> Vec<RefreshHistoryEntry>;
}

Future Enhancements

v2.4 (Planned)

Predictive refresh scheduling based on query patterns
Multi-node distributed auto-refresh coordination
Custom refresh strategies via user-defined functions
Webhook notifications for refresh events

v2.5 (Planned)

Machine learning-based staleness prediction
Automatic priority adjustment based on query frequency
Cost-based refresh scheduling
Refresh dependency graphs

Conclusion

The auto-refresh integration provides a robust, production-ready solution for automatic materialized view maintenance. By combining intelligent scheduling, CPU monitoring, and incremental refresh strategies, it ensures that views remain fresh without impacting system performance.

For additional information, see: