Incremental Materialized View Refresh
Incremental Materialized View Refresh
Version: 3.4.0 Status: Implemented Author: HeliosDB Team Date: 2025-12-25
Overview
Incremental computation for materialized views enables efficient delta-based updates instead of full recomputation. This feature dramatically improves refresh performance by processing only the changes (deltas) to base tables rather than rescanning the entire dataset.
New in v3.4.0
REFRESH MATERIALIZED VIEW INCREMENTALLY
Explicit SQL syntax to request incremental refresh:
-- Request incremental refresh explicitlyREFRESH MATERIALIZED VIEW sales_summary INCREMENTALLY;
-- Falls back to full refresh if incremental not possibleALTER MATERIALIZED VIEW SET
Configure MV options after creation:
-- Enable incremental refresh strategyALTER MATERIALIZED VIEW sales_summary SET (refresh_strategy = 'incremental');
-- Configure staleness threshold (seconds) and CPU limitsALTER MATERIALIZED VIEW sales_summary SET ( staleness_threshold = 1800, max_cpu_percent = 25, incremental_enabled = true);
-- Set refresh priority (0=low, 1=medium, 2=high, 3=critical)ALTER MATERIALIZED VIEW sales_summary SET (priority = 2);AutoRefreshWorker Integration
The AutoRefreshWorker now integrates with the incremental refresh system:
- CPU-aware scheduling prevents system overload
- Priority-based refresh queue
- Automatic strategy selection (incremental vs full)
- Delta tracking cleanup after successful refresh
Key Features
1. Delta Tracking
The system tracks all changes (inserts, updates, deletes) to base tables:
use heliosdb_nano::storage::{DeltaTracker, DeltaOperation};
// Track changes to base tablestracker.record_insert("users", tuple, timestamp);tracker.record_update("users", old_tuple, new_tuple, timestamp);tracker.record_delete("users", tuple, timestamp);
// Retrieve changes since last refreshlet deltas = tracker.get_deltas_since("users", last_refresh_ts);2. Incremental Refresh Strategies
Three refresh strategies are supported:
- Full: Complete recomputation from base tables
- Incremental: Delta-based updates only
- Hybrid: Automatic selection based on cost estimation
3. Cost-Based Optimization
The system automatically estimates costs and recommends the optimal strategy:
use heliosdb_nano::storage::IncrementalRefresher;
let refresher = IncrementalRefresher::new(storage, delta_tracker);let cost = refresher.estimate_refresh_cost(&mv_metadata)?;
println!("Incremental cost: {:.2}s", cost.incremental_cost);println!("Full refresh cost: {:.2}s", cost.full_cost);println!("Recommendation: {:?}", cost.recommendation);4. Supported Operations
Aggregate Views
Efficiently update aggregates without full recomputation:
- COUNT: Increment/decrement counters
- SUM: Add/subtract values
- AVG: Update running averages
- MIN/MAX: Track extrema (with validation)
CREATE MATERIALIZED VIEW sales_summary ASSELECT product_id, COUNT(*) as order_count, SUM(amount) as total_sales, AVG(amount) as avg_saleFROM ordersGROUP BY product_id;
-- Incremental refresh applies deltas efficientlyREFRESH MATERIALIZED VIEW sales_summary;Filter Views
Apply filter predicates to deltas:
CREATE MATERIALIZED VIEW active_users ASSELECT * FROM usersWHERE status = 'active' AND last_login > NOW() - INTERVAL '30 days';
-- Only processes changed rowsREFRESH MATERIALIZED VIEW active_users;Join Views
Match deltas against join partners:
CREATE MATERIALIZED VIEW user_orders ASSELECT u.name, o.order_id, o.amountFROM users uJOIN orders o ON u.user_id = o.user_id;
-- Processes deltas from both sides efficientlyREFRESH MATERIALIZED VIEW user_orders;Architecture
Delta Tracking System
┌──────────────┐│ Base Table ││ (users) │└──────┬───────┘ │ Changes (Insert/Update/Delete) ▼┌──────────────────────┐│ Delta Tracker ││ ││ ┌────────────────┐ ││ │ Delta Queue │ ││ │ - Insert ops │ ││ │ - Update ops │ ││ │ - Delete ops │ ││ └────────────────┘ │└──────┬───────────────┘ │ Get deltas since last refresh ▼┌──────────────────────┐│ Incremental ││ Refresher ││ ││ ┌────────────────┐ ││ │ Apply Deltas │ ││ │ - Aggregates │ ││ │ - Filters │ ││ │ - Joins │ ││ └────────────────┘ │└──────┬───────────────┘ │ Updated results ▼┌──────────────────────┐│ Materialized View ││ (refreshed) │└──────────────────────┘Cost Estimation Algorithm
// Heuristic-based cost modelfn estimate_cost(&self, mv: &MV) -> Cost { let delta_count = tracker.count_deltas(mv.base_tables); let mv_size = count_tuples(mv.name); let base_size = count_tuples(mv.base_tables[0]);
// Cost factors (configurable) let incremental = delta_count * DELTA_COST; // 1ms per delta let full = base_size * BASE_SCAN_COST; // 10ms per row
if incremental < full * 0.5 { Recommend::Incremental } else { Recommend::Full }}Performance Characteristics
Incremental vs Full Refresh
| Dataset Size | Delta Ratio | Incremental Time | Full Refresh Time | Speedup |
|---|---|---|---|---|
| 10K rows | 1% | 10ms | 100ms | 10x |
| 100K rows | 1% | 100ms | 1,000ms | 10x |
| 1M rows | 1% | 1s | 10s | 10x |
| 10K rows | 50% | 500ms | 100ms | 0.2x |
Key Insights:
- Incremental refresh excels when delta ratio is low (<10%)
- Full refresh is better when most of the dataset changes
- Cost estimation automatically selects the optimal strategy
Memory Usage
- Delta Tracking: O(deltas) - Only stores changes since last refresh
- Incremental Refresh: O(deltas + groups) - Proportional to changes
- Full Refresh: O(base_table_size) - Must process entire dataset
API Reference
DeltaTracker
pub struct DeltaTracker { // Track changes to base tables}
impl DeltaTracker { pub fn new(storage: Arc<StorageEngine>) -> Self;
pub fn record_insert(&self, table: &str, tuple: Tuple, ts: u64); pub fn record_update(&self, table: &str, old: Tuple, new: Tuple, ts: u64); pub fn record_delete(&self, table: &str, tuple: Tuple, ts: u64);
pub fn get_deltas_since(&self, table: &str, since: u64) -> Vec<Delta>; pub fn count_deltas_since(&self, tables: &[String], since: u64) -> Result<usize>;
pub fn clear_deltas_until(&self, table: &str, until: u64); pub fn clear_all_deltas(&self, table: &str);}IncrementalRefresher
pub struct IncrementalRefresher { // Performs incremental refresh operations}
impl IncrementalRefresher { pub fn new( storage: Arc<StorageEngine>, delta_tracker: Arc<DeltaTracker> ) -> Self;
pub fn refresh_incremental(&self, mv_name: &str) -> Result<RefreshResult>;
pub fn can_refresh_incrementally(&self, mv: &MVMetadata) -> Result<bool>;
pub fn estimate_refresh_cost(&self, mv: &MVMetadata) -> Result<RefreshCost>;}RefreshResult
pub struct RefreshResult { pub strategy_used: RefreshStrategy, pub rows_inserted: usize, pub rows_updated: usize, pub rows_deleted: usize, pub duration: Duration,}Usage Examples
Basic Incremental Refresh
use heliosdb_nano::storage::{IncrementalRefresher, DeltaTracker};
// Setuplet storage = Arc::new(StorageEngine::open("data", &config)?);let tracker = Arc::new(DeltaTracker::new(Arc::clone(&storage)));let refresher = IncrementalRefresher::new(storage, tracker);
// Track changestracker.record_insert("orders", new_order, timestamp);tracker.record_update("orders", old, new, timestamp);
// Refresh incrementallylet result = refresher.refresh_incremental("sales_summary")?;
println!("Refresh completed in {:?}", result.duration);println!(" Inserted: {} rows", result.rows_inserted);println!(" Updated: {} rows", result.rows_updated);println!(" Deleted: {} rows", result.rows_deleted);Cost-Based Refresh
// Estimate cost before refreshlet cost = refresher.estimate_refresh_cost(&mv_metadata)?;
match cost.recommendation { RefreshStrategy::Incremental => { println!("Using incremental refresh ({}s vs {}s)", cost.incremental_cost, cost.full_cost); refresher.refresh_incremental(mv_name)?; } RefreshStrategy::Full => { println!("Using full refresh (cheaper than incremental)"); // Use standard REFRESH MATERIALIZED VIEW } RefreshStrategy::Hybrid => { // Try incremental, fallback to full if needed }}Automatic Scheduling
use heliosdb_nano::storage::MVScheduler;
// Configure automatic refresh with incremental computationlet mut scheduler = MVScheduler::new(storage, SchedulerConfig { enabled: true, check_interval_seconds: 60, use_incremental: true, // Enable incremental refresh ..Default::default()});
scheduler.start()?;
// Scheduler automatically:// 1. Checks staleness of materialized views// 2. Estimates cost (incremental vs full)// 3. Selects optimal strategy// 4. Refreshes views in priority orderLimitations
Current Limitations (v2.3.0)
- MIN/MAX Aggregates: Require validation when deleted value is extremum
- Complex Joins: Multi-way joins not fully optimized
- Subqueries: Not yet supported for incremental refresh
- Window Functions: Not supported incrementally
Workarounds
For unsupported operations, the system automatically falls back to full refresh:
if !refresher.can_refresh_incrementally(&mv)? { // Automatically use full refresh executor.execute_refresh_full(mv_name)?;}Configuration
Delta Tracker Settings
// Configure delta retention policytracker.set_retention_policy(RetentionPolicy { max_age_seconds: 3600, // Keep deltas for 1 hour max_deltas_per_table: 100000, // Max 100K deltas per table auto_compact: true, // Compact old deltas});Cost Estimation Tuning
// Adjust cost model parametersrefresher.set_cost_parameters(CostParameters { delta_cost_ms: 1.0, // 1ms per delta operation base_scan_cost_ms: 10.0, // 10ms per base table row threshold_ratio: 0.5, // Recommend incremental if <50% of full cost});Best Practices
1. Regular Cleanup
Clear old deltas to prevent unbounded memory growth:
// Clear deltas after successful refreshlet last_refresh_ts = mv_metadata.last_refresh.timestamp();tracker.clear_deltas_until(&table_name, last_refresh_ts);2. Monitor Delta Growth
Track delta accumulation to prevent performance degradation:
let delta_count = tracker.count_deltas_since(&tables, last_refresh)?;if delta_count > THRESHOLD { warn!("High delta count: {}, consider full refresh", delta_count);}3. Validate Correctness
Periodically validate incremental results against full refresh:
#[cfg(test)]fn validate_incremental_correctness() { let incremental_result = refresher.refresh_incremental(mv_name)?; let full_result = executor.refresh_full(mv_name)?;
assert_eq!(incremental_result.row_count, full_result.row_count); // Compare actual data...}4. Choose Appropriate Refresh Interval
Balance freshness vs overhead:
// High-frequency updates -> longer intervalsscheduler.set_interval(mv_name, Duration::from_secs(300)); // 5 min
// Low-frequency updates -> shorter intervalsscheduler.set_interval(mv_name, Duration::from_secs(60)); // 1 minTesting
Unit Tests
cargo test mv_incremental --libIntegration Tests
cargo test mv_incremental_test --test mv_incremental_testBenchmarks
cargo bench mv_incremental_benchExpected results:
- Incremental refresh: 10-100x faster for small delta ratios
- Cost estimation: <1ms overhead
- Delta tracking: O(1) per operation
Future Enhancements
Planned for v2.4.0
- Materialized View Dependencies: Cascade incremental refresh
- Parallel Delta Processing: Multi-threaded delta application
- Delta Compression: Reduce memory footprint
- Smart Batching: Group deltas for better cache locality
Under Research
- Streaming Incremental Refresh: Real-time view updates
- Approximate Incremental Refresh: Trade accuracy for speed
- ML-Based Cost Models: Learned cost estimation
References
Changelog
v3.4.0 (2025-12-25)
- Added
REFRESH MATERIALIZED VIEW ... INCREMENTALLYSQL syntax - Added
ALTER MATERIALIZED VIEW ... SETfor post-creation configuration - Integrated AutoRefreshWorker with incremental refresh system
- Added CPU-aware scheduling and priority queue
- Improved delta tracking cleanup after refresh
- Comprehensive integration tests for new features
v2.3.0 (2025-01-24)
- Initial implementation of incremental computation
- Delta tracking system
- Cost-based strategy selection
- Support for aggregates, filters, and joins
- Comprehensive test suite and benchmarks