F2.15 Advanced Workload Management - Quick Start Guide

Overview

This guide shows you how to quickly get started with HeliosDB’s Advanced Workload Management system (F2.15), which provides intelligent query scheduling, resource quotas, admission control, and SLA enforcement.

5-Minute Quick Start

1. Add Dependency

[dependencies]
heliosdb-workload = { path = "../heliosdb-workload" }
tokio = { version = "1", features = ["full"] }
uuid = { version = "1", features = ["v4"] } # "v4" is required for Uuid::new_v4()

2. Initialize the System

use heliosdb_workload::*;
use std::time::{Duration, Instant}; // Duration is needed for the quota in step 4

#[tokio::main]
async fn main() -> Result<()> {
    // Create workload manager with defaults
    let manager = AdvancedWorkloadManager::new();
    manager.initialize()?;
    println!("Workload manager ready!");
    Ok(())
}

3. Set System Load

// Update current system state
manager.update_system_load(SystemLoad {
    cpu_utilization: 0.5,    // 50% CPU
    memory_utilization: 0.6, // 60% memory
    queue_depth: 100,        // 100 queries queued
    running_queries: 50,     // 50 queries running
    last_update: Instant::now(),
});

4. Configure a Tenant

// Set SLA
manager.set_tenant_sla(SLADefinition {
    tenant_id: "my_tenant".to_string(),
    tier: SLATier::Premium,
    custom_p95_latency_us: Some(100_000), // 100ms
    custom_p99_latency_us: Some(200_000), // 200ms
    enable_auto_remediation: true,
    penalty_per_violation: 100.0,
    ..Default::default()
});

// Set resource quota
manager.set_tenant_quota("my_tenant".to_string(), ResourceQuota {
    tenant_id: "my_tenant".to_string(),
    max_cpu_cores: 4.0,
    max_memory_bytes: 4 * 1024 * 1024 * 1024, // 4GB
    max_iops: 10_000,
    max_network_bps: 100 * 1024 * 1024, // 100MB/s
    soft_limits: false,
    reset_period: Duration::from_secs(60),
});
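To make the `soft_limits` flag above more concrete, here is a minimal, self-contained sketch of how a quota check might distinguish hard from soft limits. It does not call the `heliosdb_workload` API: `QuotaDecision` and `check_memory` are hypothetical names, and the reading of soft limits as "allow the overrun but flag it" is an assumption, not documented behavior.

```rust
/// Hypothetical outcome of a quota check (not part of heliosdb_workload).
#[derive(Debug, PartialEq, Eq)]
enum QuotaDecision {
    Allow,
    AllowOverSoftLimit,
    Reject,
}

/// Checks a memory request against a tenant quota.
/// Assumption: with soft limits, overruns are allowed but flagged.
fn check_memory(used: u64, requested: u64, max: u64, soft_limits: bool) -> QuotaDecision {
    // saturating_add avoids overflow on very large requests.
    let within_quota = used.saturating_add(requested) <= max;
    match (within_quota, soft_limits) {
        (true, _) => QuotaDecision::Allow,
        (false, true) => QuotaDecision::AllowOverSoftLimit,
        (false, false) => QuotaDecision::Reject,
    }
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    const GIB: u64 = 1024 * MIB;
    let max = 4 * GIB; // same 4GB quota as in the step above
    let used = 4 * GIB - 100 * MIB; // only 100 MiB of headroom left
    let requested = 256 * MIB;
    println!("hard limit: {:?}", check_memory(used, requested, max, false));
    println!("soft limit: {:?}", check_memory(used, requested, max, true));
}
```

With `soft_limits: false`, as configured above, the equivalent of the `Reject` branch is what a tenant would hit once its quota is exhausted.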

5. Submit and Execute Queries

use uuid::Uuid;

// Create a query
let query = ScheduledQuery {
    id: Uuid::new_v4().to_string(),
    tenant_id: "my_tenant".to_string(),
    priority: QueryPriority::Normal,
    pattern: WorkloadPattern::OLTP,
    estimated_time_us: 100_000,
    estimated_resources: scheduler::ResourceEstimate {
        cpu_cores: 1.0,
        memory_bytes: 256 * 1024 * 1024,
        io_operations: 100,
        network_bytes: 1024,
    },
    sla_deadline: None,
    submit_time: Instant::now(),
    start_time: None,
    state: QueryState::Queued,
    preemption_count: 0,
    age_boost: 0.0,
};

// Submit query
match manager.submit_query(query).await {
    Ok(query_id) => println!("Query {} admitted", query_id),
    Err(e) => println!("Query rejected: {}", e),
}

// Execute next query
if let Ok(Some(query)) = manager.execute_next_query().await {
    println!("Executing query {}", query.id);
    // ... run your query execution logic ...

    // Mark complete
    manager.complete_query(
        &query.id,
        &query.tenant_id,
        50_000, // actual latency in microseconds
        true,   // success
        &query.estimated_resources,
    )?;
}
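The `50_000` passed to `complete_query` above is a placeholder. The following sketch shows one way to measure the real latency using only the standard library. `run_query` and `timed` are hypothetical helpers, and the result is returned as a `u64` in microseconds on the assumption that this matches the `_us` fields used throughout this guide.

```rust
use std::time::Instant;

/// Hypothetical stand-in for real query execution; returns true on success.
fn run_query() -> bool {
    // Simulate a little work.
    (0..1_000u64).sum::<u64>() > 0
}

/// Runs `f` and returns its result together with the elapsed time in microseconds.
fn timed<T>(f: impl FnOnce() -> T) -> (T, u64) {
    let start = Instant::now();
    let result = f();
    let elapsed_us = start.elapsed().as_micros() as u64;
    (result, elapsed_us)
}

fn main() {
    let (success, latency_us) = timed(run_query);
    // These two values would replace the hard-coded latency and success flag.
    println!("success={success}, latency={latency_us}us");
}
```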

6. Monitor Performance

// Get comprehensive stats
let stats = manager.get_system_stats();
println!("Scheduler: {} scheduled, {} completed",
    stats.scheduler_stats.total_scheduled,
    stats.scheduler_stats.total_completed);
println!("Admission: {} admitted, {} rejected",
    stats.admission_stats.total_admitted,
    stats.admission_stats.total_rejected);
println!("SLA: {:.1}% avg compliance",
    stats.sla_stats.avg_compliance_percent);

// Get tenant compliance report
if let Some(report) = manager.get_compliance_report("my_tenant") {
    println!("Tenant compliance: {:.2}%", report.compliance_percent);
    println!("P95 latency: {}us", report.current_metrics.p95_latency_us);
    println!("Status: {}", if report.is_compliant { "COMPLIANT" } else { "VIOLATION" });
}

// Get resource utilization
let util = manager.resource_manager().get_utilization_report("my_tenant");
println!("CPU: {:.1}%, Memory: {:.1}%",
    util.cpu_utilization, util.memory_utilization);
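For readers unfamiliar with the `p95_latency_us` figure in the report above, this sketch computes a percentile with the nearest-rank method over raw latency samples. How HeliosDB actually computes its percentiles is not stated in this guide, so treat `percentile_us` as a hypothetical illustration of the concept only.

```rust
/// Nearest-rank percentile over latency samples (microseconds).
/// `pct` is a whole-number percentile such as 95 or 99.
/// Returns None for an empty slice.
fn percentile_us(samples: &[u64], pct: usize) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: rank = ceil(pct * n / 100), clamped to [1, n].
    // Integer arithmetic avoids floating-point rounding surprises.
    let rank = (pct * sorted.len() + 99) / 100;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}

fn main() {
    // 100 samples: 1ms, 2ms, ..., 100ms, expressed in microseconds.
    let samples: Vec<u64> = (1..=100).map(|ms| ms * 1_000).collect();
    println!("P95: {:?}us", percentile_us(&samples, 95));
    println!("P99: {:?}us", percentile_us(&samples, 99));
}
```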

Common Patterns

Pattern 1: High-Priority Query

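// Submit critical query that bypasses most admission checks
let critical_query = ScheduledQuery {
    id: Uuid::new_v4().to_string(), // fresh ID instead of reusing the template's
    priority: QueryPriority::Critical,
    sla_deadline: Some(Instant::now() + Duration::from_millis(100)),
    // `query` is the template from step 5; `.clone()` assumes ScheduledQuery implements Clone
    ..query.clone()
};
manager.submit_query(critical_query).await?;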

Pattern 2: Background Batch Job

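// Submit low-priority batch query
let batch_query = ScheduledQuery {
    id: Uuid::new_v4().to_string(), // fresh ID instead of reusing the template's
    priority: QueryPriority::Background,
    estimated_time_us: 5_000_000, // 5 seconds
    // `query` is the template from step 5; `.clone()` assumes ScheduledQuery implements Clone
    ..query.clone()
};
manager.submit_query(batch_query).await?;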

Pattern 3: Handle Overload

// Update system to overloaded state
manager.update_system_load(SystemLoad {
    cpu_utilization: 0.95,
    memory_utilization: 0.95,
    queue_depth: 15_000,
    running_queries: 1_500,
    last_update: Instant::now(),
});

// Only high/critical priority queries will be admitted
let normal_query = ScheduledQuery {
    id: Uuid::new_v4().to_string(),
    priority: QueryPriority::Normal,
    // `.clone()` assumes ScheduledQuery implements Clone
    ..query.clone()
};

// Will be rejected with RejectOverload
match manager.submit_query(normal_query).await {
    // Lowercase first so the check also matches "Overload" in the message
    Err(e) if e.to_string().to_lowercase().contains("overload") => {
        println!("System overloaded, retry later");
    }
    _ => {}
}
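"Retry later" in the pattern above usually means retrying with a growing delay. Below is a minimal sketch of exponential backoff using only the standard library. `backoff_delay` and `retry_on_overload` are hypothetical helpers, the error is modeled as a plain `String`, and the 10ms base and 500ms cap are arbitrary example values.

```rust
use std::time::Duration;

/// Exponential backoff: base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Calls `submit` until it succeeds, fails with a non-overload error,
/// or `max_attempts` attempts have been made.
fn retry_on_overload<T>(
    max_attempts: u32,
    mut submit: impl FnMut() -> Result<T, String>,
) -> Result<T, String> {
    let mut attempt = 0;
    loop {
        match submit() {
            Ok(value) => return Ok(value),
            Err(e) if e.contains("overload") && attempt + 1 < max_attempts => {
                let delay =
                    backoff_delay(attempt, Duration::from_millis(10), Duration::from_millis(500));
                std::thread::sleep(delay);
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    // Simulated submission: rejected twice, then admitted.
    let mut calls = 0;
    let result = retry_on_overload(5, || {
        calls += 1;
        if calls < 3 {
            Err("system overload".to_string())
        } else {
            Ok(calls)
        }
    });
    println!("result: {result:?}");
}
```

In the async code used elsewhere in this guide, `tokio::time::sleep(delay).await` would take the place of `std::thread::sleep` so the runtime thread is not blocked.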

Pattern 4: Multi-Tenant Fair Sharing

// Set up multiple tenants
for i in 1..=10 {
    let tenant_id = format!("tenant{}", i);
    manager.set_tenant_quota(tenant_id.clone(), ResourceQuota {
        tenant_id: tenant_id.clone(),
        max_cpu_cores: 2.0, // Fair share
        max_memory_bytes: 2 * 1024 * 1024 * 1024,
        ..Default::default()
    });
}

// Submit queries from different tenants
for i in 1..=10 {
    let tenant_query = ScheduledQuery {
        id: Uuid::new_v4().to_string(), // each query needs its own ID
        tenant_id: format!("tenant{}", i),
        priority: QueryPriority::Normal,
        // Clone the template so it can be reused on every iteration
        // (assumes ScheduledQuery implements Clone)
        ..query.clone()
    };
    manager.submit_query(tenant_query).await?;
}
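The pattern above hard-codes 2.0 cores per tenant. As a rough illustration of where such a number could come from, this sketch splits a total capacity across tenants in proportion to per-tenant weights. `weighted_shares` is a hypothetical helper and is not how HeliosDB itself computes fair shares; equal weights reduce to an even split.

```rust
/// Splits `total` capacity across tenants in proportion to their weights.
/// Returns an empty vector if the weights do not sum to a positive value.
fn weighted_shares(total: f64, weights: &[(&str, f64)]) -> Vec<(String, f64)> {
    let sum: f64 = weights.iter().map(|(_, w)| w).sum();
    if sum <= 0.0 {
        return Vec::new();
    }
    weights
        .iter()
        .map(|(tenant, w)| (tenant.to_string(), total * w / sum))
        .collect()
}

fn main() {
    // 20 cores shared by 10 equally weighted tenants: 2.0 cores each.
    let names: Vec<String> = (1..=10).map(|i| format!("tenant{i}")).collect();
    let weights: Vec<(&str, f64)> = names.iter().map(|n| (n.as_str(), 1.0)).collect();
    for (tenant, cores) in weighted_shares(20.0, &weights) {
        println!("{tenant}: {cores:.1} cores");
    }
}
```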

Priority Levels Explained

| Priority | Use Case | Behavior Under Load |
|----------|----------|---------------------|
| Critical | SLA-bound, time-sensitive | Always admitted (except extreme overload) |
| High | Important operations | Admitted during normal/high load |
| Normal | Regular queries | Admitted during normal load |
| Low | Batch processing | Admitted when system idle |
| Background | Maintenance, analytics | Lowest priority, first to be rejected |
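To illustrate how the five priority levels above can drive dispatch order, here is a small sketch built on the standard library's `BinaryHeap`. It is an assumption-laden simplification: the `Priority` enum and `drain_in_order` are hypothetical, and the real scheduler also applies preemption and age boosts, which are ignored here.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical mirror of the priority levels, declared lowest to highest
/// so that the derived ordering ranks Critical above everything else.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Background,
    Low,
    Normal,
    High,
    Critical,
}

/// Returns query IDs in dispatch order: highest priority first,
/// first-in-first-out within the same priority.
fn drain_in_order(submitted: &[(Priority, &str)]) -> Vec<String> {
    let mut heap = BinaryHeap::new();
    for (seq, (priority, id)) in submitted.iter().enumerate() {
        // BinaryHeap is a max-heap; Reverse(seq) makes earlier arrivals win ties.
        heap.push((*priority, Reverse(seq), id.to_string()));
    }
    let mut order = Vec::with_capacity(heap.len());
    while let Some((_, _, id)) = heap.pop() {
        order.push(id);
    }
    order
}

fn main() {
    let submitted = [
        (Priority::Normal, "report-1"),
        (Priority::Critical, "checkout"),
        (Priority::Background, "vacuum"),
        (Priority::High, "login"),
        (Priority::Low, "batch-export"),
        (Priority::Normal, "report-2"),
    ];
    println!("{:?}", drain_in_order(&submitted));
}
```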

SLA Tiers

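| Tier | Availability | P95 Latency | P99 Latency | Use Case |
|------|--------------|-------------|-------------|----------|
| Enterprise | 99.99% | 50ms | 200ms | Mission-critical |
| Premium | 99.9% | 100ms | 500ms | Production workloads |
| Standard | 99% | 500ms | 2s | Regular operations |
| Basic | 95% | 1s | 5s | Development, testing |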
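The tier table above can be read as a set of default targets that the `custom_p95_latency_us` and `custom_p99_latency_us` fields from step 4 refine. The sketch below encodes the table in microseconds and applies optional overrides. `Tier`, `Targets`, `tier_defaults`, and `effective_targets` are hypothetical names, and the rule that a custom value replaces the tier default is an assumption based on the field names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Enterprise,
    Premium,
    Standard,
    Basic,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Targets {
    availability_pct: f64,
    p95_latency_us: u64,
    p99_latency_us: u64,
}

/// Default targets per tier, taken from the SLA tier table (latencies in microseconds).
fn tier_defaults(tier: Tier) -> Targets {
    let (availability_pct, p95_latency_us, p99_latency_us) = match tier {
        Tier::Enterprise => (99.99, 50_000, 200_000),
        Tier::Premium => (99.9, 100_000, 500_000),
        Tier::Standard => (99.0, 500_000, 2_000_000),
        Tier::Basic => (95.0, 1_000_000, 5_000_000),
    };
    Targets { availability_pct, p95_latency_us, p99_latency_us }
}

/// Assumption: a custom latency target, when set, replaces the tier default.
fn effective_targets(tier: Tier, custom_p95_us: Option<u64>, custom_p99_us: Option<u64>) -> Targets {
    let defaults = tier_defaults(tier);
    Targets {
        p95_latency_us: custom_p95_us.unwrap_or(defaults.p95_latency_us),
        p99_latency_us: custom_p99_us.unwrap_or(defaults.p99_latency_us),
        ..defaults
    }
}

fn main() {
    // Premium tier with the custom 100ms / 200ms targets from step 4.
    let targets = effective_targets(Tier::Premium, Some(100_000), Some(200_000));
    println!("{targets:?}");
    for tier in [Tier::Enterprise, Tier::Standard, Tier::Basic] {
        println!("{tier:?}: {:?}", tier_defaults(tier));
    }
}
```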

Load States

| State | Criteria | Behavior |
|-------|----------|----------|
| Normal | CPU < 85%, Mem < 90% | All queries admitted (quota-based) |
| HighLoad | 1+ resource > threshold | Background queries rejected |
| Overloaded | 2+ resources > threshold | Only High/Critical admitted |
| Critical | 2+ resources > 95% | Only Critical admitted |
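The load-state table above can be expressed as two small functions: one that classifies the system and one that decides admission by priority. This is a sketch under stated assumptions: only CPU and memory are considered (the table gives no thresholds for queue depth or running queries), the 85% and 90% thresholds are treated as inclusive, and `LoadState`, `Priority`, `classify`, and `admits` are hypothetical names rather than the crate's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoadState {
    Normal,
    HighLoad,
    Overloaded,
    Critical,
}

/// Declared lowest to highest so comparisons like `p >= Priority::High` work.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Background,
    Low,
    Normal,
    High,
    Critical,
}

/// Classifies load from CPU and memory utilization in the range 0.0..=1.0.
fn classify(cpu: f64, mem: f64) -> LoadState {
    let over_threshold = [cpu >= 0.85, mem >= 0.90].iter().filter(|&&hit| hit).count();
    let over_critical = [cpu > 0.95, mem > 0.95].iter().filter(|&&hit| hit).count();
    if over_critical >= 2 {
        LoadState::Critical
    } else if over_threshold >= 2 {
        LoadState::Overloaded
    } else if over_threshold == 1 {
        LoadState::HighLoad
    } else {
        LoadState::Normal
    }
}

/// Admission by priority, following the Behavior column of the table.
fn admits(state: LoadState, priority: Priority) -> bool {
    match state {
        LoadState::Normal => true,
        LoadState::HighLoad => priority > Priority::Background,
        LoadState::Overloaded => priority >= Priority::High,
        LoadState::Critical => priority == Priority::Critical,
    }
}

fn main() {
    // The overload values from Pattern 3: exactly 95% on both resources.
    let state = classify(0.95, 0.95);
    for p in [Priority::Background, Priority::Low, Priority::Normal, Priority::High, Priority::Critical] {
        println!("{state:?}: {p:?} admitted = {}", admits(state, p));
    }
}
```

Quota checks still apply in every state; this sketch covers only the load-based part of the decision.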

Configuration Options

Scheduler Config

SchedulerConfig {
    max_concurrent_queries: 1000,          // Max running queries
    max_queue_size: 10_000,                // Max queued per priority
    enable_preemption: true,               // Allow query preemption
    starvation_timeout_secs: 30,           // Boost after this time
    scheduling_overhead_budget_us: 10_000, // Target <10ms
}
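The `starvation_timeout_secs` setting and the `age_boost` field on `ScheduledQuery` suggest that queries gain priority the longer they wait. The exact formula is not given in this guide, so the following is purely a hypothetical sketch of one possible rule: no boost before the timeout, then a boost that grows linearly with waiting time.

```rust
use std::time::Duration;

/// Hypothetical age boost: 0.0 until the query has waited for
/// `starvation_timeout`, then waited / timeout (1.0 at the timeout,
/// 2.0 at twice the timeout, and so on).
fn age_boost(waited: Duration, starvation_timeout: Duration) -> f64 {
    if starvation_timeout.is_zero() || waited < starvation_timeout {
        0.0
    } else {
        waited.as_secs_f64() / starvation_timeout.as_secs_f64()
    }
}

fn main() {
    let timeout = Duration::from_secs(30); // matches starvation_timeout_secs above
    for secs in [10, 30, 60, 90] {
        let boost = age_boost(Duration::from_secs(secs), timeout);
        println!("waited {secs}s -> boost {boost:.1}");
    }
}
```

A scheduler could add such a boost to a numeric priority level so that long-waiting Low or Background queries eventually run.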

Resource Manager Config

ResourceManagerConfig {
    default_quota: ResourceQuota::default(),
    strict_enforcement: true,     // Hard limits vs soft
    enable_dynamic_quotas: false, // Auto-adjust quotas
    sample_interval: Duration::from_secs(5),
}

Admission Control Config

AdmissionControlConfig {
    enabled: true,
    load_thresholds: LoadThresholds::default(),
    max_query_cost_us: 60_000_000, // 60 seconds
    enable_cost_based_admission: true,
    enable_tenant_throttling: true,
    throttle_duration_secs: 10,
    circuit_breaker_threshold: 100,
    circuit_breaker_timeout_secs: 30,
}
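The two `circuit_breaker_*` settings above follow the usual circuit breaker idea: after enough failures, stop admitting work for a cooling-off period. The sketch below shows that idea with the standard library only. `CircuitBreaker` is a hypothetical type, and whether the real threshold counts consecutive or total failures is not stated here, so counting failures since the last success is an assumption.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-tenant circuit breaker.
struct CircuitBreaker {
    threshold: u32,
    timeout: Duration,
    failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(threshold: u32, timeout: Duration) -> Self {
        Self { threshold, timeout, failures: 0, opened_at: None }
    }

    /// Records a failure and opens the breaker once the threshold is reached.
    fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.failures >= self.threshold && self.opened_at.is_none() {
            self.opened_at = Some(now);
        }
    }

    /// A success resets the failure count and closes the breaker.
    fn record_success(&mut self) {
        self.failures = 0;
        self.opened_at = None;
    }

    /// Returns false while the breaker is open; closes it again after the timeout.
    fn allows(&mut self, now: Instant) -> bool {
        match self.opened_at {
            Some(opened) if now.duration_since(opened) < self.timeout => false,
            Some(_) => {
                // Timeout elapsed: close the breaker and start counting afresh.
                self.record_success();
                true
            }
            None => true,
        }
    }
}

fn main() {
    let start = Instant::now();
    // Small threshold for the demo; the config above uses 100 failures and 30 seconds.
    let mut breaker = CircuitBreaker::new(3, Duration::from_secs(30));
    for _ in 0..3 {
        breaker.record_failure(start);
    }
    println!("after 1s: allowed = {}", breaker.allows(start + Duration::from_secs(1)));
    println!("after 31s: allowed = {}", breaker.allows(start + Duration::from_secs(31)));
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the methods, keeps the logic easy to test without sleeping.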

SLA Manager Config

SLAManagerConfig {
    metrics_window_duration: Duration::from_secs(60),
    violation_threshold: 3, // Consecutive windows
    enable_auto_remediation: true,
    max_history_size: 1000,
}
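The `violation_threshold: 3` comment above says violations are counted over consecutive windows. A minimal sketch of that counting rule follows. `ViolationTracker` is a hypothetical type, and the assumption is that a single compliant window resets the count.

```rust
/// Hypothetical tracker for consecutive non-compliant metric windows.
struct ViolationTracker {
    threshold: u32,
    consecutive: u32,
}

impl ViolationTracker {
    fn new(threshold: u32) -> Self {
        Self { threshold, consecutive: 0 }
    }

    /// Feeds the result of one metrics window.
    /// Returns true once `threshold` windows in a row were non-compliant.
    fn observe(&mut self, window_compliant: bool) -> bool {
        if window_compliant {
            self.consecutive = 0;
            false
        } else {
            self.consecutive += 1;
            self.consecutive >= self.threshold
        }
    }
}

fn main() {
    let mut tracker = ViolationTracker::new(3); // matches violation_threshold above
    // Two bad windows, one good window (which resets the count), then three bad windows.
    for compliant in [false, false, true, false, false, false] {
        let violation = tracker.observe(compliant);
        println!("window compliant={compliant} -> SLA violation={violation}");
    }
}
```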

Running the Example

# Run the full demo
cd heliosdb-workload
cargo run --example workload_management
# Run tests
cargo test
# Run benchmarks
cargo bench --bench workload_benchmark

Troubleshooting

Query Rejected - Overload

Problem: Normal/Low priority queries rejected

Solution: System is overloaded. Either:

  1. Submit as High/Critical priority
  2. Wait for load to decrease
  3. Increase system capacity

Query Rejected - Quota

Problem: Tenant exceeded resource quota

Solution: Either:

  1. Increase tenant quota
  2. Release resources from other queries
  3. Wait for quota reset period

Query Rejected - Cost

Problem: Query estimated cost too high

Solution: Either:

  1. Optimize query
  2. Increase max_query_cost_us
  3. Split into smaller queries

Low SLA Compliance

Problem: Tenant not meeting SLA targets

Solution:

  1. Check P95/P99 latency - optimize slow queries
  2. Check availability - investigate failures
  3. Check throughput - increase resources
  4. Enable auto-remediation

Performance Tips

  1. Batch submissions: Use async to submit multiple queries concurrently
  2. Monitor overhead: Track avg_scheduling_overhead_us - should be <10ms
  3. Tune queue sizes: Balance memory vs queue depth
  4. Circuit breakers: Protect system from failing tenants
  5. Resource estimations: More accurate estimates = better scheduling

Next Steps

  • Review full implementation in /heliosdb-workload/src/
  • Check integration tests in /heliosdb-workload/tests/
  • Run benchmarks to understand performance characteristics
  • Customize configs for your workload patterns
  • Integrate with your query execution engine

Support

For issues or questions:

  • Check implementation summary: F2.15_WORKLOAD_MANAGEMENT_SUMMARY.md
  • Review examples: heliosdb-workload/examples/
  • Run tests: cargo test --package heliosdb-workload

Ready to Scale!

This workload management system supports:

  • 100K+ concurrent queries
  • <10ms scheduling overhead
  • 95%+ SLA compliance
  • Multi-tenant fair sharing
  • Intelligent admission control