HeliosDB Workload Management User Guide
HeliosDB Workload Management User Guide
Version: 1.0 Last Updated: January 4, 2026 Feature Status: Production Ready (100%)
Table of Contents
- Overview
- Getting Started
- Workload Profiles
- Query Priorities
- Admission Control
- Resource Management
- SLA Enforcement
- Graceful Degradation
- Monitoring and Metrics
- Multi-Tenant Configuration
- Performance Tuning
- Troubleshooting
- Best Practices
- API Reference
Overview
HeliosDB Workload Management provides enterprise-grade control over query execution, resource allocation, and service level guarantees. It ensures your database remains responsive and stable under varying load conditions.
Key Components
| Component | Purpose |
|---|---|
| Workload Profiles | Define resource limits and priorities for query groups |
| Admission Control | Gate queries based on system capacity |
| Query Scheduler | Priority-based query execution ordering |
| Resource Manager | Allocate CPU, memory, and I/O resources |
| SLA Manager | Enforce response time guarantees |
| Graceful Degradation | Maintain service during overload |
Key Benefits
- Predictable Performance: Guaranteed response times for critical queries
- Resource Isolation: Prevent runaway queries from affecting others
- Fair Scheduling: Ensure all tenants get their allocated resources
- Overload Protection: Automatic load shedding under stress
- Observability: Comprehensive metrics and monitoring
Performance Characteristics
| Metric | Typical Value |
|---|---|
| Admission check overhead | <10us per query |
| Scheduling decision | <50us |
| Resource enforcement | <1% CPU overhead |
| SLA tracking accuracy | 99.9% |
Getting Started
Prerequisites
- HeliosDB v7.0 or later
- Workload management extension enabled
- Admin privileges for configuration
Quick Start (5 minutes)
1. Enable Workload Management
-- Enable workload management extensionCREATE EXTENSION IF NOT EXISTS heliosdb_workload_management;
-- Verify it's enabledSELECT * FROM pg_extension WHERE extname = 'heliosdb_workload_management';2. Create a Workload Profile
-- Create profile for critical OLTP workloadCREATE WORKLOAD PROFILE critical_oltpWITH ( max_memory = '4GB', max_cpu_percent = 50, max_execution_time = '30s', priority = 100, queue = 'high_priority');
-- Create profile for analytics workloadCREATE WORKLOAD PROFILE analyticsWITH ( max_memory = '16GB', max_cpu_percent = 80, max_execution_time = '10m', priority = 50, queue = 'batch');3. Assign Sessions to Profiles
-- Assign current session to a profileSET workload_profile = 'critical_oltp';
-- Verify assignmentSHOW workload_profile;4. Execute Queries with Resource Control
-- This query runs with critical_oltp profile limitsSELECT * FROM ordersWHERE customer_id = 12345 AND created_at > NOW() - INTERVAL '30 days';Workload Profiles
Creating Profiles
-- Basic profileCREATE WORKLOAD PROFILE my_profileWITH ( priority = 75);
-- Full profile with all optionsCREATE WORKLOAD PROFILE production_appWITH ( -- Resource limits max_memory = '8GB', max_cpu_percent = 40, max_io_bandwidth = '200MB/s', max_temp_space = '10GB',
-- Execution limits max_execution_time = '60s', max_rows_scanned = 10000000, max_rows_returned = 100000,
-- Scheduling priority = 100, queue = 'critical', max_concurrent_queries = 50,
-- Behavior query_timeout_action = 'abort', on_resource_exceeded = 'queue');Profile Options Reference
| Option | Type | Description | Default |
|---|---|---|---|
max_memory | Size | Maximum memory per query | Unlimited |
max_cpu_percent | Integer | CPU limit (1-100) | 100 |
max_io_bandwidth | Size/s | I/O bandwidth limit | Unlimited |
max_temp_space | Size | Temp storage limit | Unlimited |
max_execution_time | Interval | Query timeout | 24h |
max_rows_scanned | Integer | Scan limit | Unlimited |
max_rows_returned | Integer | Result limit | Unlimited |
priority | Integer | Scheduling priority (1-100) | 50 |
queue | String | Queue name | ’default’ |
max_concurrent_queries | Integer | Parallel query limit | Unlimited |
Modifying Profiles
-- Update profile settingsALTER WORKLOAD PROFILE analyticsSET max_memory = '32GB', max_execution_time = '30m';
-- View profile settingsSELECT * FROM workload_profilesWHERE name = 'analytics';Deleting Profiles
-- Delete profile (fails if in use)DROP WORKLOAD PROFILE my_profile;
-- Force delete (kills active queries using this profile)DROP WORKLOAD PROFILE my_profile CASCADE;Built-in Profiles
HeliosDB includes these default profiles:
| Profile | Priority | Max Memory | Timeout | Use Case |
|---|---|---|---|---|
default | 50 | 4GB | 60s | Standard queries |
interactive | 80 | 2GB | 10s | User-facing queries |
batch | 30 | 16GB | 30m | Background jobs |
admin | 100 | 8GB | 5m | Admin operations |
Query Priorities
Priority Levels
| Level | Value | Description | Use Case |
|---|---|---|---|
| Critical | 100 | Highest priority | Health checks, monitoring |
| High | 80 | User-facing priority | API requests |
| Normal | 50 | Default | Standard workload |
| Low | 30 | Background work | Reports, analytics |
| Background | 10 | Lowest priority | Maintenance, cleanup |
Setting Query Priority
-- Via session variableSET query_priority = 'high';SELECT * FROM orders;
-- Via query hintSELECT /*+ priority(high) */ * FROM orders;
-- Via workload profileCREATE WORKLOAD PROFILE api_queriesWITH (priority = 80);SET workload_profile = 'api_queries';Priority-Based Execution Order
-- Higher priority queries execute first-- In this scenario:-- Query A (priority=30) starts first-- Query B (priority=80) arrives, preempts A-- Query B completes-- Query A resumes
-- View execution orderSELECT query_id, priority, status, wait_time_ms, execution_time_msFROM query_execution_logORDER BY start_time DESC;Admission Control
Admission control gates queries based on system capacity and configured limits.
Configuration
-- Enable admission controlSET admission_control_enabled = true;
-- Configure limitsSET max_concurrent_queries = 100;SET max_queue_size = 500;SET queue_timeout = '30s';Admission Policies
-- Create admission policyCREATE ADMISSION POLICY peak_hoursWITH ( applies_during = '09:00-17:00', max_concurrent_queries = 200, reject_low_priority = true, queue_high_priority = true);
-- Create off-peak policyCREATE ADMISSION POLICY off_peakWITH ( applies_during = '17:00-09:00', max_concurrent_queries = 500, reject_low_priority = false);Queue Management
-- View current queueSELECT query_id, priority, queue_position, wait_time_ms, estimated_start_timeFROM workload_queueORDER BY queue_position;
-- Manually promote query in queueSELECT promote_query_priority('query_id_123', 'critical');
-- Cancel queued querySELECT cancel_queued_query('query_id_123');Handling Query Rejection
When admission control rejects a query:
-- Application should retry with backoffDO $$DECLARE retry_count INT := 0; max_retries INT := 3;BEGIN LOOP BEGIN -- Try the query EXECUTE 'SELECT * FROM large_table'; EXIT; -- Success, exit loop
EXCEPTION WHEN SQLSTATE '57014' THEN -- Query cancelled (admission rejected) retry_count := retry_count + 1; IF retry_count >= max_retries THEN RAISE EXCEPTION 'Query rejected after % retries', max_retries; END IF; -- Exponential backoff PERFORM pg_sleep(power(2, retry_count) * 0.1); END; END LOOP;END $$;Resource Management
CPU Resource Control
-- Limit CPU per queryCREATE WORKLOAD PROFILE cpu_limitedWITH ( max_cpu_percent = 25, -- Max 25% of available CPU max_cpu_time = '10s' -- Max 10s of CPU time);
-- Monitor CPU usageSELECT query_id, cpu_time_ms, cpu_percent_avg, cpu_percent_peakFROM query_resource_usageWHERE start_time > NOW() - INTERVAL '1 hour';Memory Resource Control
-- Limit memory per queryCREATE WORKLOAD PROFILE memory_limitedWITH ( max_memory = '2GB', spill_to_disk = true, -- Spill to temp storage if exceeded temp_space_limit = '10GB');
-- Monitor memory usageSELECT query_id, memory_allocated_mb, memory_peak_mb, spill_count, spill_bytesFROM query_resource_usageWHERE memory_peak_mb > 1000;I/O Resource Control
-- Limit I/O bandwidthCREATE WORKLOAD PROFILE io_limitedWITH ( max_io_bandwidth = '100MB/s', max_io_ops = 1000 -- IOPS limit);
-- Monitor I/O usageSELECT query_id, bytes_read, bytes_written, io_ops_totalFROM query_io_stats;Resource Pools
Create isolated resource pools for different workloads:
-- Create resource poolsCREATE RESOURCE POOL oltp_poolWITH ( cpu_shares = 60, memory_limit = '64GB', io_priority = 'high');
CREATE RESOURCE POOL analytics_poolWITH ( cpu_shares = 40, memory_limit = '128GB', io_priority = 'normal');
-- Assign profiles to poolsALTER WORKLOAD PROFILE critical_oltpSET resource_pool = 'oltp_pool';
ALTER WORKLOAD PROFILE analyticsSET resource_pool = 'analytics_pool';SLA Enforcement
Defining SLAs
-- Create SLA definitionCREATE SLA web_api_slaWITH ( p50_latency_ms = 50, -- 50th percentile < 50ms p95_latency_ms = 200, -- 95th percentile < 200ms p99_latency_ms = 500, -- 99th percentile < 500ms availability = 99.9, -- 99.9% availability error_rate = 0.1 -- < 0.1% errors);
-- Associate SLA with workload profileALTER WORKLOAD PROFILE critical_oltpSET sla = 'web_api_sla';SLA Monitoring
-- View SLA complianceSELECT sla_name, current_p50_ms, current_p95_ms, current_p99_ms, compliance_pct, queries_in_violationFROM sla_status;
-- View SLA historySELECT sla_name, period_start, period_end, p50_ms, p95_ms, p99_ms, compliance_pctFROM sla_historyWHERE sla_name = 'web_api_sla'ORDER BY period_start DESCLIMIT 24; -- Last 24 periodsSLA Actions
Configure automatic actions when SLA is at risk:
-- Create SLA action policyCREATE SLA POLICY web_api_enforcementFOR SLA web_api_slaWITH ( -- When approaching SLA violation warning_threshold = 80, -- 80% of limit warning_action = 'alert',
-- When SLA is violated violation_action = 'prioritize', -- Boost priority
-- Escalation escalate_after = '5 minutes', escalation_action = 'kill_low_priority');Graceful Degradation
Degradation Modes
HeliosDB automatically adjusts behavior under load:
| Mode | CPU Threshold | Memory Threshold | Behavior |
|---|---|---|---|
| Normal | <80% | <85% | Full capacity |
| Cautious | 80-90% | 85-92% | 80% parallelism, extended timeouts |
| Restricted | 90-95% | 92-97% | 50% parallelism, low priority rejected |
| Survival | >95% | >97% | 20% parallelism, critical only |
Configuration
-- Configure degradation thresholdsSET degradation.cpu_cautious = 0.80;SET degradation.cpu_restricted = 0.90;SET degradation.cpu_survival = 0.95;
SET degradation.memory_cautious = 0.85;SET degradation.memory_restricted = 0.92;SET degradation.memory_survival = 0.97;
-- Configure hysteresis (prevents mode thrashing)SET degradation.hysteresis_factor = 0.85;SET degradation.min_mode_duration = '30 seconds';Monitoring Degradation
-- Current degradation statusSELECT * FROM degradation_status;
-- Result:-- mode: Normal-- cpu_utilization: 0.65-- memory_utilization: 0.72-- parallelism_factor: 1.0-- timeout_multiplier: 1.0-- min_priority_accepted: Background-- time_in_mode: 00:15:32
-- View degradation historySELECT mode, started_at, ended_at, duration, trigger_metric, queries_rejectedFROM degradation_historyORDER BY started_at DESCLIMIT 10;Mode Transition Alerts
-- Create alert for degradation mode changesCREATE ALERT degradation_warningON degradation_mode_changeWHEN new_mode IN ('Restricted', 'Survival')ACTION notify_channel('ops-alerts');Monitoring and Metrics
Real-Time Statistics
-- Current workload overviewSELECT * FROM workload_summary;
-- Result:-- active_queries: 45-- queued_queries: 12-- blocked_queries: 3-- avg_response_time_ms: 125-- queries_per_second: 847-- cpu_utilization: 0.68-- memory_utilization: 0.72-- degradation_mode: NormalQuery Statistics
-- Recent query statistics by profileSELECT workload_profile, count(*) as query_count, avg(execution_time_ms) as avg_time_ms, percentile_cont(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_ms, sum(rows_scanned) as total_rows_scanned, sum(memory_mb) as total_memory_mbFROM query_statsWHERE start_time > NOW() - INTERVAL '1 hour'GROUP BY workload_profileORDER BY query_count DESC;Resource Usage by Profile
-- Resource consumption by workload profileSELECT profile_name, cpu_time_total_sec, memory_peak_mb, io_bytes_read, io_bytes_written, queries_completed, queries_timeout, queries_cancelledFROM workload_profile_statsWHERE time_period = 'last_hour';Queue Statistics
-- Queue performanceSELECT queue_name, current_depth, avg_wait_time_ms, max_wait_time_ms, queries_admitted, queries_rejected, queries_timeoutFROM queue_stats;System Views
| View | Description |
|---|---|
workload_profiles | All workload profiles and settings |
workload_summary | Current system overview |
query_stats | Per-query execution statistics |
queue_stats | Query queue statistics |
sla_status | Current SLA compliance |
degradation_status | Current degradation mode |
resource_pool_stats | Resource pool utilization |
Multi-Tenant Configuration
Per-Tenant Profiles
-- Create tenant-specific profilesCREATE WORKLOAD PROFILE tenant_acmeWITH ( max_memory = '8GB', max_cpu_percent = 25, max_concurrent_queries = 50, priority = 70);
CREATE WORKLOAD PROFILE tenant_globexWITH ( max_memory = '16GB', max_cpu_percent = 40, max_concurrent_queries = 100, priority = 80);
-- Associate tenants with profilesCREATE ROLE tenant_acme_user;ALTER ROLE tenant_acme_user SET workload_profile = 'tenant_acme';
CREATE ROLE tenant_globex_user;ALTER ROLE tenant_globex_user SET workload_profile = 'tenant_globex';Tenant Isolation
-- Create isolated resource pools per tenantCREATE RESOURCE POOL tenant_acme_poolWITH ( cpu_shares = 25, memory_limit = '32GB', io_priority = 'normal');
ALTER WORKLOAD PROFILE tenant_acmeSET resource_pool = 'tenant_acme_pool';Tenant Usage Monitoring
-- Per-tenant resource usageSELECT tenant_id, total_queries, total_cpu_time_sec, total_memory_mb, avg_response_time_ms, sla_compliance_pctFROM tenant_usage_statsWHERE time_period = 'last_day'ORDER BY total_queries DESC;Performance Tuning
Optimizing Queue Settings
-- High-throughput configurationSET max_concurrent_queries = 500;SET max_queue_size = 2000;SET queue_timeout = '10s';
-- Low-latency configurationSET max_concurrent_queries = 100;SET max_queue_size = 500;SET queue_timeout = '5s';Optimizing Resource Allocation
-- For OLTP workloadsCREATE WORKLOAD PROFILE fast_oltpWITH ( max_memory = '512MB', -- Small memory for quick queries max_execution_time = '5s', -- Short timeout priority = 90, -- High priority max_concurrent_queries = 200);
-- For analytics workloadsCREATE WORKLOAD PROFILE heavy_analyticsWITH ( max_memory = '64GB', -- Large memory for complex queries max_execution_time = '30m', -- Long timeout priority = 30, -- Lower priority max_concurrent_queries = 10);Tuning Degradation Thresholds
-- Conservative (prioritize stability)SET degradation.cpu_cautious = 0.70;SET degradation.cpu_restricted = 0.85;SET degradation.hysteresis_factor = 0.90;SET degradation.min_mode_duration = '60 seconds';
-- Aggressive (maximize throughput)SET degradation.cpu_cautious = 0.85;SET degradation.cpu_restricted = 0.93;SET degradation.hysteresis_factor = 0.80;SET degradation.min_mode_duration = '15 seconds';Troubleshooting
Common Issues
Queries Stuck in Queue
-- Check queue statusSELECT * FROM workload_queue ORDER BY queue_position;
-- Check for blocking queriesSELECT blocking.query_id AS blocking_query, blocked.query_id AS blocked_query, blocking.execution_time_msFROM query_blocks blockingJOIN query_blocks blocked ON blocking.locks && blocked.waiting_for;
-- Force release a query slotSELECT release_query_slot('query_id_123');Resource Limit Exceeded
-- Find resource-heavy queriesSELECT query_id, query_text, memory_mb, cpu_time_ms, workload_profileFROM query_statsWHERE status = 'cancelled' AND cancel_reason = 'resource_limit_exceeded'ORDER BY start_time DESCLIMIT 10;
-- Increase limits if neededALTER WORKLOAD PROFILE my_profileSET max_memory = '16GB';SLA Violations
-- Find SLA-violating queriesSELECT query_id, query_text, execution_time_ms, sla_target_ms, ROUND((execution_time_ms - sla_target_ms) * 100.0 / sla_target_ms, 1) AS pct_overFROM sla_violationsWHERE violation_time > NOW() - INTERVAL '1 hour'ORDER BY pct_over DESC;
-- Check system load during violationsSELECT time_bucket('5 minutes', time) AS bucket, avg(cpu_utilization) AS avg_cpu, avg(memory_utilization) AS avg_memory, count(DISTINCT query_id) FILTER (WHERE is_sla_violation) AS violationsFROM system_metricsWHERE time > NOW() - INTERVAL '1 hour'GROUP BY bucketORDER BY bucket;Diagnostic Queries
-- Overall system health checkSELECT * FROM workload_health_check;
-- Find bottlenecksSELECT bottleneck_type, severity, affected_queries, recommendationFROM workload_bottleneck_analysis;
-- Profile configuration auditSELECT profile_name, config_issue, severity, recommendationFROM workload_profile_audit;Best Practices
1. Start Conservative, Tune Later
-- Start with reasonable limitsCREATE WORKLOAD PROFILE new_appWITH ( max_memory = '2GB', max_execution_time = '30s', priority = 50);
-- Monitor and adjust based on actual usageSELECT avg(memory_peak_mb) AS avg_memory, percentile_cont(0.99) WITHIN GROUP (ORDER BY memory_peak_mb) AS p99_memory, avg(execution_time_ms) AS avg_timeFROM query_statsWHERE workload_profile = 'new_app';
-- Adjust based on dataALTER WORKLOAD PROFILE new_appSET max_memory = '4GB'; -- Increase if needed2. Separate OLTP and OLAP Workloads
-- OLTP profile (fast, limited resources)CREATE WORKLOAD PROFILE oltp_workloadWITH ( max_memory = '512MB', max_execution_time = '5s', max_rows_returned = 10000, priority = 90, resource_pool = 'oltp_pool');
-- OLAP profile (slow, more resources)CREATE WORKLOAD PROFILE olap_workloadWITH ( max_memory = '32GB', max_execution_time = '30m', priority = 30, resource_pool = 'olap_pool');3. Define Clear Priority Hierarchy
-- Priority assignment strategy-- 100: Health checks, monitoring-- 90: Critical business operations-- 80: User-facing API queries-- 50: Standard application queries-- 30: Background batch jobs-- 10: Maintenance, analytics
-- Example implementationCREATE WORKLOAD PROFILE health_check WITH (priority = 100);CREATE WORKLOAD PROFILE critical_ops WITH (priority = 90);CREATE WORKLOAD PROFILE api_queries WITH (priority = 80);CREATE WORKLOAD PROFILE app_default WITH (priority = 50);CREATE WORKLOAD PROFILE batch_jobs WITH (priority = 30);CREATE WORKLOAD PROFILE maintenance WITH (priority = 10);4. Monitor and Alert
-- Create alerts for common issuesCREATE ALERT queue_depth_highON queue_depth > 100ACTION notify_channel('db-ops');
CREATE ALERT sla_violation_rateON sla_violation_rate > 0.01ACTION notify_channel('db-ops');
CREATE ALERT degradation_modeON degradation_mode IN ('Restricted', 'Survival')ACTION notify_channel('db-ops');5. Regular Review and Optimization
-- Weekly review querySELECT workload_profile, query_count, avg_execution_time_ms, p99_execution_time_ms, memory_peak_avg_mb, sla_compliance_pct, CASE WHEN avg_execution_time_ms > max_execution_time_ms * 0.8 THEN 'REVIEW' WHEN memory_peak_avg_mb > max_memory_mb * 0.9 THEN 'REVIEW' ELSE 'OK' END AS action_neededFROM workload_profile_weekly_summaryORDER BY query_count DESC;API Reference
SQL Commands
Workload Profiles
-- CreateCREATE WORKLOAD PROFILE name WITH (options);
-- ModifyALTER WORKLOAD PROFILE name SET option = value;
-- DeleteDROP WORKLOAD PROFILE name [CASCADE];
-- Assign to sessionSET workload_profile = 'name';Resource Pools
-- CreateCREATE RESOURCE POOL name WITH (options);
-- ModifyALTER RESOURCE POOL name SET option = value;
-- DeleteDROP RESOURCE POOL name [CASCADE];SLAs
-- CreateCREATE SLA name WITH (options);
-- ModifyALTER SLA name SET option = value;
-- DeleteDROP SLA name;Session Variables
| Variable | Description | Default |
|---|---|---|
workload_profile | Active workload profile | ’default’ |
query_priority | Query priority level | ’normal’ |
admission_control_enabled | Enable admission control | true |
max_concurrent_queries | System-wide query limit | 1000 |
System Functions
| Function | Description |
|---|---|
promote_query_priority(query_id, priority) | Boost query priority |
cancel_queued_query(query_id) | Cancel a queued query |
release_query_slot(query_id) | Force release query resources |
get_workload_status() | Get current workload summary |
get_degradation_mode() | Get current degradation mode |
Support: For issues or questions, contact workload-support@heliosdb.com
Related Documentation:
License: Enterprise license required for advanced workload management features.
Version: HeliosDB v7.0+ with Workload Management extension