Skip to content

HeliosDB Workload Management User Guide

HeliosDB Workload Management User Guide

Version: 1.0 Last Updated: January 4, 2026 Feature Status: Production Ready (100%)


Table of Contents

  1. Overview
  2. Getting Started
  3. Workload Profiles
  4. Query Priorities
  5. Admission Control
  6. Resource Management
  7. SLA Enforcement
  8. Graceful Degradation
  9. Monitoring and Metrics
  10. Multi-Tenant Configuration
  11. Performance Tuning
  12. Troubleshooting
  13. Best Practices
  14. API Reference

Overview

HeliosDB Workload Management provides enterprise-grade control over query execution, resource allocation, and service level guarantees. It ensures your database remains responsive and stable under varying load conditions.

Key Components

ComponentPurpose
Workload ProfilesDefine resource limits and priorities for query groups
Admission ControlGate queries based on system capacity
Query SchedulerPriority-based query execution ordering
Resource ManagerAllocate CPU, memory, and I/O resources
SLA ManagerEnforce response time guarantees
Graceful DegradationMaintain service during overload

Key Benefits

  • Predictable Performance: Guaranteed response times for critical queries
  • Resource Isolation: Prevent runaway queries from affecting others
  • Fair Scheduling: Ensure all tenants get their allocated resources
  • Overload Protection: Automatic load shedding under stress
  • Observability: Comprehensive metrics and monitoring

Performance Characteristics

MetricTypical Value
Admission check overhead<10us per query
Scheduling decision<50us
Resource enforcement<1% CPU overhead
SLA tracking accuracy99.9%

Getting Started

Prerequisites

  • HeliosDB v7.0 or later
  • Workload management extension enabled
  • Admin privileges for configuration

Quick Start (5 minutes)

1. Enable Workload Management

-- Enable workload management extension
CREATE EXTENSION IF NOT EXISTS heliosdb_workload_management;
-- Verify it's enabled
SELECT * FROM pg_extension WHERE extname = 'heliosdb_workload_management';

2. Create a Workload Profile

-- Create profile for critical OLTP workload
CREATE WORKLOAD PROFILE critical_oltp
WITH (
max_memory = '4GB',
max_cpu_percent = 50,
max_execution_time = '30s',
priority = 100,
queue = 'high_priority'
);
-- Create profile for analytics workload
CREATE WORKLOAD PROFILE analytics
WITH (
max_memory = '16GB',
max_cpu_percent = 80,
max_execution_time = '10m',
priority = 50,
queue = 'batch'
);

3. Assign Sessions to Profiles

-- Assign current session to a profile
SET workload_profile = 'critical_oltp';
-- Verify assignment
SHOW workload_profile;

4. Execute Queries with Resource Control

-- This query runs with critical_oltp profile limits
SELECT * FROM orders
WHERE customer_id = 12345
AND created_at > NOW() - INTERVAL '30 days';

Workload Profiles

Creating Profiles

-- Basic profile
CREATE WORKLOAD PROFILE my_profile
WITH (
priority = 75
);
-- Full profile with all options
CREATE WORKLOAD PROFILE production_app
WITH (
-- Resource limits
max_memory = '8GB',
max_cpu_percent = 40,
max_io_bandwidth = '200MB/s',
max_temp_space = '10GB',
-- Execution limits
max_execution_time = '60s',
max_rows_scanned = 10000000,
max_rows_returned = 100000,
-- Scheduling
priority = 100,
queue = 'critical',
max_concurrent_queries = 50,
-- Behavior
query_timeout_action = 'abort',
on_resource_exceeded = 'queue'
);

Profile Options Reference

OptionTypeDescriptionDefault
max_memorySizeMaximum memory per queryUnlimited
max_cpu_percentIntegerCPU limit (1-100)100
max_io_bandwidthSize/sI/O bandwidth limitUnlimited
max_temp_spaceSizeTemp storage limitUnlimited
max_execution_timeIntervalQuery timeout24h
max_rows_scannedIntegerScan limitUnlimited
max_rows_returnedIntegerResult limitUnlimited
priorityIntegerScheduling priority (1-100)50
queueStringQueue name’default’
max_concurrent_queriesIntegerParallel query limitUnlimited

Modifying Profiles

-- Update profile settings
ALTER WORKLOAD PROFILE analytics
SET max_memory = '32GB',
max_execution_time = '30m';
-- View profile settings
SELECT * FROM workload_profiles
WHERE name = 'analytics';

Deleting Profiles

-- Delete profile (fails if in use)
DROP WORKLOAD PROFILE my_profile;
-- Force delete (kills active queries using this profile)
DROP WORKLOAD PROFILE my_profile CASCADE;

Built-in Profiles

HeliosDB includes these default profiles:

ProfilePriorityMax MemoryTimeoutUse Case
default504GB60sStandard queries
interactive802GB10sUser-facing queries
batch3016GB30mBackground jobs
admin1008GB5mAdmin operations

Query Priorities

Priority Levels

LevelValueDescriptionUse Case
Critical100Highest priorityHealth checks, monitoring
High80User-facing priorityAPI requests
Normal50DefaultStandard workload
Low30Background workReports, analytics
Background10Lowest priorityMaintenance, cleanup

Setting Query Priority

-- Via session variable
SET query_priority = 'high';
SELECT * FROM orders;
-- Via query hint
SELECT /*+ priority(high) */ * FROM orders;
-- Via workload profile
CREATE WORKLOAD PROFILE api_queries
WITH (priority = 80);
SET workload_profile = 'api_queries';

Priority-Based Execution Order

-- Higher priority queries execute first
-- In this scenario:
-- Query A (priority=30) starts first
-- Query B (priority=80) arrives, preempts A
-- Query B completes
-- Query A resumes
-- View execution order
SELECT
query_id,
priority,
status,
wait_time_ms,
execution_time_ms
FROM query_execution_log
ORDER BY start_time DESC;

Admission Control

Admission control gates queries based on system capacity and configured limits.

Configuration

-- Enable admission control
SET admission_control_enabled = true;
-- Configure limits
SET max_concurrent_queries = 100;
SET max_queue_size = 500;
SET queue_timeout = '30s';

Admission Policies

-- Create admission policy
CREATE ADMISSION POLICY peak_hours
WITH (
applies_during = '09:00-17:00',
max_concurrent_queries = 200,
reject_low_priority = true,
queue_high_priority = true
);
-- Create off-peak policy
CREATE ADMISSION POLICY off_peak
WITH (
applies_during = '17:00-09:00',
max_concurrent_queries = 500,
reject_low_priority = false
);

Queue Management

-- View current queue
SELECT
query_id,
priority,
queue_position,
wait_time_ms,
estimated_start_time
FROM workload_queue
ORDER BY queue_position;
-- Manually promote query in queue
SELECT promote_query_priority('query_id_123', 'critical');
-- Cancel queued query
SELECT cancel_queued_query('query_id_123');

Handling Query Rejection

When admission control rejects a query:

-- Application should retry with backoff
DO $$
DECLARE
retry_count INT := 0;
max_retries INT := 3;
BEGIN
LOOP
BEGIN
-- Try the query
EXECUTE 'SELECT * FROM large_table';
EXIT; -- Success, exit loop
EXCEPTION
WHEN SQLSTATE '57014' THEN -- Query cancelled (admission rejected)
retry_count := retry_count + 1;
IF retry_count >= max_retries THEN
RAISE EXCEPTION 'Query rejected after % retries', max_retries;
END IF;
-- Exponential backoff
PERFORM pg_sleep(power(2, retry_count) * 0.1);
END;
END LOOP;
END $$;

Resource Management

CPU Resource Control

-- Limit CPU per query
CREATE WORKLOAD PROFILE cpu_limited
WITH (
max_cpu_percent = 25, -- Max 25% of available CPU
max_cpu_time = '10s' -- Max 10s of CPU time
);
-- Monitor CPU usage
SELECT
query_id,
cpu_time_ms,
cpu_percent_avg,
cpu_percent_peak
FROM query_resource_usage
WHERE start_time > NOW() - INTERVAL '1 hour';

Memory Resource Control

-- Limit memory per query
CREATE WORKLOAD PROFILE memory_limited
WITH (
max_memory = '2GB',
spill_to_disk = true, -- Spill to temp storage if exceeded
temp_space_limit = '10GB'
);
-- Monitor memory usage
SELECT
query_id,
memory_allocated_mb,
memory_peak_mb,
spill_count,
spill_bytes
FROM query_resource_usage
WHERE memory_peak_mb > 1000;

I/O Resource Control

-- Limit I/O bandwidth
CREATE WORKLOAD PROFILE io_limited
WITH (
max_io_bandwidth = '100MB/s',
max_io_ops = 1000 -- IOPS limit
);
-- Monitor I/O usage
SELECT
query_id,
bytes_read,
bytes_written,
io_ops_total
FROM query_io_stats;

Resource Pools

Create isolated resource pools for different workloads:

-- Create resource pools
CREATE RESOURCE POOL oltp_pool
WITH (
cpu_shares = 60,
memory_limit = '64GB',
io_priority = 'high'
);
CREATE RESOURCE POOL analytics_pool
WITH (
cpu_shares = 40,
memory_limit = '128GB',
io_priority = 'normal'
);
-- Assign profiles to pools
ALTER WORKLOAD PROFILE critical_oltp
SET resource_pool = 'oltp_pool';
ALTER WORKLOAD PROFILE analytics
SET resource_pool = 'analytics_pool';

SLA Enforcement

Defining SLAs

-- Create SLA definition
CREATE SLA web_api_sla
WITH (
p50_latency_ms = 50, -- 50th percentile < 50ms
p95_latency_ms = 200, -- 95th percentile < 200ms
p99_latency_ms = 500, -- 99th percentile < 500ms
availability = 99.9, -- 99.9% availability
error_rate = 0.1 -- < 0.1% errors
);
-- Associate SLA with workload profile
ALTER WORKLOAD PROFILE critical_oltp
SET sla = 'web_api_sla';

SLA Monitoring

-- View SLA compliance
SELECT
sla_name,
current_p50_ms,
current_p95_ms,
current_p99_ms,
compliance_pct,
queries_in_violation
FROM sla_status;
-- View SLA history
SELECT
sla_name,
period_start,
period_end,
p50_ms,
p95_ms,
p99_ms,
compliance_pct
FROM sla_history
WHERE sla_name = 'web_api_sla'
ORDER BY period_start DESC
LIMIT 24; -- Last 24 periods

SLA Actions

Configure automatic actions when SLA is at risk:

-- Create SLA action policy
CREATE SLA POLICY web_api_enforcement
FOR SLA web_api_sla
WITH (
-- When approaching SLA violation
warning_threshold = 80, -- 80% of limit
warning_action = 'alert',
-- When SLA is violated
violation_action = 'prioritize', -- Boost priority
-- Escalation
escalate_after = '5 minutes',
escalation_action = 'kill_low_priority'
);

Graceful Degradation

Degradation Modes

HeliosDB automatically adjusts behavior under load:

ModeCPU ThresholdMemory ThresholdBehavior
Normal<80%<85%Full capacity
Cautious80-90%85-92%80% parallelism, extended timeouts
Restricted90-95%92-97%50% parallelism, low priority rejected
Survival>95%>97%20% parallelism, critical only

Configuration

-- Configure degradation thresholds
SET degradation.cpu_cautious = 0.80;
SET degradation.cpu_restricted = 0.90;
SET degradation.cpu_survival = 0.95;
SET degradation.memory_cautious = 0.85;
SET degradation.memory_restricted = 0.92;
SET degradation.memory_survival = 0.97;
-- Configure hysteresis (prevents mode thrashing)
SET degradation.hysteresis_factor = 0.85;
SET degradation.min_mode_duration = '30 seconds';

Monitoring Degradation

-- Current degradation status
SELECT * FROM degradation_status;
-- Result:
-- mode: Normal
-- cpu_utilization: 0.65
-- memory_utilization: 0.72
-- parallelism_factor: 1.0
-- timeout_multiplier: 1.0
-- min_priority_accepted: Background
-- time_in_mode: 00:15:32
-- View degradation history
SELECT
mode,
started_at,
ended_at,
duration,
trigger_metric,
queries_rejected
FROM degradation_history
ORDER BY started_at DESC
LIMIT 10;

Mode Transition Alerts

-- Create alert for degradation mode changes
CREATE ALERT degradation_warning
ON degradation_mode_change
WHEN new_mode IN ('Restricted', 'Survival')
ACTION notify_channel('ops-alerts');

Monitoring and Metrics

Real-Time Statistics

-- Current workload overview
SELECT * FROM workload_summary;
-- Result:
-- active_queries: 45
-- queued_queries: 12
-- blocked_queries: 3
-- avg_response_time_ms: 125
-- queries_per_second: 847
-- cpu_utilization: 0.68
-- memory_utilization: 0.72
-- degradation_mode: Normal

Query Statistics

-- Recent query statistics by profile
SELECT
workload_profile,
count(*) as query_count,
avg(execution_time_ms) as avg_time_ms,
percentile_cont(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_ms,
sum(rows_scanned) as total_rows_scanned,
sum(memory_mb) as total_memory_mb
FROM query_stats
WHERE start_time > NOW() - INTERVAL '1 hour'
GROUP BY workload_profile
ORDER BY query_count DESC;

Resource Usage by Profile

-- Resource consumption by workload profile
SELECT
profile_name,
cpu_time_total_sec,
memory_peak_mb,
io_bytes_read,
io_bytes_written,
queries_completed,
queries_timeout,
queries_cancelled
FROM workload_profile_stats
WHERE time_period = 'last_hour';

Queue Statistics

-- Queue performance
SELECT
queue_name,
current_depth,
avg_wait_time_ms,
max_wait_time_ms,
queries_admitted,
queries_rejected,
queries_timeout
FROM queue_stats;

System Views

ViewDescription
workload_profilesAll workload profiles and settings
workload_summaryCurrent system overview
query_statsPer-query execution statistics
queue_statsQuery queue statistics
sla_statusCurrent SLA compliance
degradation_statusCurrent degradation mode
resource_pool_statsResource pool utilization

Multi-Tenant Configuration

Per-Tenant Profiles

-- Create tenant-specific profiles
CREATE WORKLOAD PROFILE tenant_acme
WITH (
max_memory = '8GB',
max_cpu_percent = 25,
max_concurrent_queries = 50,
priority = 70
);
CREATE WORKLOAD PROFILE tenant_globex
WITH (
max_memory = '16GB',
max_cpu_percent = 40,
max_concurrent_queries = 100,
priority = 80
);
-- Associate tenants with profiles
CREATE ROLE tenant_acme_user;
ALTER ROLE tenant_acme_user SET workload_profile = 'tenant_acme';
CREATE ROLE tenant_globex_user;
ALTER ROLE tenant_globex_user SET workload_profile = 'tenant_globex';

Tenant Isolation

-- Create isolated resource pools per tenant
CREATE RESOURCE POOL tenant_acme_pool
WITH (
cpu_shares = 25,
memory_limit = '32GB',
io_priority = 'normal'
);
ALTER WORKLOAD PROFILE tenant_acme
SET resource_pool = 'tenant_acme_pool';

Tenant Usage Monitoring

-- Per-tenant resource usage
SELECT
tenant_id,
total_queries,
total_cpu_time_sec,
total_memory_mb,
avg_response_time_ms,
sla_compliance_pct
FROM tenant_usage_stats
WHERE time_period = 'last_day'
ORDER BY total_queries DESC;

Performance Tuning

Optimizing Queue Settings

-- High-throughput configuration
SET max_concurrent_queries = 500;
SET max_queue_size = 2000;
SET queue_timeout = '10s';
-- Low-latency configuration
SET max_concurrent_queries = 100;
SET max_queue_size = 500;
SET queue_timeout = '5s';

Optimizing Resource Allocation

-- For OLTP workloads
CREATE WORKLOAD PROFILE fast_oltp
WITH (
max_memory = '512MB', -- Small memory for quick queries
max_execution_time = '5s', -- Short timeout
priority = 90, -- High priority
max_concurrent_queries = 200
);
-- For analytics workloads
CREATE WORKLOAD PROFILE heavy_analytics
WITH (
max_memory = '64GB', -- Large memory for complex queries
max_execution_time = '30m', -- Long timeout
priority = 30, -- Lower priority
max_concurrent_queries = 10
);

Tuning Degradation Thresholds

-- Conservative (prioritize stability)
SET degradation.cpu_cautious = 0.70;
SET degradation.cpu_restricted = 0.85;
SET degradation.hysteresis_factor = 0.90;
SET degradation.min_mode_duration = '60 seconds';
-- Aggressive (maximize throughput)
SET degradation.cpu_cautious = 0.85;
SET degradation.cpu_restricted = 0.93;
SET degradation.hysteresis_factor = 0.80;
SET degradation.min_mode_duration = '15 seconds';

Troubleshooting

Common Issues

Queries Stuck in Queue

-- Check queue status
SELECT * FROM workload_queue ORDER BY queue_position;
-- Check for blocking queries
SELECT
blocking.query_id AS blocking_query,
blocked.query_id AS blocked_query,
blocking.execution_time_ms
FROM query_blocks blocking
JOIN query_blocks blocked ON blocking.locks && blocked.waiting_for;
-- Force release a query slot
SELECT release_query_slot('query_id_123');

Resource Limit Exceeded

-- Find resource-heavy queries
SELECT
query_id,
query_text,
memory_mb,
cpu_time_ms,
workload_profile
FROM query_stats
WHERE status = 'cancelled'
AND cancel_reason = 'resource_limit_exceeded'
ORDER BY start_time DESC
LIMIT 10;
-- Increase limits if needed
ALTER WORKLOAD PROFILE my_profile
SET max_memory = '16GB';

SLA Violations

-- Find SLA-violating queries
SELECT
query_id,
query_text,
execution_time_ms,
sla_target_ms,
ROUND((execution_time_ms - sla_target_ms) * 100.0 / sla_target_ms, 1) AS pct_over
FROM sla_violations
WHERE violation_time > NOW() - INTERVAL '1 hour'
ORDER BY pct_over DESC;
-- Check system load during violations
SELECT
time_bucket('5 minutes', time) AS bucket,
avg(cpu_utilization) AS avg_cpu,
avg(memory_utilization) AS avg_memory,
count(DISTINCT query_id) FILTER (WHERE is_sla_violation) AS violations
FROM system_metrics
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket;

Diagnostic Queries

-- Overall system health check
SELECT * FROM workload_health_check;
-- Find bottlenecks
SELECT
bottleneck_type,
severity,
affected_queries,
recommendation
FROM workload_bottleneck_analysis;
-- Profile configuration audit
SELECT
profile_name,
config_issue,
severity,
recommendation
FROM workload_profile_audit;

Best Practices

1. Start Conservative, Tune Later

-- Start with reasonable limits
CREATE WORKLOAD PROFILE new_app
WITH (
max_memory = '2GB',
max_execution_time = '30s',
priority = 50
);
-- Monitor and adjust based on actual usage
SELECT
avg(memory_peak_mb) AS avg_memory,
percentile_cont(0.99) WITHIN GROUP (ORDER BY memory_peak_mb) AS p99_memory,
avg(execution_time_ms) AS avg_time
FROM query_stats
WHERE workload_profile = 'new_app';
-- Adjust based on data
ALTER WORKLOAD PROFILE new_app
SET max_memory = '4GB'; -- Increase if needed

2. Separate OLTP and OLAP Workloads

-- OLTP profile (fast, limited resources)
CREATE WORKLOAD PROFILE oltp_workload
WITH (
max_memory = '512MB',
max_execution_time = '5s',
max_rows_returned = 10000,
priority = 90,
resource_pool = 'oltp_pool'
);
-- OLAP profile (slow, more resources)
CREATE WORKLOAD PROFILE olap_workload
WITH (
max_memory = '32GB',
max_execution_time = '30m',
priority = 30,
resource_pool = 'olap_pool'
);

3. Define Clear Priority Hierarchy

-- Priority assignment strategy
-- 100: Health checks, monitoring
-- 90: Critical business operations
-- 80: User-facing API queries
-- 50: Standard application queries
-- 30: Background batch jobs
-- 10: Maintenance, analytics
-- Example implementation
CREATE WORKLOAD PROFILE health_check WITH (priority = 100);
CREATE WORKLOAD PROFILE critical_ops WITH (priority = 90);
CREATE WORKLOAD PROFILE api_queries WITH (priority = 80);
CREATE WORKLOAD PROFILE app_default WITH (priority = 50);
CREATE WORKLOAD PROFILE batch_jobs WITH (priority = 30);
CREATE WORKLOAD PROFILE maintenance WITH (priority = 10);

4. Monitor and Alert

-- Create alerts for common issues
CREATE ALERT queue_depth_high
ON queue_depth > 100
ACTION notify_channel('db-ops');
CREATE ALERT sla_violation_rate
ON sla_violation_rate > 0.01
ACTION notify_channel('db-ops');
CREATE ALERT degradation_mode
ON degradation_mode IN ('Restricted', 'Survival')
ACTION notify_channel('db-ops');

5. Regular Review and Optimization

-- Weekly review query
SELECT
workload_profile,
query_count,
avg_execution_time_ms,
p99_execution_time_ms,
memory_peak_avg_mb,
sla_compliance_pct,
CASE
WHEN avg_execution_time_ms > max_execution_time_ms * 0.8 THEN 'REVIEW'
WHEN memory_peak_avg_mb > max_memory_mb * 0.9 THEN 'REVIEW'
ELSE 'OK'
END AS action_needed
FROM workload_profile_weekly_summary
ORDER BY query_count DESC;

API Reference

SQL Commands

Workload Profiles

-- Create
CREATE WORKLOAD PROFILE name WITH (options);
-- Modify
ALTER WORKLOAD PROFILE name SET option = value;
-- Delete
DROP WORKLOAD PROFILE name [CASCADE];
-- Assign to session
SET workload_profile = 'name';

Resource Pools

-- Create
CREATE RESOURCE POOL name WITH (options);
-- Modify
ALTER RESOURCE POOL name SET option = value;
-- Delete
DROP RESOURCE POOL name [CASCADE];

SLAs

-- Create
CREATE SLA name WITH (options);
-- Modify
ALTER SLA name SET option = value;
-- Delete
DROP SLA name;

Session Variables

VariableDescriptionDefault
workload_profileActive workload profile’default’
query_priorityQuery priority level’normal’
admission_control_enabledEnable admission controltrue
max_concurrent_queriesSystem-wide query limit1000

System Functions

FunctionDescription
promote_query_priority(query_id, priority)Boost query priority
cancel_queued_query(query_id)Cancel a queued query
release_query_slot(query_id)Force release query resources
get_workload_status()Get current workload summary
get_degradation_mode()Get current degradation mode

Support: For issues or questions, contact workload-support@heliosdb.com

Related Documentation:

License: Enterprise license required for advanced workload management features.

Version: HeliosDB v7.0+ with Workload Management extension