Skip to content

3-Tier Storage (Hot/Warm/Cold) User Guide

3-Tier Storage (Hot/Warm/Cold) User Guide

Overview

HeliosDB’s 3-tier storage automatically moves data between NVMe SSD (hot), SATA SSD (warm), and S3 (cold) tiers based on access patterns, achieving up to 80% cost reduction while maintaining query performance.

Benefits

  • 80%+ cost reduction for large databases (100TB+)
  • Automatic data tiering based on age, access frequency, and size
  • Sub-millisecond hot tier performance
  • Transparent to applications (no code changes)
  • Configurable policies per table or partition

Storage Tiers

TierMediumLatencyCost/GBUse Case
HotNVMe SSD<1ms$0.15Active data (last 7 days)
WarmSATA SSD1-5ms$0.04Recent data (8-30 days)
ColdS310-50ms$0.02Archive (>30 days)

Prerequisites

System Requirements

  • HeliosDB v3.2 or later
  • NVMe storage for hot tier (recommended 1TB+)
  • SATA storage for warm tier (recommended 5TB+)
  • S3-compatible object storage (AWS S3, MinIO, Ceph)

Storage Planning

Calculate storage needs:

Hot Tier: Active data (7 days) + 20% buffer
Warm Tier: Recent data (30 days) + 10% buffer
Cold Tier: Everything else (unlimited)

Step-by-Step Configuration

1. Configure Storage Tiers

Edit /etc/heliosdb/heliosdb.conf:

tiered_storage:
enabled: true
# Hot Tier (NVMe)
hot_tier:
path: /mnt/nvme/heliosdb
max_size_gb: 1000
latency_target_ms: 1
cost_per_gb: 0.15
# Warm Tier (SATA)
warm_tier:
path: /mnt/ssd/heliosdb
max_size_gb: 5000
latency_target_ms: 5
cost_per_gb: 0.04
# Cold Tier (S3)
cold_tier:
type: s3
bucket: heliosdb-cold-tier
region: us-east-1
endpoint: https://s3.amazonaws.com
access_key_id: ${AWS_ACCESS_KEY_ID} # From environment
secret_access_key: ${AWS_SECRET_ACCESS_KEY}
latency_target_ms: 50
cost_per_gb: 0.02
# Tiering Policies
policies:
hot_to_warm: 7d # Move after 7 days
warm_to_cold: 30d # Move after 30 days
auto_tier: true
check_interval: 3600s # Hourly evaluation
# Migration Settings
migration:
max_concurrent: 5
max_bandwidth_mbps: 100
atomic_transitions: true
enable_throttling: true

2. Set Up S3 Storage

AWS S3

Terminal window
# Create S3 bucket
aws s3 mb s3://heliosdb-cold-tier --region us-east-1
# Enable versioning (optional)
aws s3api put-bucket-versioning \
--bucket heliosdb-cold-tier \
--versioning-configuration Status=Enabled
# Configure lifecycle (optional archival to Glacier)
aws s3api put-bucket-lifecycle-configuration \
--bucket heliosdb-cold-tier \
--lifecycle-configuration file://lifecycle.json

lifecycle.json:

{
"Rules": [{
"Id": "Archive old data",
"Status": "Enabled",
"Transitions": [{
"Days": 90,
"StorageClass": "GLACIER"
}]
}]
}

MinIO (Self-Hosted)

# heliosdb.conf
tiered_storage:
cold_tier:
type: s3
bucket: heliosdb-cold
endpoint: http://minio:9000
access_key_id: minioadmin
secret_access_key: minioadmin
path_style: true # MinIO requires path-style

3. Initialize Tiering

-- Enable tiering for database
ALTER DATABASE mydb SET tiering = 'enabled';
-- Check tier status
SELECT * FROM heliosdb.tier_status();

Expected output:

tier_name | path | max_size_gb | used_gb | available_gb | status
-----------+-------------------+-------------+---------+--------------+--------
hot | /mnt/nvme/... | 1000 | 124 | 876 | online
warm | /mnt/ssd/... | 5000 | 892 | 4108 | online
cold | s3://helios-cold | unlimited | 45000 | unlimited | online

SQL Examples

Table-Level Tiering Policies

-- Create table with tiering policy
CREATE TABLE events (
id SERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ,
data JSONB
) WITH (
tiering_policy = 'time_based',
hot_to_warm = '7 days',
warm_to_cold = '30 days'
);

Partition-Level Policies

-- Time-series table with automatic partitioning and tiering
CREATE TABLE metrics (
timestamp TIMESTAMPTZ,
sensor_id INT,
value DOUBLE PRECISION
) PARTITION BY RANGE (timestamp);
-- Create monthly partitions
CREATE TABLE metrics_2025_10 PARTITION OF metrics
FOR VALUES FROM ('2025-10-01') TO ('2025-11-01')
WITH (
tiering_policy = 'time_based',
hot_to_warm = '7 days',
warm_to_cold = '30 days'
);

Manual Tier Assignment

-- Force table to specific tier
ALTER TABLE historical_data SET tier = 'cold';
-- Check current tier
SELECT
tablename,
current_tier,
size_mb,
last_access
FROM heliosdb.table_tier_info
WHERE tablename = 'historical_data';

Viewing Tiering Status

-- Cost analysis report
SELECT * FROM heliosdb.cost_analysis_report();

Output:

{
"total_size_gb": 105000,
"total_cost_monthly_usd": 2230,
"breakdown": {
"hot": {"size_gb": 1000, "cost_usd": 150},
"warm": {"size_gb": 5000, "cost_usd": 200},
"cold": {"size_gb": 99000, "cost_usd": 1880}
},
"savings_vs_all_hot": {
"baseline_cost_usd": 15000,
"actual_cost_usd": 2230,
"savings_usd": 12770,
"savings_percent": 85.1
}
}

Migration Monitoring

-- View active migrations
SELECT
object_key,
from_tier,
to_tier,
progress_percent,
estimated_completion
FROM heliosdb.active_migrations;
-- Migration history
SELECT
timestamp,
object_key,
from_tier,
to_tier,
duration_ms,
success
FROM heliosdb.migration_history
WHERE timestamp > now() - interval '24 hours'
ORDER BY timestamp DESC
LIMIT 20;

Common Use Cases

Use Case 1: Time-Series Data

Scenario: IoT sensor data, keep recent data fast, archive old data

CREATE TABLE sensor_readings (
timestamp TIMESTAMPTZ,
sensor_id INT,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
) PARTITION BY RANGE (timestamp)
WITH (
tiering_policy = 'time_based',
hot_to_warm = '7 days',
warm_to_cold = '30 days'
);
-- Query automatically uses appropriate tier
SELECT AVG(temperature)
FROM sensor_readings
WHERE sensor_id = 123
AND timestamp > now() - interval '1 day'; -- Served from hot tier
SELECT AVG(temperature)
FROM sensor_readings
WHERE sensor_id = 123
AND timestamp BETWEEN '2025-01-01' AND '2025-01-31'; -- Served from cold tier

Use Case 2: Log Analytics

Scenario: Application logs with retention requirements

CREATE TABLE application_logs (
timestamp TIMESTAMPTZ,
level VARCHAR(10),
message TEXT,
metadata JSONB
) WITH (
tiering_policy = 'time_based',
hot_to_warm = '3 days', -- Recent logs hot
warm_to_cold = '14 days', -- Keep 2 weeks warm
cold_retention = '365 days' -- Legal requirement
);
-- Automatic cold tier pruning after 1 year

Use Case 3: Archive with Infrequent Access

Scenario: Customer data archives

CREATE TABLE customer_archives (
customer_id INT PRIMARY KEY,
archived_date DATE,
data JSONB
) WITH (
tiering_policy = 'access_based',
hot_to_warm = '1 access per 7 days',
warm_to_cold = '1 access per 30 days'
);
-- Frequently accessed customers stay hot
-- Inactive customers move to cold automatically

Troubleshooting

Issue: Slow Cold Tier Queries

Symptom: Queries hitting cold tier are slow (>1 second)

Solution 1: Cache Frequently Accessed Data

-- Pin important historical data to warm tier
ALTER TABLE historical_data
SET tier = 'warm'
WHERE access_count > 10;

Solution 2: Materialized Views

-- Create aggregated view in hot tier
CREATE MATERIALIZED VIEW monthly_summaries AS
SELECT
DATE_TRUNC('month', timestamp) as month,
AVG(value) as avg_value,
COUNT(*) as count
FROM cold_tier_table
GROUP BY month
WITH (tier = 'hot', refresh_interval = '1 day');

Issue: Migration Bandwidth Saturation

Symptom: Network saturated during data migration

Solution: Adjust Throttling

# heliosdb.conf
tiered_storage:
migration:
max_bandwidth_mbps: 50 # Reduce from 100
max_concurrent: 3 # Reduce from 5

Issue: Hot Tier Full

Symptom:

ERROR: Hot tier capacity exceeded (1050 GB / 1000 GB)

Solution 1: Force Immediate Migration

-- Manually trigger migration to free space
SELECT heliosdb.trigger_tier_migration('hot', 'warm', limit_gb => 200);

Solution 2: Adjust Policy

-- Reduce hot tier retention
ALTER TABLE my_table SET hot_to_warm = '5 days'; -- Was 7 days

Solution 3: Increase Capacity

# heliosdb.conf
tiered_storage:
hot_tier:
max_size_gb: 1500 # Increase from 1000

Issue: S3 Access Errors

Symptom:

ERROR: Failed to access cold tier: Access Denied

Diagnosis:

Terminal window
# Test S3 access
aws s3 ls s3://heliosdb-cold-tier --region us-east-1
# Check IAM permissions
aws iam get-user-policy --user-name heliosdb --policy-name s3-access

Solution: Update IAM Policy

{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::heliosdb-cold-tier",
"arn:aws:s3:::heliosdb-cold-tier/*"
]
}]
}

Performance Tuning

1. Optimize Tier Thresholds

-- Analyze access patterns
SELECT
tier_name,
AVG(access_frequency) as avg_access_freq,
AVG(query_latency_ms) as avg_latency
FROM heliosdb.tier_performance_metrics
WHERE timestamp > now() - interval '7 days'
GROUP BY tier_name;
-- Adjust based on findings
-- If warm tier has high access frequency, keep data there longer
ALTER DATABASE mydb SET warm_to_cold = '45 days'; -- Increase from 30

2. Prefetch Optimization

# heliosdb.conf
tiered_storage:
cold_tier:
prefetch_enabled: true
prefetch_size_mb: 10
# Prefetch adjacent blocks when accessing cold tier

3. Compression Settings

tiered_storage:
cold_tier:
compression: zstd # Better compression, slightly slower
compression_level: 3 # Balance compression vs speed

4. Multipart Upload Tuning

tiered_storage:
cold_tier:
multipart_threshold_mb: 100
multipart_chunk_size_mb: 10
# Use multipart for large objects

Best Practices

1. Design for Tiering

-- Include timestamp for time-based policies
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL, -- Critical for tiering
...
) WITH (tiering_policy = 'time_based');
-- Partition large tables
CREATE TABLE events (
event_time TIMESTAMPTZ,
...
) PARTITION BY RANGE (event_time);

2. Monitor Cost

-- Create monthly cost tracking
CREATE TABLE tier_cost_history AS
SELECT
DATE_TRUNC('month', timestamp) as month,
tier_name,
AVG(size_gb) as avg_size_gb,
AVG(cost_usd) as avg_cost_usd
FROM heliosdb.tier_metrics
GROUP BY month, tier_name;
-- Set up cost alerts
CREATE FUNCTION check_tier_costs() RETURNS TRIGGER AS $$
BEGIN
IF NEW.total_cost_usd > 3000 THEN
RAISE WARNING 'Monthly tier costs exceed $3000: $%', NEW.total_cost_usd;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

3. Test Tiering Policies

-- Create test environment
CREATE SCHEMA tier_test;
-- Test table with aggressive tiering
CREATE TABLE tier_test.sample (
timestamp TIMESTAMPTZ,
data TEXT
) WITH (
tiering_policy = 'time_based',
hot_to_warm = '1 hour', -- Fast testing
warm_to_cold = '3 hours'
);
-- Insert test data
INSERT INTO tier_test.sample
SELECT now() - interval '1 hour' * i, 'test data'
FROM generate_series(1, 100) i;
-- Wait and verify tier transitions
SELECT
current_tier,
COUNT(*)
FROM heliosdb.object_tier_info
WHERE table_name = 'tier_test.sample'
GROUP BY current_tier;

4. Handle Cold Tier Latency

# Application code with caching
from functools import lru_cache
import psycopg2
@lru_cache(maxsize=1000)
def fetch_historical_data(customer_id, date):
"""Cache cold tier queries"""
conn = psycopg2.connect(...)
cursor = conn.cursor()
cursor.execute("""
SELECT * FROM customer_archives
WHERE customer_id = %s AND archived_date = %s
""", (customer_id, date))
return cursor.fetchall()

5. Graceful Degradation

-- Create fallback views for cold tier unavailability
CREATE VIEW recent_data_view AS
SELECT * FROM all_data WHERE timestamp > now() - interval '30 days'
UNION ALL
SELECT * FROM cached_cold_data; -- Stale cache for emergencies

Advanced Topics

Custom Tiering Policies

-- Create custom policy based on access frequency
CREATE TIERING POLICY frequent_access AS
WHEN access_count > 100 THEN 'hot'
WHEN access_count > 10 THEN 'warm'
ELSE 'cold';
-- Apply to table
ALTER TABLE my_table SET tiering_policy = frequent_access;

Lifecycle Management

-- Automatic deletion from cold tier after retention period
CREATE TABLE archived_logs (
...
) WITH (
tiering_policy = 'time_based',
cold_retention = '365 days',
auto_delete_after_retention = true
);

Cross-Region Cold Tier

tiered_storage:
cold_tier:
type: s3
# Multi-region replication
regions:
- name: us-east-1
bucket: heliosdb-cold-us-east-1
primary: true
- name: eu-west-1
bucket: heliosdb-cold-eu-west-1
replica: true

Monitoring

Key Metrics

-- Tier performance dashboard
SELECT
tier_name,
AVG(latency_p50_ms) as p50_latency,
AVG(latency_p95_ms) as p95_latency,
AVG(latency_p99_ms) as p99_latency,
AVG(throughput_mbps) as avg_throughput,
AVG(iops) as avg_iops
FROM heliosdb.tier_performance
WHERE timestamp > now() - interval '1 hour'
GROUP BY tier_name;

Prometheus Metrics

prometheus.yml
scrape_configs:
- job_name: 'heliosdb_tiering'
static_configs:
- targets: ['localhost:9090']
metrics_path: '/metrics/tiering'

Exported metrics:

  • heliosdb_tier_size_bytes{tier="hot|warm|cold"}
  • heliosdb_tier_used_bytes{tier="hot|warm|cold"}
  • heliosdb_tier_latency_seconds{tier="hot|warm|cold",percentile="50|95|99"}
  • heliosdb_tier_migrations_total{from="X",to="Y"}
  • heliosdb_tier_cost_usd{tier="hot|warm|cold"}

Conclusion

3-tier storage in HeliosDB provides automatic, transparent data tiering with significant cost savings. By following this guide, you can achieve 80%+ cost reduction while maintaining excellent query performance.

Key Takeaways:

  • Hot tier (<1ms) for active data
  • Warm tier (1-5ms) for recent data
  • Cold tier (10-50ms) for archives
  • Automatic tiering based on configurable policies
  • 80%+ cost savings for large databases

For more information:

  • API Reference: /docs/api/tiering.md
  • Architecture: /docs/architecture/tiered-storage.md
  • S3 Integration: /docs/integrations/s3-configuration.md