ALP Compression Quick Start Guide

What is ALP?

ALP (Adaptive Lossless Floating-Point) is a state-of-the-art compression algorithm for floating-point data that typically achieves 2-10x compression with zero precision loss.

Quick Start

1. It Just Works™

ALP compression is automatically enabled for Float4 and Float8 columns:

use heliosdb_nano::{Config, StorageEngine, Schema, Column, DataType, Tuple, Value};

let engine = StorageEngine::open("./data", &Config::default())?;

// Create table with numeric columns
let schema = Schema::new(vec![
    Column::new("id", DataType::Int4),
    Column::new("price", DataType::Float8),     // Auto-compressed
    Column::new("discount", DataType::Float4),  // Auto-compressed
]);
engine.catalog().create_table("products", schema)?;

// Insert data - compression happens automatically
engine.insert_tuple("products", Tuple {
    values: vec![
        Value::Int4(1),
        Value::Float8(99.95),
        Value::Float4(0.15),
    ],
})?;

// Query - decompression is transparent
let results = engine.scan_table("products")?;

2. Check Compression Statistics

// Get compression stats for a table
if let Some(stats) = engine.get_compression_stats("products")? {
    println!("Compression ratio: {:.2}x", stats.overall_ratio);
    println!("Original size: {} bytes", stats.total_original_size);
    println!("Compressed size: {} bytes", stats.total_compressed_size);
    println!("Space saved: {} bytes",
        stats.total_original_size - stats.total_compressed_size);

    // Per-column statistics
    for (column, meta) in &stats.column_stats {
        println!("{}: {:?} codec, {:.2}x ratio",
            column, meta.codec, meta.compression_ratio);
    }
}

3. Custom Configuration (Optional)

use heliosdb_nano::storage::compression::{CompressionConfig, CompressionManager};

let mut config = CompressionConfig::default();

// Adjust compression aggressiveness
config.min_rows_for_compression = 500;  // Compress smaller batches
config.compression_level = 8;           // Higher compression

// Disable compression for specific columns
config.column_overrides.insert("user_id".to_string(), false);

// Create engine with custom config
// Note: Currently must be set at creation time
let compression_manager = CompressionManager::new(config);

4. Batch Compression (Advanced)

For large datasets, use batch compression directly:

let manager = engine.compression_manager();
// Compress large f64 batch
let values: Vec<f64> = (0..100000).map(|i| i as f64 * 0.01).collect();
let (compressed, metadata) = manager.compress_f64_batch(&values)?;
println!("Compressed {} values:", values.len());
println!(" Original: {} bytes", metadata.original_size);
println!(" Compressed: {} bytes", metadata.compressed_size);
println!(" Ratio: {:.2}x", metadata.compression_ratio);
println!(" Codec: {:?}", metadata.codec);
// Decompress
let decompressed = manager.decompress_f64_batch(&compressed, metadata.codec)?;
assert_eq!(values, decompressed); // Lossless!

Configuration Options

pub struct CompressionConfig {
    pub enabled: bool,                    // Default: true
    pub alp_enabled: bool,                // Default: true
    pub min_rows_for_compression: usize,  // Default: 1000
    pub compression_level: u8,            // Default: 6 (range 1-9)
    pub min_data_size: usize,             // Default: 1024 bytes
    pub min_compression_ratio: f64,       // Default: 1.2 (20% savings)
    pub column_overrides: HashMap<String, bool>,
    pub adaptive_compression: bool,       // Default: true
}
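
These thresholds work together: a batch is only worth keeping compressed when it is large enough and the achieved ratio clears min_compression_ratio. A purely hypothetical sketch of that gating logic (should_keep_compressed is not part of the API, and the real engine logic may differ):

// Hypothetical illustration only - not part of the heliosdb_nano API.
fn should_keep_compressed(
    cfg: &CompressionConfig,
    rows: usize,
    original_size: usize,
    compressed_size: usize,
) -> bool {
    cfg.enabled
        && rows >= cfg.min_rows_for_compression
        && original_size >= cfg.min_data_size
        && (original_size as f64 / compressed_size as f64) >= cfg.min_compression_ratio
}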

Presets

Default (Balanced):

CompressionConfig::default()

High Compression:

CompressionConfig {
    min_rows_for_compression: 100,
    compression_level: 9,
    min_compression_ratio: 1.1,
    ..Default::default()
}

Performance:

CompressionConfig {
    min_rows_for_compression: 5000,
    compression_level: 3,
    min_compression_ratio: 2.0,
    ..Default::default()
}

Performance Characteristics

Compression Ratios by Data Type

Data Pattern    Ratio     Example
Prices          4-5x      $99.95, $1234.56
Percentages     3-4x      15.5%, 99.9%
Temperatures    3-5x      20.5°C, -10.2°C
Measurements    3-5x      65.3kg, 180.5cm
Scientific      1.2-1.5x  π, e, √2

Throughput

  • Encoding: ~0.5 doubles per CPU cycle
  • Decoding: ~2.6 doubles per CPU cycle
  • 20-50% faster than alternatives (Gorilla, Chimp)
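
As a rough, hedged estimate: on a 3 GHz core, and ignoring memory bandwidth, ~2.6 doubles per cycle works out to roughly 7-8 billion values (about 60 GB) decoded per second, and ~0.5 doubles per cycle to about 1.5 billion values (roughly 12 GB) encoded per second.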

Space Savings

Example: 1M rows × 5 Float64 columns (worked through in the sketch below)

  • Uncompressed: 40 MB
  • Compressed: 10-13 MB
  • Savings: 27-30 MB (67-75%)
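
These figures follow from simple arithmetic on the numbers above; a quick sketch with illustrative values, not measurements:

// Back-of-envelope estimate using the example numbers above (not a measurement).
let rows: u64 = 1_000_000;
let float64_columns: u64 = 5;
let uncompressed_bytes = rows * float64_columns * 8;  // 40 MB
let compressed_at_3x = uncompressed_bytes / 3;        // ~13 MB
let compressed_at_4x = uncompressed_bytes / 4;        // 10 MB
println!("uncompressed: {} MB", uncompressed_bytes / 1_000_000);
println!("compressed (3-4x): {}-{} MB",
    compressed_at_4x / 1_000_000, compressed_at_3x / 1_000_000);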

Best Practices

1. When to Use ALP

Good for:

  • Financial data (prices, amounts, percentages)
  • Sensor data (temperatures, humidity, pressure)
  • Metrics and measurements
  • Time-series numeric data
  • Scientific datasets

Not ideal for:

  • Random floating-point data
  • Very small tables (< 1000 rows)
  • Frequently updated hot columns
  • Columns requiring random single-value access

2. Configuration Tips

Storage-Optimized (data warehouse):

config.min_rows_for_compression = 100;
config.compression_level = 9;
config.min_compression_ratio = 1.1;

Throughput-Optimized (OLTP):

config.min_rows_for_compression = 10000;
config.compression_level = 3;
config.min_compression_ratio = 2.0;

Balanced (default):

config = CompressionConfig::default();

3. Column Selection

Disable compression for:

  • Primary keys (frequently accessed)
  • Foreign keys (join columns)
  • Very small columns
  • Already compressed data

config.column_overrides.insert("id".to_string(), false);
config.column_overrides.insert("user_id".to_string(), false);

4. Monitoring

Check compression effectiveness:

let all_stats = engine.get_all_compression_stats()?;
for (table, stats) in all_stats {
    if stats.overall_ratio < 1.5 {
        println!("Warning: Poor compression on table {}: {:.2}x",
            table, stats.overall_ratio);
    }
}

Troubleshooting

Low Compression Ratio

Symptom: Compression ratio < 1.5x

Causes:

  • Random or encrypted data
  • Already compressed values
  • Very diverse value ranges

Solutions:

  • Check data patterns with analyze_data()
  • Adjust min_compression_ratio threshold
  • Disable compression for problematic columns (see the sketch below)
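
For example, the per-column statistics from section 2 can drive column_overrides so that poorly compressing columns are opted out. A sketch assuming a mutable CompressionConfig named config, as in the Custom Configuration section; the 1.2x cutoff is an arbitrary illustration, and overrides still take effect at engine creation time:

// Find columns that compress poorly and opt them out of compression.
if let Some(stats) = engine.get_compression_stats("products")? {
    for (column, meta) in &stats.column_stats {
        if meta.compression_ratio < 1.2 {
            println!("Disabling compression for {} ({:.2}x)", column, meta.compression_ratio);
            config.column_overrides.insert(column.clone(), false);
        }
    }
}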

Performance Issues

Symptom: Slow inserts/queries

Causes:

  • Compression level too high
  • Batch size too small
  • Compressing hot columns

Solutions:

  • Lower compression_level (try 3-5)
  • Increase min_rows_for_compression
  • Disable compression on hot columns

Statistics Not Showing

Symptom: get_compression_stats() returns None

Causes:

  • Table has no compressed columns
  • No data inserted yet
  • Statistics tracking disabled

Solutions:

  • Insert data first
  • Check column data types (Float4/Float8)
  • Verify compression is enabled (see the sketch below)
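
A minimal sketch of handling the None case explicitly, using only the stats API shown earlier (the table name is illustrative):

match engine.get_compression_stats("products")? {
    Some(stats) => println!("Compression ratio: {:.2}x", stats.overall_ratio),
    None => println!(
        "No compression stats yet: check that the table has Float4/Float8 columns and data has been inserted"
    ),
}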

Examples

Example 1: Financial Application

let schema = Schema::new(vec![
    Column::new("transaction_id", DataType::Int8),
    Column::new("amount", DataType::Float8),    // Prices compress 4-5x
    Column::new("tax_rate", DataType::Float4),  // Percentages compress 3-4x
    Column::new("balance", DataType::Float8),   // Balances compress 4-5x
]);
engine.catalog().create_table("transactions", schema)?;

// After inserting 1M transactions
let stats = engine.get_compression_stats("transactions")?.unwrap();
// Expected: 3-5x overall compression ratio
// Space saved: ~20-30 MB per million rows

Example 2: IoT Sensor Data

let schema = Schema::new(vec![
    Column::new("sensor_id", DataType::Int4),
    Column::new("timestamp", DataType::Timestamp),
    Column::new("temperature", DataType::Float4),  // Measurements compress 3-5x
    Column::new("humidity", DataType::Float4),     // Measurements compress 3-5x
    Column::new("pressure", DataType::Float4),     // Measurements compress 3-5x
]);
engine.catalog().create_table("sensor_readings", schema)?;

// After inserting 10M readings
let stats = engine.get_compression_stats("sensor_readings")?.unwrap();
// Expected: 3-4x overall compression ratio
// Space saved: ~100-150 MB per 10M readings

Example 3: Scientific Data

let schema = Schema::new(vec![
    Column::new("experiment_id", DataType::Int4),
    Column::new("measurement", DataType::Float8),   // High precision, 1.2-1.5x
    Column::new("error_margin", DataType::Float8),  // High precision, 1.2-1.5x
]);
engine.catalog().create_table("experiments", schema)?;

// Scientific data has lower but still valuable compression
let stats = engine.get_compression_stats("experiments")?.unwrap();
// Expected: 1.2-1.5x overall compression ratio
// Still 20-30% space savings with zero precision loss

Testing

Run tests to verify compression:

# All compression tests
cargo test compression
cargo test alp
# Integration tests only
cargo test --test alp_integration_tests
# With output
cargo test compression -- --nocapture

FAQ

Q: Is ALP lossless? A: Yes, 100% lossless. Every bit is preserved.

Q: What’s the performance impact? A: Minimal. Decoding is ~5x faster than encoding, and both are very fast.

Q: Can I disable ALP? A: Yes, set config.alp_enabled = false or disable per-column.

Q: Does it work with existing data? A: Yes, reads handle both compressed and uncompressed data transparently.

Q: What about NaN and Infinity? A: Fully supported, including signed zeros and denormal numbers.
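
The claim can be spot-checked with the batch API from section 4 (manager as defined there). One wrinkle: NaN never compares equal to itself, so the sketch below compares bit patterns instead of using assert_eq!. A minimal sketch, assuming the round trip is bit-exact as stated above:

// Special values: NaN, infinities, signed zero, and a denormal.
let values = vec![f64::NAN, f64::INFINITY, f64::NEG_INFINITY, -0.0, 1e-310];
let (compressed, metadata) = manager.compress_f64_batch(&values)?;
let decompressed = manager.decompress_f64_batch(&compressed, metadata.codec)?;

// Compare bit patterns, since NaN != NaN under IEEE 754 comparison.
assert!(values.iter().zip(&decompressed)
    .all(|(a, b)| a.to_bits() == b.to_bits()));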

Q: Can I change config at runtime? A: Not yet. Set config at engine creation time. (TODO for V2.3)

Next Steps

  • Read full documentation: docs/implementation/ALP_INTEGRATION_COMPLETE.md
  • Review benchmarks: benches/alp_compression_benchmark.rs
  • Run integration tests: cargo test alp_integration_tests
  • Monitor your compression ratios: engine.get_all_compression_stats()

Support

For issues or questions:

  • Check docs/implementation/ALP_INTEGRATION_COMPLETE.md
  • Review test cases in tests/alp_integration_tests.rs
  • File issue on GitHub (include compression stats output)