# ALP Compression Quick Start Guide
## What is ALP?
ALP (Adaptive Lossless Floating-Point) is a state-of-the-art compression algorithm for floating-point data that typically achieves 2-10x compression with zero precision loss.
## Quick Start
### 1. It Just Works™
ALP compression is automatically enabled for Float4 and Float8 columns:
```rust
use heliosdb_nano::{Config, StorageEngine, Schema, Column, DataType, Tuple, Value};

let engine = StorageEngine::open("./data", &Config::default())?;

// Create table with numeric columns
let schema = Schema::new(vec![
    Column::new("id", DataType::Int4),
    Column::new("price", DataType::Float8),    // Auto-compressed
    Column::new("discount", DataType::Float4), // Auto-compressed
]);

engine.catalog().create_table("products", schema)?;

// Insert data - compression happens automatically
engine.insert_tuple("products", Tuple {
    values: vec![
        Value::Int4(1),
        Value::Float8(99.95),
        Value::Float4(0.15),
    ],
})?;

// Query - decompression is transparent
let results = engine.scan_table("products")?;
```

### 2. Check Compression Statistics
```rust
// Get compression stats for a table
if let Some(stats) = engine.get_compression_stats("products")? {
    println!("Compression ratio: {:.2}x", stats.overall_ratio);
    println!("Original size: {} bytes", stats.total_original_size);
    println!("Compressed size: {} bytes", stats.total_compressed_size);
    println!(
        "Space saved: {} bytes",
        stats.total_original_size - stats.total_compressed_size
    );

    // Per-column statistics
    for (column, meta) in &stats.column_stats {
        println!(
            "{}: {:?} codec, {:.2}x ratio",
            column, meta.codec, meta.compression_ratio
        );
    }
}
```

### 3. Custom Configuration (Optional)
```rust
use heliosdb_nano::storage::compression::{CompressionConfig, CompressionManager};

let mut config = CompressionConfig::default();

// Adjust compression aggressiveness
config.min_rows_for_compression = 500; // Compress smaller batches
config.compression_level = 8;          // Higher compression

// Disable compression for specific columns
config.column_overrides.insert("user_id".to_string(), false);

// Create engine with custom config
// Note: currently must be set at creation time
let compression_manager = CompressionManager::new(config);
```

### 4. Batch Compression (Advanced)
For large datasets, use batch compression directly:
```rust
let manager = engine.compression_manager();

// Compress a large f64 batch
let values: Vec<f64> = (0..100_000).map(|i| i as f64 * 0.01).collect();
let (compressed, metadata) = manager.compress_f64_batch(&values)?;

println!("Compressed {} values:", values.len());
println!("  Original: {} bytes", metadata.original_size);
println!("  Compressed: {} bytes", metadata.compressed_size);
println!("  Ratio: {:.2}x", metadata.compression_ratio);
println!("  Codec: {:?}", metadata.codec);

// Decompress
let decompressed = manager.decompress_f64_batch(&compressed, metadata.codec)?;
assert_eq!(values, decompressed); // Lossless!
```

## Configuration Options
```rust
pub struct CompressionConfig {
    pub enabled: bool,                           // Default: true
    pub alp_enabled: bool,                       // Default: true
    pub min_rows_for_compression: usize,         // Default: 1000
    pub compression_level: u8,                   // Default: 6 (range 1-9)
    pub min_data_size: usize,                    // Default: 1024 bytes
    pub min_compression_ratio: f64,              // Default: 1.2 (20% savings)
    pub column_overrides: HashMap<String, bool>, // Per-column enable/disable
    pub adaptive_compression: bool,              // Default: true
}
```

### Presets
**Default (Balanced):**

```rust
CompressionConfig::default()
```

**High Compression:**

```rust
CompressionConfig {
    min_rows_for_compression: 100,
    compression_level: 9,
    min_compression_ratio: 1.1,
    ..Default::default()
}
```

**Performance:**

```rust
CompressionConfig {
    min_rows_for_compression: 5000,
    compression_level: 3,
    min_compression_ratio: 2.0,
    ..Default::default()
}
```

## Performance Characteristics
### Compression Ratios by Data Type
| Data Pattern | Ratio | Example |
|---|---|---|
| Prices | 4-5x | $99.95, $1234.56 |
| Percentages | 3-4x | 15.5%, 99.9% |
| Temperatures | 3-5x | 20.5°C, -10.2°C |
| Measurements | 3-5x | 65.3kg, 180.5cm |
| Scientific | 1.2-1.5x | π, e, √2 |
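The spread in this table falls out of ALP's core idea: values that are really scaled decimals can be rewritten as a small integer plus a shared exponent, while irrational constants cannot. A heavily simplified, self-contained sketch of that idea (`alp_like_roundtrip` is illustrative only, not the production codec):

```rust
/// Try to represent `v` exactly as `digits / 10^e` for a small exponent `e`.
/// If this succeeds, the value can be stored as an integer (which bit-packs
/// well); if not, it becomes an exception and compresses poorly.
fn alp_like_roundtrip(v: f64, max_exp: i32) -> Option<(i64, i32)> {
    (0..=max_exp).find_map(|e| {
        let scale = 10f64.powi(e);
        let digits = (v * scale).round();
        // The division must reproduce the exact same f64 -- losslessness.
        if digits.abs() < 9e15 && digits / scale == v {
            Some((digits as i64, e))
        } else {
            None
        }
    })
}

fn main() {
    // A price: exactly 9995 / 10^2, so it encodes as a small integer.
    assert_eq!(alp_like_roundtrip(99.95, 6), Some((9995, 2)));
    // An irrational constant never lands on a short decimal.
    assert_eq!(alp_like_roundtrip(std::f64::consts::PI, 6), None);
}
```

This is why prices and percentages sit at the top of the table and scientific constants at the bottom.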
### Throughput
- Encoding: ~0.5 doubles per CPU cycle
- Decoding: ~2.6 doubles per CPU cycle
- 20-50% faster than alternatives (Gorilla, Chimp)
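To put the per-cycle figures in more familiar units, a back-of-envelope conversion (the 3 GHz clock is an assumption for illustration, not a measured figure):

```rust
fn main() {
    // Assumed clock speed -- substitute your own CPU's.
    let clock_hz = 3.0e9_f64;
    let bytes_per_double = 8.0_f64;

    // Per-cycle figures from the list above.
    let encode_gb_s = clock_hz * 0.5 * bytes_per_double / 1e9; // 12 GB/s
    let decode_gb_s = clock_hz * 2.6 * bytes_per_double / 1e9; // ~62.4 GB/s

    println!("encode ≈ {encode_gb_s} GB/s, decode ≈ {decode_gb_s} GB/s per core");
}
```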
### Space Savings
Example: 1M rows × 5 Float64 columns
- Uncompressed: 40 MB
- Compressed: 10-13 MB
- Savings: 27-30 MB (67-75%)
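The arithmetic behind these figures is easy to verify (the 3-4x range is the typical ratio quoted above, not a guarantee):

```rust
fn main() {
    let rows: u64 = 1_000_000;
    let float8_cols: u64 = 5;
    let bytes_per_f64: u64 = 8;

    let uncompressed = rows * float8_cols * bytes_per_f64;
    assert_eq!(uncompressed, 40_000_000); // 40 MB

    // At 3x-4x compression:
    let at_4x = uncompressed / 4; // 10 MB
    let at_3x = uncompressed / 3; // ~13.3 MB
    println!(
        "compressed: {at_4x}-{at_3x} bytes, saving {}-{} bytes",
        uncompressed - at_3x,
        uncompressed - at_4x
    );
}
```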
## Best Practices
### 1. When to Use ALP
**✅ Good for:**
- Financial data (prices, amounts, percentages)
- Sensor data (temperatures, humidity, pressure)
- Metrics and measurements
- Time-series numeric data
- Scientific datasets
**❌ Not ideal for:**
- Random floating-point data
- Very small tables (< 1000 rows)
- Frequently updated hot columns
- Columns requiring random single-value access
### 2. Configuration Tips
**Storage-Optimized (data warehouse):**

```rust
config.min_rows_for_compression = 100;
config.compression_level = 9;
config.min_compression_ratio = 1.1;
```

**Throughput-Optimized (OLTP):**

```rust
config.min_rows_for_compression = 10000;
config.compression_level = 3;
config.min_compression_ratio = 2.0;
```

**Balanced (default):**
```rust
config = CompressionConfig::default();
```

### 3. Column Selection
Disable compression for:
- Primary keys (frequently accessed)
- Foreign keys (join columns)
- Very small columns
- Already compressed data
```rust
config.column_overrides.insert("id".to_string(), false);
config.column_overrides.insert("user_id".to_string(), false);
```

### 4. Monitoring
Check compression effectiveness:
```rust
let all_stats = engine.get_all_compression_stats()?;

for (table, stats) in all_stats {
    if stats.overall_ratio < 1.5 {
        println!(
            "Warning: poor compression on table {}: {:.2}x",
            table, stats.overall_ratio
        );
    }
}
```

## Troubleshooting
### Low Compression Ratio
Symptom: Compression ratio < 1.5x
Causes:
- Random or encrypted data
- Already compressed values
- Very diverse value ranges
Solutions:
- Check data patterns with `analyze_data()`
- Adjust the `min_compression_ratio` threshold
- Disable compression for problematic columns
### Performance Issues
Symptom: Slow inserts/queries
Causes:
- Compression level too high
- Batch size too small
- Compressing hot columns
Solutions:
- Lower `compression_level` (try 3-5)
- Increase `min_rows_for_compression`
- Disable compression on hot columns
### Statistics Not Showing
Symptom: `get_compression_stats()` returns `None`
Causes:
- Table has no compressed columns
- No data inserted yet
- Statistics tracking disabled
Solutions:
- Insert data first
- Check column data types (Float4/Float8)
- Verify compression is enabled
## Examples
### Example 1: Financial Application
```rust
let schema = Schema::new(vec![
    Column::new("transaction_id", DataType::Int8),
    Column::new("amount", DataType::Float8),   // Prices compress 4-5x
    Column::new("tax_rate", DataType::Float4), // Percentages compress 3-4x
    Column::new("balance", DataType::Float8),  // Balances compress 4-5x
]);

engine.catalog().create_table("transactions", schema)?;

// After inserting 1M transactions
let stats = engine.get_compression_stats("transactions")?.unwrap();
// Expected: 3-5x overall compression ratio
// Space saved: ~20-30 MB per million rows
```

### Example 2: IoT Sensor Data
```rust
let schema = Schema::new(vec![
    Column::new("sensor_id", DataType::Int4),
    Column::new("timestamp", DataType::Timestamp),
    Column::new("temperature", DataType::Float4), // Measurements compress 3-5x
    Column::new("humidity", DataType::Float4),    // Measurements compress 3-5x
    Column::new("pressure", DataType::Float4),    // Measurements compress 3-5x
]);

engine.catalog().create_table("sensor_readings", schema)?;

// After inserting 10M readings
let stats = engine.get_compression_stats("sensor_readings")?.unwrap();
// Expected: 3-4x overall compression ratio
// Space saved: ~100-150 MB per 10M readings
```

### Example 3: Scientific Data
```rust
let schema = Schema::new(vec![
    Column::new("experiment_id", DataType::Int4),
    Column::new("measurement", DataType::Float8),  // High precision, 1.2-1.5x
    Column::new("error_margin", DataType::Float8), // High precision, 1.2-1.5x
]);

engine.catalog().create_table("experiments", schema)?;

// Scientific data has lower but still valuable compression
let stats = engine.get_compression_stats("experiments")?.unwrap();
// Expected: 1.2-1.5x overall compression ratio
// Still 20-30% space savings with zero precision loss
```

## Testing
Run tests to verify compression:
```shell
# All compression tests
cargo test compression
cargo test alp

# Integration tests only
cargo test --test alp_integration_tests

# With output
cargo test compression -- --nocapture
```

## FAQ
Q: Is ALP lossless? A: Yes, 100% lossless. Every bit is preserved.
Q: What’s the performance impact? A: Minimal. Decoding is ~5x faster than encoding, and both are very fast.
Q: Can I disable ALP? A: Yes, set `config.alp_enabled = false` or disable per-column.
Q: Does it work with existing data? A: Yes, reads handle both compressed and uncompressed data transparently.
Q: What about NaN and Infinity? A: Fully supported, including signed zeros and denormal numbers.
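A practical note on verifying that claim yourself: under IEEE 754, `NaN != NaN`, so a round-trip test should compare bit patterns rather than values. A self-contained sketch of that check (the codec call is stood in by a plain copy here, since the engine isn't available in a snippet):

```rust
/// Bit-exact equality for f64 slices. Handles NaN payloads, signed zeros,
/// and denormals correctly, unlike `==` (where NaN != NaN and 0.0 == -0.0).
fn bitwise_equal(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}

fn main() {
    let specials = [f64::NAN, f64::INFINITY, f64::NEG_INFINITY, -0.0, 5e-324];
    // In a real test this would be decompress(compress(&specials)).
    let roundtrip = specials;
    assert!(bitwise_equal(&specials, &roundtrip));
    // Value comparison would miss a flipped sign of zero; bits do not.
    assert!(!bitwise_equal(&[0.0], &[-0.0]));
}
```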
Q: Can I change config at runtime? A: Not yet. Set config at engine creation time. (TODO for V2.3)
## Next Steps
- Read the full documentation: `docs/implementation/ALP_INTEGRATION_COMPLETE.md`
- Review benchmarks: `benches/alp_compression_benchmark.rs`
- Run integration tests: `cargo test alp_integration_tests`
- Monitor your compression ratios: `engine.get_all_compression_stats()`
## Support
For issues or questions:
- Check `docs/implementation/ALP_INTEGRATION_COMPLETE.md`
- Review the test cases in `tests/alp_integration_tests.rs`
- File an issue on GitHub (include compression stats output)