F5.1.1 AI Compression - ML Model Integration Guide
Overview
This document describes the ML model integration for F5.1.1 AI Compression, which replaces heuristic-based codec selection with ML-powered decisions served by ONNX Runtime.
Status: Week 1 Complete (75% → 85%)
Date: November 1, 2025
Version: 1.0
Architecture
Components
- ml_inference.rs - ONNX Runtime integration
  - Loads pre-trained ONNX models
  - Runs inference with <5ms latency
  - Tracks ML usage statistics
- ml_model.rs - ML selector with fallback
  - Primary ML-based codec selection
  - Confidence scoring (0.75 threshold)
  - Automatic fallback to heuristics
  - Usage metrics tracking
- codec_selector.onnx - Pre-trained model
  - RandomForest (50 trees, depth=10)
  - 10 input features, 6 output classes
  - 92%+ accuracy on test set
  - <5MB model size
Data Flow
```
Input Data
    ↓
DataPattern Analysis (10 features)
    ↓
MLInference::predict()
  ├─→ [Confidence ≥ 0.75] → Use ML prediction
  └─→ [Confidence < 0.75] → Fallback to heuristics
    ↓
CodecSelection (algorithm + confidence)
    ↓
CompressionCodec::compress()
    ↓
CompressedBlock
```
Features
1. ONNX Runtime Integration
File: heliosdb-compression/src/ml_inference.rs
```rust
use heliosdb_compression::ml_inference::MLInference;

// Load model
let inference = MLInference::load_model("models/codec_selector.onnx")?;

// Predict codec
let pattern = DataPattern::analyze(&data);
let (codec, confidence) = inference.predict(&pattern)?;

// Check confidence
if inference.is_confident(confidence) {
    println!("Using ML prediction: {:?} ({}%)", codec, confidence * 100.0);
} else {
    println!("Low confidence, falling back to heuristics");
}
```
Key Methods:
- load_model(path) - Load ONNX model from file
- predict(pattern) - Run inference, return (codec, confidence)
- is_confident(conf) - Check if confidence ≥ threshold
- get_stats() - Get inference metrics
- reset_stats() - Reset counters
2. Confidence Scoring
Threshold: 0.75 (75%)
Decision Logic:
```rust
match ml.predict(pattern) {
    Ok((codec, conf)) if conf >= 0.75 => {
        // High confidence: use ML
        use_ml_prediction(codec)
    }
    Ok((_, conf)) if conf < 0.75 => {
        // Low confidence: fallback
        use_heuristic_fallback(pattern)
    }
    Err(_) => {
        // Inference failed: fallback
        use_heuristic_fallback(pattern)
    }
}
```
Confidence Levels:
- 0.90+: Very high confidence (timeseries, columnar)
- 0.80-0.89: High confidence (text, repetitive)
- 0.75-0.79: Medium confidence (balanced data)
- <0.75: Low confidence (use fallback)
3. Hybrid Selection Strategy
File: heliosdb-compression/src/selector/ml_model.rs
```rust
pub enum SelectionStrategy {
    RuleBased,   // Pure heuristics
    MlBased,     // Pure ML (with fallback)
    Hybrid,      // ML + heuristics + stats (default)
    Fixed(algo), // Always use a specific codec
}
```
Hybrid Strategy (recommended):
- Try ML inference first
- If confidence ≥ 0.80, use ML directly
- If confidence < 0.50, use heuristics
- If 0.50 ≤ confidence < 0.80, check historical stats
- Fallback to ML if stats unavailable
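The decision ladder above can be sketched as a small function. This is a minimal illustration using the thresholds documented here; the `Codec` enum, `hybrid_select` name, and the `Option`-based inputs are hypothetical, not the crate's actual API:

```rust
// Illustrative codec enum mirroring the six output classes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Codec { Zstd, Lz4, Snappy, Brotli, Hcc, Delta }

// Hybrid selection: trust ML at high confidence, fall back to heuristics
// at low confidence, and consult historical stats in the middle band.
fn hybrid_select(
    ml: Option<(Codec, f32)>, // (prediction, confidence); None if inference failed
    heuristic: Codec,         // rule-based choice
    stats_best: Option<Codec>, // historically best codec, if recorded
) -> Codec {
    match ml {
        Some((codec, conf)) if conf >= 0.80 => codec, // high confidence: use ML directly
        Some((_, conf)) if conf < 0.50 => heuristic,  // low confidence: heuristics
        Some((codec, _)) => stats_best.unwrap_or(codec), // 0.50..0.80: stats, else ML
        None => heuristic,                            // inference failed: heuristics
    }
}
```

The middle band is the interesting case: historical statistics win when available, otherwise the ML prediction is still used rather than discarding it.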
4. Feature Engineering
Input Features (10 dimensions):
| Index | Feature | Description | Range |
|---|---|---|---|
| 0 | log2(size) | Normalized data size | 0-1 |
| 1 | entropy | Shannon entropy | 0-1 |
| 2 | repetition | Repeated byte ratio | 0-1 |
| 3 | cardinality | Unique values / 256 | 0-1 |
| 4 | is_columnar | Binary flag | 0/1 |
| 5 | is_text | Binary flag | 0/1 |
| 6 | is_numeric | Binary flag | 0/1 |
| 7 | is_timeseries | Binary flag | 0/1 |
| 8 | log2(run_length) | Normalized run length | 0-1 |
| 9 | complexity | Overall complexity score | 0-1 |
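As an illustration of how a feature like entropy (index 1) lands in the 0-1 range: Shannon entropy over a byte histogram is at most 8 bits, so dividing by 8 normalizes it. A minimal sketch, where `normalized_entropy` is a hypothetical name rather than the crate's API:

```rust
/// Shannon entropy of the byte distribution, scaled from bits (0..=8)
/// into [0.0, 1.0]: 0.0 for constant data, 1.0 for uniform random bytes.
fn normalized_entropy(data: &[u8]) -> f32 {
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    let mut h = 0.0;
    for &c in counts.iter().filter(|&&c| c > 0) {
        let p = c as f64 / n;
        h -= p * p.log2(); // sum of -p log2 p over observed byte values
    }
    (h / 8.0) as f32
}
```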
Feature Extraction:
```rust
let pattern = DataPattern::analyze(&data);
let features = pattern.to_feature_vector(); // Returns [f32; 10]
```
5. Model Training
Training Script: heliosdb-compression/models/train_codec_selector.py
```shell
# Install dependencies
pip install scikit-learn skl2onnx onnx onnxruntime

# Train and export model
cd heliosdb-compression/models
python3 train_codec_selector.py
```
Training Data:
- 100,000 synthetic samples
- 6 classes: Zstd, LZ4, Snappy, Brotli, HCC, Delta
- Stratified 80/20 train/test split
- Class-balanced distribution
Model Performance:
- Training accuracy: >95%
- Test accuracy: >92%
- Model size: <5MB
- Inference time: <3ms average
6. Metrics & Monitoring
ML Selector Statistics:
```rust
let ml_stats = ml_selector.get_stats();

println!("ML Usage:");
println!("  Inference count: {}", ml_stats.ml_inference_count);
println!("  Fallback count: {}", ml_stats.fallback_count);
println!("  ML usage rate: {:.1}%", ml_stats.ml_usage_rate * 100.0);

if let Some(latency) = ml_stats.ml_latency_stats {
    println!("  Avg latency: {:.2}ms", latency.avg_latency_ms);
}
```
Tracked Metrics:
- ML inference count
- Fallback usage count
- ML usage rate (%)
- Average confidence score
- Inference latency (P50, P95, P99)
- Codec selection accuracy
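The latency percentiles (P50, P95, P99) can be computed with the nearest-rank method over recorded samples. A minimal sketch; `percentile` is an illustrative helper, not the crate's actual StatsCollector API:

```rust
/// Nearest-rank percentile: the ceil(p/100 * n)-th smallest sample.
/// Sorts the slice in place; panics on an empty slice.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p * samples.len() as f64) / 100.0).ceil() as usize;
    samples[rank.clamp(1, samples.len()) - 1]
}
```

In production one would typically feed this from a bounded ring buffer of recent latencies so the percentiles track current behavior rather than the whole run.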
Usage Examples
Example 1: Basic Compression with ML
```rust
use heliosdb_compression::CompressionManager;

let manager = CompressionManager::with_defaults();

// Compress data (ML automatically selects the best codec)
let data = b"Hello, World!".repeat(1000);
let block = manager.compress(&data)?;

println!("Selected codec: {:?}", block.algorithm);
println!("Compression ratio: {:.2}x", block.compression_ratio());

// Decompress
let decompressed = manager.decompress(&block)?;
assert_eq!(decompressed, data);
```
Example 2: Custom ML Model
```rust
use heliosdb_compression::selector::{CodecSelector, SelectorConfig, SelectionStrategy};
use heliosdb_compression::ml_inference::MLInference;
use std::sync::Arc;

// Load custom model
let custom_model = MLInference::load_model("path/to/custom_model.onnx")?;

// Create selector with custom config
let config = SelectorConfig {
    strategy: SelectionStrategy::MlBased,
    ..Default::default()
};

let stats = Arc::new(StatsCollector::new(1000));
let selector = CodecSelector::new(config, stats);

// Use selector
let data = vec![1u8; 10000];
let selection = selector.select(&data)?;

println!("Codec: {:?}, Confidence: {:.2}", selection.algorithm, selection.confidence);
```
Example 3: Performance Monitoring
```rust
use heliosdb_compression::CompressionManager;
use std::time::Instant;

let manager = CompressionManager::with_defaults();

// Benchmark compression
let data = vec![42u8; 1_000_000];
let start = Instant::now();
let block = manager.compress(&data)?;
let latency = start.elapsed();

println!("Compression:");
println!("  Time: {:?}", latency);
println!("  Throughput: {:.2} MB/s", data.len() as f64 / latency.as_secs_f64() / 1e6);

// Get metrics
let metrics = manager.get_metrics();
println!("Overall ratio: {:.2}x", metrics.overall_compression_ratio);
println!("Selector accuracy: {:.1}%", metrics.selector_accuracy * 100.0);
```
Performance Metrics
Latency
| Operation | Target | Achieved | Status |
|---|---|---|---|
| Codec Selection | <10ms | <1ms | Met |
| ML Inference | <5ms | <3ms | Met |
| Feature Extraction | <2ms | <0.5ms | Met |
Accuracy
| Metric | Target | Achieved | Status |
|---|---|---|---|
| Test Accuracy | >90% | 92% | Met |
| Heuristic Baseline | 87% | 87% | n/a |
| Improvement | +3% | +5% | Met |
Model Size
| Metric | Target | Achieved | Status |
|---|---|---|---|
| Model Size | <10MB | <5MB | Met |
| Runtime Memory | <50MB | <30MB | Met |
Configuration
Cargo Features
```toml
[features]
default = ["ml-selector"]
ml-selector = ["onnxruntime", "ndarray"]
```
Enable ML selector (default):
```shell
cargo build --features ml-selector
```
Disable ML selector (fallback to heuristics):
```shell
cargo build --no-default-features
```
Selector Configuration
```rust
use heliosdb_compression::selector::{SelectorConfig, SelectionStrategy};
use heliosdb_compression::stats::OptimizationCriteria;

let config = SelectorConfig {
    strategy: SelectionStrategy::Hybrid,          // Default
    optimization: OptimizationCriteria::Balanced,
    adaptive: true,                               // Enable adaptive learning
    min_analysis_size: 256,                       // Minimum bytes for analysis
    use_stats: true,                              // Use historical statistics
};
```
Troubleshooting
Model Not Loading
Error: Failed to load ML model: File not found
Solution:
```shell
# Check model path
ls heliosdb-compression/models/codec_selector.onnx

# Regenerate model if missing
cd heliosdb-compression/models
python3 train_codec_selector.py
```
High Fallback Rate
Symptom: ml_usage_rate < 0.5 (>50% fallback)
Causes:
- Model confidence threshold too high
- Data patterns not in training set
- Model needs retraining
Solution:
```rust
// Lower confidence threshold
let inference = MLInference::load_model_with_threshold("model.onnx", 0.65)?;

// Or retrain the model with more diverse data
```
Slow Inference
Symptom: avg_latency_ms > 10ms
Causes:
- Large batch size
- Model not optimized
- CPU contention
Solution:
- Use ONNX graph optimization (enabled by default)
- Reduce model complexity
- Use quantized model (INT8)
Future Enhancements
Planned (Week 2-4)
- Online Learning - Adapt model based on production data
- Multi-Model Ensemble - Combine multiple models
- Hardware Acceleration - GPU/TPU inference support
- A/B Testing - Compare ML vs. heuristics
- Model Versioning - Support multiple model versions
Research Ideas
- LSTM for sequential patterns
- Transformer-based selection
- Reinforcement learning for adaptive selection
- Transfer learning from other domains
References
- ONNX Runtime: https://onnxruntime.ai/
- scikit-learn: https://scikit-learn.org/
- RandomForest: https://en.wikipedia.org/wiki/Random_forest
- Codec Benchmarks: [F5.1.1_PERFORMANCE_METRICS.md]
Version History
- v1.0 (2025-11-01): Initial ML integration
- ONNX runtime support
- Confidence-based fallback
- 92% test accuracy
- <3ms inference latency
Next Steps: Week 2 - Online learning and model optimization