
Federated Learning — Train Across 100+ Nodes Without Sharing Raw Data

  • Crate: heliosdb-federation/crates/federated-learning
  • Modules: 27 — including coordinator, aggregator, trainer, worker, privacy, homomorphic_encryption, smpc, zkp, compliance_gdpr, compliance_hipaa, vertical_fl, transfer_learning
  • Status: “Production-Ready” per docs/FEDERATED_LEARNING.md. 300+ node scalability tests included.


UVP

Centralized training is where compliance officers say no. The Full edition ships a federated learning platform that lets you train ML models across 100+ database nodes — each on its own data, in its own region, under its own jurisdiction — and only model updates ever cross the network. 8+ aggregation strategies (FedAvg, FedProx, FedYogi, FedAdam, weighted, Byzantine-robust, secure aggregation, hierarchical), built-in differential privacy with budget tracking, GDPR + HIPAA compliance modules, plus optional homomorphic encryption and zero-knowledge proofs when “differential privacy” isn’t enough on the regulator’s checklist.


Prerequisites

  • A coordinator host plus ≥3 worker nodes (more is more interesting).
  • Local training data on each worker — the whole point is that it doesn’t move.
  • A common model architecture all workers can run.
  • About 30 minutes.

1. Engine, Coordinator, Worker

The library is built around three layers:

Layer        Type                     Job
Engine       FederatedLearningEngine  Top-level facade, owns models, manages coordinators and workers
Coordinator  Coordinator              Per-model: runs rounds, selects nodes, checks convergence
Worker       FederatedWorker          Per-node: trains locally, ships updates, receives global model

From src/lib.rs:

use heliosdb_federated_learning::{Config, FederatedLearningEngine};
use heliosdb_federated_learning::aggregator::AggregationStrategy;
use heliosdb_federated_learning::coordinator::RoundConfig;

let config = Config {
    aggregation_strategy: AggregationStrategy::FedAvg,
    round_config: RoundConfig::default(),
    enable_privacy: false,
    privacy_config: None,
};
let engine = FederatedLearningEngine::new(config).await?;

2. Register a Model

let metadata = engine.register_model(
    "fraud_detector".to_string(),
    "linear".to_string(),           // architecture name
    vec![0.1, 0.2, 0.3, /* ... */], // initial parameters
).await?;
println!("Model registered: {} (version {})", metadata.id, metadata.version);

The architecture string is opaque to the coordinator — it’s what FederatedWorker uses to know how to instantiate the local model. Each worker must have a matching architecture handler.
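What such a handler might look like is sketched below — the trait and method names here are illustrative, not the crate’s actual worker API:

// Hypothetical sketch of a worker-side architecture handler.
// Trait and names are illustrative; see the crate docs for the real interface.
trait LocalModel {
    /// Instantiate the model from the coordinator's initial parameters.
    fn from_params(params: &[f64]) -> Self where Self: Sized;
    /// Run local training and return updated parameters.
    fn train(&mut self, features: &[Vec<f64>], labels: &[f64]) -> Vec<f64>;
}

struct LinearModel {
    weights: Vec<f64>,
}

impl LocalModel for LinearModel {
    fn from_params(params: &[f64]) -> Self {
        LinearModel { weights: params.to_vec() }
    }

    fn train(&mut self, features: &[Vec<f64>], labels: &[f64]) -> Vec<f64> {
        let lr = 0.01;
        // One epoch of plain SGD on squared error, purely for illustration.
        for (x, &y) in features.iter().zip(labels) {
            let pred: f64 = self.weights.iter().zip(x).map(|(w, xi)| w * xi).sum();
            let err = pred - y;
            for (w, xi) in self.weights.iter_mut().zip(x) {
                *w -= lr * err * xi;
            }
        }
        self.weights.clone()
    }
}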


3. Pick an Aggregation Strategy

From the docs:

Strategy         When
FedAvg           Default. Simple averaging, IID data. (McMahan et al. 2017)
FedProx          Heterogeneous (non-IID) data. Adds proximal term. (Li et al. 2020)
FedYogi          Adaptive optimization with Yogi updates. (Reddi et al. 2021)
FedAdam          Adaptive optimization with Adam updates.
WeightedAverage  Weight by sample count per worker.

use heliosdb_federated_learning::aggregator::AggregationStrategy;

let config = Config {
    aggregation_strategy: AggregationStrategy::FedProx,
    ..Default::default()
};

For Byzantine-robust deployments (untrusted workers), use the secure aggregator instead — see Section 8.
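For intuition, FedAvg itself is just a sample-count-weighted mean of the workers’ parameter vectors. A standalone sketch of the arithmetic (the crate’s aggregator does this internally):

/// Sample-count-weighted average of worker updates: the core of FedAvg.
/// Each entry pairs a parameter vector with that worker's local sample count.
fn fedavg(updates: &[(Vec<f64>, usize)]) -> Vec<f64> {
    let total: usize = updates.iter().map(|(_, n)| n).sum();
    let dim = updates[0].0.len();
    let mut global = vec![0.0; dim];
    for (params, n) in updates {
        let w = *n as f64 / total as f64; // weight by share of total samples
        for (g, p) in global.iter_mut().zip(params) {
            *g += w * p;
        }
    }
    global
}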


4. Add Differential Privacy

use heliosdb_federated_learning::privacy::DifferentialPrivacyConfig;

let dp = DifferentialPrivacyConfig {
    epsilon: 1.0,          // privacy budget per round
    delta: 1e-5,           // failure probability
    clip_norm: 1.0,        // gradient clipping bound (L2)
    noise_multiplier: 1.1, // Gaussian noise scale
    ..Default::default()
};
let config = Config {
    aggregation_strategy: AggregationStrategy::FedAvg,
    enable_privacy: true,
    privacy_config: Some(dp),
    ..Default::default()
};

The privacy::PrivacyManager tracks budget consumption per round. When you exhaust the budget, the coordinator stops the run rather than silently degrade the guarantee.
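Conceptually, the DP step clips each update’s L2 norm and adds calibrated Gaussian noise. A standalone sketch of that mechanism — illustrative only, since PrivacyManager also handles the budget accounting (uses the rand crate):

use rand::Rng;

/// What the DP step conceptually does to a single update:
/// clip its L2 norm to clip_norm, then add Gaussian noise per coordinate.
fn privatize(update: &mut [f64], clip_norm: f64, noise_multiplier: f64) {
    // 1. Clip: scale the vector down if its L2 norm exceeds clip_norm.
    let norm = update.iter().map(|v| v * v).sum::<f64>().sqrt();
    if norm > clip_norm {
        let scale = clip_norm / norm;
        update.iter_mut().for_each(|v| *v *= scale);
    }
    // 2. Noise: add N(0, (noise_multiplier * clip_norm)^2) to each coordinate.
    let sigma = noise_multiplier * clip_norm;
    let mut rng = rand::thread_rng();
    for v in update.iter_mut() {
        // Box-Muller transform for a standard Gaussian sample.
        let u1: f64 = 1.0 - rng.gen::<f64>(); // avoid ln(0)
        let u2: f64 = rng.gen();
        let z = (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos();
        *v += sigma * z;
    }
}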


5. Start Training

engine.start_training("fraud_detector".to_string()).await?;

The coordinator now:

  1. Selects a subset of workers per round (NodeSelectionStrategy).
  2. Ships the current global model.
  3. Each worker trains locally.
  4. Workers ship parameter updates back.
  5. The aggregator combines updates per the strategy.
  6. New global model → next round.

From a worker’s perspective, start_training reduces to “fetch global, train local, push update”; the orchestration is the coordinator’s job.
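For intuition, here is a toy, dependency-free simulation of one such round — the closure stands in for real local training, and in production the crate handles all the networking:

/// Toy single-round simulation: ship the global model to each worker,
/// "train" locally, then average the returned updates (plain FedAvg,
/// equal weights). No crate types, no networking — intuition only.
fn simulate_round(global: &[f64], workers: &[Vec<f64>]) -> Vec<f64> {
    let updates: Vec<Vec<f64>> = workers
        .iter()
        .map(|local_mean| {
            global
                .iter()
                .zip(local_mean)
                .map(|(g, m)| g + 0.1 * (m - g)) // stand-in for local training
                .collect()
        })
        .collect();
    let dim = global.len();
    (0..dim)
        .map(|i| updates.iter().map(|u| u[i]).sum::<f64>() / updates.len() as f64)
        .collect()
}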


6. Compress the Updates

Bandwidth is usually the bottleneck. The crate ships several compression strategies:

use heliosdb_federated_learning::compression::{CompressionStrategy, ModelCompressor};

let compressor = ModelCompressor::new(CompressionStrategy::TopK { k: 1000 });
// or CompressionStrategy::Quantization { bits: 8 }
// or CompressionStrategy::Sparsification { threshold: 0.01 }

Top-K and quantization can shrink updates 10-100x with minor accuracy loss, and the aggregator decompresses incoming updates transparently, whichever strategy produced them.
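The idea behind TopK is easy to see in isolation: send only the k largest-magnitude coordinates as (index, value) pairs. A sketch of the technique (not the crate’s wire format):

/// Top-K sparsification: keep the k largest-magnitude coordinates and their
/// indices, dropping the rest. Sending ~k (index, value) pairs instead of the
/// full vector is where the 10-100x bandwidth saving comes from.
fn top_k(update: &[f64], k: usize) -> Vec<(usize, f64)> {
    let mut indexed: Vec<(usize, f64)> = update.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.abs().partial_cmp(&a.1.abs()).unwrap());
    indexed.truncate(k);
    indexed
}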


7. Hierarchical Aggregation for 200+ Nodes

For very large fleets, flat aggregation overwhelms the coordinator. The hierarchical_aggregation module groups workers into clusters with intermediate aggregators:

use heliosdb_federated_learning::hierarchical_aggregation;
// see module docs for cluster/region wiring

This is the same pattern Google uses for Gboard. The crate’s scalability_tests module includes 300+-node validation — see Section 11.
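The shape of the computation is simple: average within each cluster, then average the cluster results, so the coordinator sees one update per cluster instead of one per worker. A standalone sketch of that idea (not the module’s actual wiring):

/// Two-level aggregation: each cluster's intermediate aggregator averages its
/// own workers (level 1), then the coordinator averages the cluster results
/// (level 2). Equal weights for simplicity.
fn hierarchical_average(clusters: &[Vec<Vec<f64>>]) -> Vec<f64> {
    let average = |vs: &[Vec<f64>]| -> Vec<f64> {
        let dim = vs[0].len();
        (0..dim)
            .map(|i| vs.iter().map(|v| v[i]).sum::<f64>() / vs.len() as f64)
            .collect()
    };
    // Level 1: one intermediate result per cluster.
    let cluster_means: Vec<Vec<f64>> = clusters.iter().map(|c| average(c)).collect();
    // Level 2: the coordinator only ever sees one update per cluster.
    average(&cluster_means)
}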


8. Byzantine-Robust + Secure Aggregation

When workers can’t be trusted (open consortium, edge devices, untrusted partners):

use heliosdb_federated_learning::secure_aggregation::{
    ByzantineRobustAggregator, MaliciousDetector,
};
use heliosdb_federated_learning::advanced_security::{
    AdvancedSecuritySystem, SecurityConfig,
};

let aggregator = ByzantineRobustAggregator::new(/* config */);
let detector = MaliciousDetector::new(/* config */);

On top of the robust aggregator, advanced_security adds backdoor detection and certified robustness. Combine it with secure aggregation (the same secure_aggregation module) to keep individual updates encrypted even from the coordinator itself.
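One classic Byzantine-robust rule is the coordinate-wise median, under which a bounded minority of malicious updates cannot drag any coordinate arbitrarily far. A sketch of that rule (ByzantineRobustAggregator’s actual rule may differ):

/// Coordinate-wise median across worker updates. Unlike a mean, a single
/// outlier worker cannot move the result beyond the range of honest values
/// in any coordinate.
fn coordinate_median(updates: &[Vec<f64>]) -> Vec<f64> {
    let dim = updates[0].len();
    (0..dim)
        .map(|i| {
            let mut column: Vec<f64> = updates.iter().map(|u| u[i]).collect();
            column.sort_by(|a, b| a.partial_cmp(b).unwrap());
            column[column.len() / 2] // median of this coordinate across workers
        })
        .collect()
}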


9. Heavyweight Privacy: Homomorphic Encryption + ZKP + SMPC

When DP isn’t enough:

use heliosdb_federated_learning::homomorphic_encryption::HomomorphicEncryption;
use heliosdb_federated_learning::smpc::{SMPCConfig, ShamirSecretSharing};
use heliosdb_federated_learning::zkp::{ZKPConfig, ZKPSystem};

// HE:   aggregate on encrypted updates
// SMPC: secret-share parameters across n participants
// ZKP:  prove update validity without revealing it

These are heavy — orders of magnitude slower than DP — but they’re there when the threat model demands it. See docs/FEDERATED_LEARNING.md for the full crypto chapter.
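The core SMPC intuition fits in a few lines: split each value into shares so that no single participant learns anything, while the shares still sum to the secret. Below is toy additive sharing over f64 — the crate’s ShamirSecretSharing is a threshold scheme over a finite field, so treat this strictly as intuition (uses the rand crate):

use rand::Rng;

/// Additive secret sharing: split `secret` into n random shares that sum to it.
/// Any n-1 shares look like pure noise; only the full set reconstructs the value.
/// Summing everyone's shares yields the aggregate without exposing any input.
fn additive_shares(secret: f64, n: usize) -> Vec<f64> {
    let mut rng = rand::thread_rng();
    let mut shares: Vec<f64> = (0..n - 1).map(|_| rng.gen_range(-1.0..1.0)).collect();
    let mask: f64 = shares.iter().sum();
    shares.push(secret - mask); // last share makes the total come out to `secret`
    shares
}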


10. GDPR + HIPAA Compliance

Compliance modules are first-class:

use heliosdb_federated_learning::compliance_gdpr::GdprComplianceManager;
use heliosdb_federated_learning::compliance_hipaa::HipaaComplianceManager;

let gdpr = GdprComplianceManager::new(/* config */);
let report = gdpr.generate_compliance_report().await?;

let hipaa = HipaaComplianceManager::new(/* config */);
let phi_audit = hipaa.audit_phi_access().await?;

These produce the artefacts auditors actually want — consent records, data category logs, audit entries, exportable reports. Per the lib.rs comments, both modules landed in Week 11 of the build-out and cover the validation surface.


11. Use Cases (From Source)

Per docs/FEDERATED_LEARNING.md:

  1. Multi-Institution Healthcare — train diagnostic models across hospitals without sharing patient data.
  2. Financial Fraud Detection — collaborative across banks, customer privacy preserved.
  3. IoT/Edge AI — train across edge devices with limited connectivity.
  4. Cross-Organization Analytics — collaborative analytics maintaining competitive secrecy.
  5. Regulatory Compliance — GDPR/HIPAA-compliant ML.

12. Inference & Serving

After training, expose the model via the built-in serving API:

use heliosdb_federated_learning::model_serving::{
    InferenceRequest, ModelServingEngine, ServingConfig,
};

let serving = ModelServingEngine::new(ServingConfig::default()).await?;
let response = serving.infer(InferenceRequest { /* ... */ }).await?;

Supports REST and gRPC; A/B testing for model versions is built in (ABTestConfig).
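A/B routing of this kind is typically deterministic hash-based bucketing: the same request key always hits the same model version. A sketch of the concept (illustrative, not how ABTestConfig is implemented):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic traffic split: hash the request key and route a fixed
/// percentage of keys to the candidate model. Stable across requests,
/// so a given user always sees the same variant.
fn ab_bucket(request_key: &str, percent_to_b: u64) -> &'static str {
    let mut h = DefaultHasher::new();
    request_key.hash(&mut h);
    if h.finish() % 100 < percent_to_b { "model_b" } else { "model_a" }
}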


13. SQL Interface

A SQL surface is registered via sql_interface::SQLFunctionRegistry:

SELECT predict('fraud_detector', features)
FROM transactions
WHERE amount > 1000;

This dispatches into the FederatedLearningAPI for in-database inference.


Where Next