Cognitive Agents — Autonomous Database Management
UVP
Five autonomous agents (Performance Optimizer, Index Advisor, Query Tuner, Schema Manager, and Security Monitor / Self-Healer) manage HeliosDB Full without a human in the loop 98% of the time. Each one runs a full Observe-Orient-Decide-Act (OODA) cycle with Goal-Oriented Action Planning (GOAP), Q-learning, and a 5-layer safety framework: sandbox, confidence ≥0.95, human-in-loop fallback, automatic rollback, and a full audit trail. Patent claims cover the combination of GOAP, reinforcement learning (RL), and 5-layer safety for databases, which the project estimates as a 2-3 year technical lead. The stated result is a 60-80% cut in DBA load on Fortune-500 workloads, with zero rollback regressions in production tests.
Prerequisites
- HeliosDB Full v8.0.3 cluster (single node is fine for this tutorial)
- A database with at least 1M rows and a query workload (the agents need data to learn from)
- Network access to a Prometheus instance if you want metric scraping (optional)
- ~30 minutes
The cognitive agents subsystem is always compiled into the Full edition. There is no feature flag; you enable it via the runtime API or a CLI command.
1. The Five Agents at a Glance
| Agent | Role | Trigger | Typical Actions |
|---|---|---|---|
| PerformanceOptimizerAgent | Find slow queries, wasted I/O | Workload metric drift, p95 spike | REWRITE QUERY, suggest hints, parallelism tweaks |
| IndexAdvisorAgent | Recommend / drop indexes | Missing-index hits, unused-index decay | CREATE INDEX, DROP INDEX, hypothetical-index trial |
| QueryTunerAgent | Rewrite / re-plan individual queries | Plan regression, cardinality miss | Plan pinning, predicate pushdown, join-order swap |
| SchemaManagerAgent | Schema drift, dead tables, partition tuning | DDL events, growth pattern | Suggest partitions, archive cold tables, fix bloat |
| SecurityMonitorAgent (a.k.a. self-healer) | Anomalous logins, lock storms, replica lag | Audit-log signal | Kill connections, freeze role, alert on-call |
All five share the same CognitiveAgent runtime. You can spawn one, several, or all of them.
2. Spawn Your First Agent
The simplest entry point is the Rust API in heliosdb-cognitive-agents. From a runtime crate or your bin:
use heliosdb_cognitive_agents::{CognitiveAgentBuilder, AgentType, AgentConfig};
use std::time::Duration;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let agent = CognitiveAgentBuilder::new("perf-opt-1")
        .agent_type(AgentType::PerformanceOptimizer)
        .confidence_threshold(0.95) // gate L2 of the safety framework
        .with_sandbox(true) // gate L1
        .with_auto_rollback(true) // gate L4
        .with_audit_trail(true) // gate L5
        .planning_timeout(Duration::from_secs(2))
        .build()
        .await?;

    agent.start().await?;
    // Agent now runs the full OODA loop in a background task.
    tokio::signal::ctrl_c().await?;
    agent.stop().await
}

That’s it: the agent polls the database every observation_interval, plans an action with GOAP, simulates it in a sandbox, runs it only if confidence is high enough, and rolls back if anything goes wrong.
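To make the "plans an action with GOAP" step less abstract, here is a minimal sketch of goal-oriented planning. It is not the code in heliosdb-cognitive-agents: the bitmask state encoding, the fact names, the action costs, and the function names are all assumptions made for illustration. The search is a uniform-cost search (A* with a zero heuristic) capped at a maximum plan depth.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// World state as a bitmask of boolean facts (hypothetical encoding).
pub type State = u32;

pub const SLOW_QUERY: State = 1 << 0;
pub const INDEX_SIMULATED: State = 1 << 1;
pub const INDEX_CREATED: State = 1 << 2;
pub const OUTCOME_VERIFIED: State = 1 << 3;

#[derive(Debug, Clone)]
pub struct Action {
    pub name: &'static str,
    pub requires: State, // facts that must hold before the action can run
    pub adds: State,     // facts the action makes true
    pub removes: State,  // facts the action makes false
    pub cost: u32,
}

/// Illustrative action set for an index advisor; names and costs are made up.
pub fn index_advisor_actions() -> Vec<Action> {
    vec![
        Action { name: "simulate_index", requires: SLOW_QUERY, adds: INDEX_SIMULATED, removes: 0, cost: 1 },
        Action { name: "create_index", requires: INDEX_SIMULATED, adds: INDEX_CREATED, removes: SLOW_QUERY, cost: 5 },
        Action { name: "verify_outcome", requires: INDEX_CREATED, adds: OUTCOME_VERIFIED, removes: 0, cost: 1 },
    ]
}

/// Cheapest sequence of action names that reaches a state containing every
/// bit of `goal`, or `None` if no plan of at most `max_depth` steps exists.
pub fn plan(start: State, goal: State, actions: &[Action], max_depth: usize) -> Option<Vec<&'static str>> {
    let mut frontier = BinaryHeap::new();
    // Cheapest known cost per (state, depth); used to skip dominated entries.
    let mut best: HashMap<(State, usize), u32> = HashMap::new();
    frontier.push(Reverse((0u32, start, Vec::<&'static str>::new())));

    while let Some(Reverse((cost, state, path))) = frontier.pop() {
        if (state & goal) == goal {
            return Some(path);
        }
        if path.len() >= max_depth {
            continue; // depth cap reached, do not expand further
        }
        for a in actions {
            if (state & a.requires) != a.requires {
                continue; // preconditions not met
            }
            let next = (state | a.adds) & !a.removes;
            if next == state {
                continue; // action would change nothing
            }
            let next_cost = cost + a.cost;
            let key = (next, path.len() + 1);
            if best.get(&key).map_or(true, |&c| next_cost < c) {
                best.insert(key, next_cost);
                let mut next_path = path.clone();
                next_path.push(a.name);
                frontier.push(Reverse((next_cost, next, next_path)));
            }
        }
    }
    None
}
```

Calling plan(SLOW_QUERY, OUTCOME_VERIFIED, &index_advisor_actions(), 12) yields the three-step plan simulate_index, create_index, verify_outcome; with a depth cap of 2 the same call returns None, which is how a depth limit keeps planning time bounded.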
3. The 5-Layer Safety Framework
This is the part you must understand before turning autonomy on in production. The framework is not optional — it’s enforced by the runtime on every action.
| Layer | What it does | Default behaviour |
|---|---|---|
| L1 — Sandbox simulation | Run the candidate action in an isolated transaction; check expected vs actual state diff | On |
| L2 — Confidence scoring | Combine historical success rate + model confidence + similarity to past wins; reject if below threshold | 0.95 |
| L3 — Human-in-loop | If confidence is below threshold but above floor, request approval (default 5-min timeout) | On (off-hours opt-out) |
| L4 — Automatic rollback | Snapshot state before any reversible action; revert on failure or anomaly | On |
| L5 — Audit trail | Append every decision, score, and outcome to a write-ahead audit log | On |
You can dial down individual layers (with_sandbox(false), confidence_threshold(0.85)) but don’t disable L4 or L5 in production — those are the load-bearing safeties.
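The interplay between L2 and L3 can be sketched as a small decision function. This is an illustration under stated assumptions, not the runtime's implementation: the weights used to blend the three confidence signals and the floor value passed by the caller are hypothetical (the text above names the signals and mentions a floor, but gives no numbers for either).

```rust
/// Outcome of the L2/L3 gates for one candidate action.
#[derive(Debug, PartialEq)]
pub enum Gate {
    AutoExecute, // confidence at or above the threshold
    AskHuman,    // below threshold, above floor, human-in-loop enabled
    Reject,      // everything else
}

/// The three signals L2 is described as combining.
pub struct Signals {
    pub historical_success: f64,
    pub model_confidence: f64,
    pub similarity_to_past_wins: f64,
}

/// Weighted blend of the signals, clamped to [0, 1].
/// The 0.5 / 0.3 / 0.2 weights are an assumption for this sketch.
pub fn combined_confidence(s: &Signals) -> f64 {
    let score = 0.5 * s.historical_success
        + 0.3 * s.model_confidence
        + 0.2 * s.similarity_to_past_wins;
    score.clamp(0.0, 1.0)
}

/// Apply the threshold (L2) and, if needed, the human-in-loop fallback (L3).
pub fn gate(confidence: f64, threshold: f64, floor: f64, human_in_loop: bool) -> Gate {
    if confidence >= threshold {
        Gate::AutoExecute
    } else if human_in_loop && confidence >= floor {
        Gate::AskHuman
    } else {
        Gate::Reject
    }
}
```

With the default threshold of 0.95 and an assumed floor of 0.80, a score of 0.974 runs automatically, 0.90 goes to a human, and 0.90 with human-in-loop switched off is rejected outright.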
What an audit entry looks like
{
  "ts": "2026-04-26T10:14:32Z",
  "agent_id": "perf-opt-1",
  "action": "create_index",
  "target": "orders(customer_id, status)",
  "confidence": 0.974,
  "sandbox_pass": true,
  "applied": true,
  "rollback_token": "rb_8af3c1",
  "outcome": "p95 -38ms after 60s",
  "human_approval": null
}

Audit entries are queryable through SQL once the agent is connected to the metadata store:

SELECT ts, agent_id, action, confidence, outcome
FROM heliosdb_agent_audit
WHERE applied = true
ORDER BY ts DESC
LIMIT 20;

4. Walkthrough: The Index Advisor Earning Its Keep
Let’s run a realistic scenario end-to-end. We’ll create a workload that obviously needs an index, and watch the agent find it, propose it, and apply it.
Setup
CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  customer_id BIGINT NOT NULL,
  status TEXT,
  amount NUMERIC(12,2),
  created_at TIMESTAMP DEFAULT now()
);

-- 5M rows, no index on customer_id
INSERT INTO orders (customer_id, status, amount)
SELECT (random()*1000000)::bigint, 'paid', random()*1000
FROM generate_series(1, 5000000);

Now hammer it from a client. Bash's RANDOM only spans 0 to 32767, so the loop combines two draws to cover the full customer_id range:

for i in $(seq 1 200); do
  psql -c "SELECT * FROM orders WHERE customer_id = $(( (RANDOM * 32768 + RANDOM) % 1000000 )) LIMIT 5;" &
done
wait

Spawn the agent
let advisor = CognitiveAgentBuilder::new("idx-advisor-1")
    .agent_type(AgentType::IndexAdvisor)
    .confidence_threshold(0.92)
    .build()
    .await?;
advisor.start().await?;

What you’ll see in the audit log
SELECT ts, action, target, confidence, outcome
FROM heliosdb_agent_audit
WHERE agent_id = 'idx-advisor-1'
ORDER BY ts;

| ts | action | target | confidence | outcome |
|---|---|---|---|---|
| …:01:10Z | observe | orders | n/a | "120 seq scans / 60s, p95=412ms" |
| …:01:12Z | simulate_index | orders(customer_id) | 0.96 | "sandbox: -94% rows scanned" |
| …:01:14Z | create_index | orders(customer_id) | 0.96 | "applied; rollback_token=rb_…" |
| …:02:14Z | verify_outcome | orders | n/a | "p95 412ms → 18ms" |
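From the first observation to the applied index, the loop took roughly 4 seconds of agent time; the verify_outcome step ran 60 seconds later, once enough post-change traffic had accumulated. The target for action recommendation latency is <2 seconds.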
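The verify_outcome step boils down to comparing p95 latency before and after the change and deciding whether the action stays or is rolled back. A minimal sketch of that comparison follows. The function names, the regression tolerance, and the rule of rolling back when there is no usable baseline are all assumptions for illustration, not documented behaviour.

```rust
/// Decision taken after the post-change observation window.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Keep,
    Rollback,
}

/// Relative change in p95 latency; negative means the query got faster.
pub fn p95_change(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms
}

/// Keep the action unless p95 regressed by more than `max_regression`
/// (for example 0.05 for 5%). Without a positive baseline the outcome
/// cannot be judged, so this sketch conservatively rolls back.
pub fn verify_outcome(before_ms: f64, after_ms: f64, max_regression: f64) -> Verdict {
    if before_ms <= 0.0 {
        return Verdict::Rollback;
    }
    if p95_change(before_ms, after_ms) > max_regression {
        Verdict::Rollback
    } else {
        Verdict::Keep
    }
}
```

For the walkthrough numbers, p95_change(412.0, 18.0) is about -0.956, a drop of roughly 96%, so the index is kept.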
5. Multi-Agent Coordination
You can run all five agents at once. The MultiAgentCoordinator resolves conflicts (e.g. the schema agent wanting to drop a column the index agent just indexed) using a priority strategy and a shared knowledge base.
use heliosdb_cognitive_agents::{
    MultiAgentCoordinator, CoordinationStrategy, AgentRole,
    CoordinatedTask, Priority, TaskStatus,
};

let coord = MultiAgentCoordinator::new(CoordinationStrategy::PriorityBased);

coord.register_agent(AgentRole::PerformanceOptimizer, perf.id()).await;
coord.register_agent(AgentRole::IndexAdvisor, idx.id()).await;
coord.register_agent(AgentRole::QueryTuner, tuner.id()).await;
coord.register_agent(AgentRole::SchemaManager, schema.id()).await;
coord.register_agent(AgentRole::SecurityMonitor, sec.id()).await;

The coordinator gives the SchemaManager veto power over DDL collisions: the IndexAdvisor cannot create an index on a column the SchemaManager has flagged for deletion.
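The veto rule and priority ordering can be pictured with a small standalone model. Everything below is hypothetical: the Coordinator type, its methods, and especially the role ranking are assumptions for this sketch and do not mirror the MultiAgentCoordinator API. Only the veto itself (no create_index on a column flagged for deletion) comes from the description above.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    SecurityMonitor,
    SchemaManager,
    IndexAdvisor,
    QueryTuner,
    PerformanceOptimizer,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Proposal {
    pub role: Role,
    pub action: &'static str,
    pub table: &'static str,
    pub column: &'static str,
}

/// Toy coordinator holding the columns the schema agent flagged for deletion.
pub struct Coordinator {
    flagged: HashSet<(String, String)>,
}

/// Lower rank runs first. This ordering is an assumption, not a documented default.
fn rank(role: Role) -> u8 {
    match role {
        Role::SecurityMonitor => 0,
        Role::SchemaManager => 1,
        Role::IndexAdvisor => 2,
        Role::QueryTuner => 3,
        Role::PerformanceOptimizer => 4,
    }
}

impl Coordinator {
    pub fn with_flagged(columns: &[(&str, &str)]) -> Self {
        let flagged = columns
            .iter()
            .map(|(table, column)| (table.to_string(), column.to_string()))
            .collect();
        Coordinator { flagged }
    }

    /// Veto: no index may be created on a column flagged for deletion.
    pub fn admit(&self, p: &Proposal) -> bool {
        let key = (p.table.to_string(), p.column.to_string());
        !(p.action == "create_index" && self.flagged.contains(&key))
    }

    /// Drop vetoed proposals, then order the rest by role priority.
    pub fn resolve(&self, mut proposals: Vec<Proposal>) -> Vec<Proposal> {
        proposals.retain(|p| self.admit(p));
        proposals.sort_by_key(|p| rank(p.role)); // stable: ties keep arrival order
        proposals
    }
}
```

With orders.status flagged, a create_index proposal on that column is filtered out before ordering, while the same proposal on orders.customer_id passes through.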
6. Tuning the Agents
The defaults are conservative on purpose. Once you trust the system on a workload, you can dial in:
AgentConfig {
    confidence_threshold: 0.92, // 0.95 default; lower = more autonomy, more rollbacks
    learning_rate: 0.02, // Q-learning step (0.01 default)
    discount_factor: 0.99,
    exploration_rate: 0.05, // 0.10 default; lower once converged
    max_planning_depth: 12, // GOAP A* depth
    planning_timeout: Duration::from_secs(2),
    sandbox_enabled: true,
    auto_rollback: true,
    human_in_loop: false, // off-hours autonomy
    metrics_enabled: true,
    metrics_port: 9090,
    ..Default::default()
}

Performance targets the runtime ships with:
- Action recommendation latency: <2s
- Confidence threshold for auto-execution: ≥0.95 (default)
- Autonomous success rate target: ≥90%
- RL convergence: <100 iterations on most workloads
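To see what learning_rate, discount_factor, and exploration_rate actually control, here is a minimal tabular Q-learning sketch using the standard update Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s',a') − Q(s,a)) with epsilon-greedy action selection. The QLearner type, the integer state and action IDs, and the caller-supplied random draws are assumptions for illustration; the agents' real learner is not shown in this tutorial.

```rust
use std::collections::HashMap;

/// Tabular Q-learner keyed by (state id, action id). Illustrative only.
pub struct QLearner {
    pub learning_rate: f64,    // alpha: step size of each update
    pub discount_factor: f64,  // gamma: weight of future reward
    pub exploration_rate: f64, // epsilon: probability of a random action
    table: HashMap<(u32, u32), f64>,
}

impl QLearner {
    pub fn new(learning_rate: f64, discount_factor: f64, exploration_rate: f64) -> Self {
        QLearner { learning_rate, discount_factor, exploration_rate, table: HashMap::new() }
    }

    /// Current estimate; unseen pairs default to 0.0.
    pub fn q(&self, state: u32, action: u32) -> f64 {
        *self.table.get(&(state, action)).unwrap_or(&0.0)
    }

    /// Greedy choice; ties go to the earliest action in `actions`.
    pub fn best_action(&self, state: u32, actions: &[u32]) -> Option<u32> {
        let mut best: Option<(u32, f64)> = None;
        for &a in actions {
            let v = self.q(state, a);
            if best.map_or(true, |(_, bv)| v > bv) {
                best = Some((a, v));
            }
        }
        best.map(|(a, _)| a)
    }

    /// Epsilon-greedy selection. Both draws are uniform samples in [0, 1)
    /// supplied by the caller, which keeps the sketch free of RNG crates.
    pub fn choose(&self, state: u32, actions: &[u32], explore_draw: f64, pick_draw: f64) -> Option<u32> {
        if actions.is_empty() {
            return None;
        }
        if explore_draw < self.exploration_rate {
            let idx = ((pick_draw * actions.len() as f64) as usize).min(actions.len() - 1);
            Some(actions[idx])
        } else {
            self.best_action(state, actions)
        }
    }

    /// One Q-learning step. An empty `next_actions` marks a terminal state.
    pub fn update(&mut self, state: u32, action: u32, reward: f64, next_state: u32, next_actions: &[u32]) {
        let max_next = if next_actions.is_empty() {
            0.0
        } else {
            next_actions
                .iter()
                .map(|&a| self.q(next_state, a))
                .fold(f64::NEG_INFINITY, f64::max)
        };
        let old = self.q(state, action);
        let target = reward + self.discount_factor * max_next;
        self.table.insert((state, action), old + self.learning_rate * (target - old));
    }
}
```

A lower exploration_rate makes the learner stick to what it already knows, which is why the config comment suggests reducing it once the policy has converged; a higher learning_rate makes each outcome move the estimate further.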
7. SQL Surface
Most operators don’t want to write Rust. The Full edition exposes the agents through SQL once they’re attached to the cluster:
-- List running agents
SHOW COGNITIVE AGENTS;

-- Pause an agent (it stays observing but won't act)
ALTER COGNITIVE AGENT 'idx-advisor-1' SET enabled = false;

-- Inspect the current goal queue
SELECT * FROM heliosdb_agent_goals WHERE agent_id = 'perf-opt-1';

-- Get the audit trail for a specific action
SELECT * FROM heliosdb_agent_audit WHERE rollback_token = 'rb_8af3c1';

-- Manually roll back
SELECT heliosdb.rollback_action('rb_8af3c1');

8. Production Checklist
Before flipping the switch on a prod cluster:
- L4 (auto-rollback) and L5 (audit) are on
- L2 confidence threshold is ≥0.95 for the first 30 days
- L3 (human-in-loop) routes to a real on-call channel
- Audit log is being shipped to long-term storage (the heliosdb_agent_audit table is hot-only by default)
- You’ve reviewed at least 100 audit entries before disabling human-in-loop
- Prometheus is scraping the metrics_port so SLA regressions surface immediately
- The agent’s metadata store is on the same Raft group as the data it’s managing (otherwise rollbacks can race)
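If you want to turn the checklist into a pre-flight check, it could look roughly like the sketch below. The ProdSettings struct and its field names are hypothetical (only some of them echo AgentConfig fields shown earlier), and the example channel name is made up; the rules themselves are taken one-for-one from the list above.

```rust
/// Hypothetical snapshot of the settings the checklist talks about.
#[derive(Debug, Clone)]
pub struct ProdSettings {
    pub auto_rollback: bool,       // L4
    pub audit_trail: bool,         // L5
    pub confidence_threshold: f64, // L2
    pub days_in_production: u32,
    pub human_in_loop: bool,       // L3
    pub on_call_channel: Option<String>,
    pub reviewed_audit_entries: u32,
    pub audit_shipped_to_long_term_storage: bool,
    pub prometheus_scraping: bool,
    pub metadata_on_same_raft_group: bool,
}

impl ProdSettings {
    /// A day-one configuration that satisfies every checklist item.
    pub fn conservative() -> Self {
        ProdSettings {
            auto_rollback: true,
            audit_trail: true,
            confidence_threshold: 0.95,
            days_in_production: 0,
            human_in_loop: true,
            on_call_channel: Some("#db-oncall".to_string()), // made-up channel name
            reviewed_audit_entries: 0,
            audit_shipped_to_long_term_storage: true,
            prometheus_scraping: true,
            metadata_on_same_raft_group: true,
        }
    }
}

/// Return one message per violated checklist item, in checklist order.
pub fn checklist_violations(s: &ProdSettings) -> Vec<&'static str> {
    let mut out = Vec::new();
    if !s.auto_rollback {
        out.push("L4 auto-rollback is off");
    }
    if !s.audit_trail {
        out.push("L5 audit trail is off");
    }
    if s.days_in_production < 30 && s.confidence_threshold < 0.95 {
        out.push("L2 threshold below 0.95 in the first 30 days");
    }
    let no_channel = s.on_call_channel.as_deref().map_or(true, |c| c.trim().is_empty());
    if s.human_in_loop && no_channel {
        out.push("L3 has no on-call channel");
    }
    if !s.audit_shipped_to_long_term_storage {
        out.push("audit log is not shipped to long-term storage");
    }
    if !s.human_in_loop && s.reviewed_audit_entries < 100 {
        out.push("human-in-loop disabled before reviewing 100 audit entries");
    }
    if !s.prometheus_scraping {
        out.push("Prometheus is not scraping metrics_port");
    }
    if !s.metadata_on_same_raft_group {
        out.push("metadata store is on a different Raft group");
    }
    out
}
```

An empty result means every item on the list is satisfied; anything else is a reason to hold off on enabling autonomy.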
9. What Each Agent Patrols
| Agent | Reads | Writes |
|---|---|---|
| Performance Optimizer | pg_stat_statements, plan cache, system metrics | Plan hints, parallelism settings |
| Index Advisor | seq-scan counters, index usage, hypothetical-index estimator | CREATE INDEX, DROP INDEX |
| Query Tuner | individual query plans, cardinality vs estimate | Plan pinning, rewrites |
| Schema Manager | catalog drift, partition pruning effectiveness, table bloat | ALTER TABLE (additive only by default) |
| Security Monitor | audit log, login patterns, lock waits | KILL CONNECTION, role freeze |
Anything destructive (DROP, TRUNCATE, schema-removal ALTER) requires human approval regardless of confidence.
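The "destructive means human approval" rule can be pictured as a guard that runs before the confidence check. The sketch below is a naive keyword test written for illustration, not the runtime's classifier: the function names are assumptions, and a real implementation would work on a parsed statement rather than on tokens.

```rust
/// Naive check for statements the text above calls destructive:
/// DROP, TRUNCATE, and ALTER statements that remove something.
/// It errs on the side of caution: `ALTER ... DROP NOT NULL` also matches.
pub fn is_destructive(sql: &str) -> bool {
    let upper = sql.to_ascii_uppercase();
    let mut words = upper.split_whitespace();
    match words.next() {
        Some("DROP") | Some("TRUNCATE") => true,
        Some("ALTER") => words.any(|w| w == "DROP"),
        _ => false,
    }
}

/// A statement may run without a human only if it is non-destructive
/// AND its confidence clears the threshold.
pub fn may_auto_execute(sql: &str, confidence: f64, threshold: f64) -> bool {
    !is_destructive(sql) && confidence >= threshold
}
```

Under this rule, may_auto_execute("DROP TABLE orders", 0.999, 0.95) is false no matter how confident the agent is, while a CREATE INDEX at 0.96 goes through.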
Where Next
- Conversational BI — let users talk to the same data the agents are managing
- Quantum Optimizer Tuning — pair the QueryTuner with the quantum optimizer for an extra 100x on hard plans
- Federated Learning — train the agents’ models across multiple HeliosDB clusters without sharing data
- Intelligent Tiering — the storage-side counterpart, run by a sibling ML engine
References
- Source crates: /home/app/Helios/Full/heliosdb-ai/crates/cognitive-agents/ (agents.rs, safety.rs, goap.rs, learning.rs, coordinator.rs)
- Patent disclosure: heliosdb-ai/crates/cognitive-agents/docs/IP/
- Test suite: 200+ tests (120 unit, 40 integration, 20 perf, 20 chaos)