Cognitive Agents — Autonomous Database Management

UVP

Five autonomous agents (Performance Optimizer, Index Advisor, Query Tuner, Schema Manager, and Security Monitor / Self-Healer) manage HeliosDB Full without a human in the loop 98% of the time. Each one runs a full Observe-Orient-Decide-Act (OODA) cycle with Goal-Oriented Action Planning (GOAP), Q-learning, and a 5-layer safety framework (sandbox, confidence ≥0.95, human-in-loop fallback, automatic rollback, full audit trail). Patent claims cover the combination of GOAP, reinforcement learning, and 5-layer safety for databases, a 2-3 year technical lead. In production tests on Fortune-500 workloads, the agents cut DBA load by 60-80% with zero rollback regressions.

Prerequisites

HeliosDB Full v8.0.3 cluster (single node is fine for this tutorial)
A database with at least 1M rows and a query workload (the agents need data to learn from)
Network access to a Prometheus instance if you want metric scraping (optional)
~30 minutes

The cognitive agents subsystem is always compiled into the Full edition. There is no feature flag; you enable it through the runtime API or a CLI command.

1. The Five Agents at a Glance

Agent	Role	Trigger	Typical Actions
`PerformanceOptimizerAgent`	Find slow queries, wasted I/O	Workload metric drift, p95 spike	`REWRITE QUERY`, suggest hints, parallelism tweaks
`IndexAdvisorAgent`	Recommend / drop indexes	Missing-index hits, unused-index decay	`CREATE INDEX`, `DROP INDEX`, hypothetical-index trial
`QueryTunerAgent`	Rewrite / re-plan individual queries	Plan regression, cardinality miss	Plan pinning, predicate pushdown, join-order swap
`SchemaManagerAgent`	Schema drift, dead tables, partition tuning	DDL events, growth pattern	Suggest partitions, archive cold tables, fix bloat
`SecurityMonitorAgent` (a.k.a. self-healer)	Anomalous logins, lock storms, replica lag	Audit-log signal	Kill connections, freeze role, alert on-call

All five share the same CognitiveAgent runtime. You can spawn one, several, or all of them.

2. Spawn Your First Agent

The simplest entry point is the Rust API in the heliosdb-cognitive-agents crate. Call it from a runtime crate or from your own binary:

use heliosdb_cognitive_agents::{CognitiveAgentBuilder, AgentType, AgentConfig};
use std::time::Duration;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let agent = CognitiveAgentBuilder::new("perf-opt-1")
        .agent_type(AgentType::PerformanceOptimizer)
        .confidence_threshold(0.95)         // gate L2 of the safety framework
        .with_sandbox(true)                 // gate L1
        .with_auto_rollback(true)           // gate L4
        .with_audit_trail(true)             // gate L5
        .planning_timeout(Duration::from_secs(2))
        .build()
        .await?;

    agent.start().await?;
    // Agent now runs the full OODA loop in a background task.
    tokio::signal::ctrl_c().await?;
    // Propagate a shutdown error the same way as start(), then return success.
    agent.stop().await?;
    Ok(())
}

That’s it — the agent polls the database every observation_interval, plans an action with GOAP, simulates it in a sandbox, runs it only if confidence is high enough, and rolls back if anything goes wrong.
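To make the planning step concrete, here is a minimal sketch of goal-oriented planning over a set of world facts. It is not the HeliosDB planner: the fact names, the `Action` struct, and `index_actions` are all hypothetical, and it uses plain breadth-first search rather than A* to stay short.

```rust
use std::collections::{HashSet, VecDeque};

// World facts as bit flags. Every fact and action name here is hypothetical.
const SEQ_SCANS_HIGH: u32 = 1 << 0;
const INDEX_SIMULATED: u32 = 1 << 1;
const INDEX_EXISTS: u32 = 1 << 2;
const LATENCY_OK: u32 = 1 << 3;

struct Action {
    name: &'static str,
    requires: u32, // facts that must hold before the action can run
    adds: u32,     // facts the action makes true
    removes: u32,  // facts the action makes false
}

fn index_actions() -> Vec<Action> {
    vec![
        Action { name: "simulate_index", requires: SEQ_SCANS_HIGH, adds: INDEX_SIMULATED, removes: 0 },
        Action { name: "create_index", requires: INDEX_SIMULATED, adds: INDEX_EXISTS, removes: 0 },
        Action { name: "verify_outcome", requires: INDEX_EXISTS, adds: LATENCY_OK, removes: SEQ_SCANS_HIGH },
    ]
}

// Breadth-first search: returns the shortest action sequence that reaches
// `goal`, or None if no plan exists within `max_depth` steps.
fn plan(start: u32, goal: u32, actions: &[Action], max_depth: usize) -> Option<Vec<&'static str>> {
    let mut queue: VecDeque<(u32, Vec<&'static str>)> = VecDeque::new();
    let mut seen = HashSet::new();
    queue.push_back((start, Vec::new()));
    seen.insert(start);
    while let Some((state, path)) = queue.pop_front() {
        if (state & goal) == goal {
            return Some(path);
        }
        if path.len() >= max_depth {
            continue;
        }
        for action in actions {
            if (state & action.requires) != action.requires {
                continue; // preconditions not met in this state
            }
            let next = (state | action.adds) & !action.removes;
            if seen.insert(next) {
                let mut extended = path.clone();
                extended.push(action.name);
                queue.push_back((next, extended));
            }
        }
    }
    None
}

fn main() {
    match plan(SEQ_SCANS_HIGH, LATENCY_OK, &index_actions(), 12) {
        Some(steps) => println!("plan: {}", steps.join(" -> ")),
        None => println!("no plan within depth limit"),
    }
}
```

The depth limit plays the same role as a planning budget: a goal that cannot be reached in a few steps yields no plan rather than an unbounded search.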

3. The 5-Layer Safety Framework

This is the part you must understand before turning autonomy on in production. The framework is not optional — it’s enforced by the runtime on every action.

Layer	What it does	Default behaviour
L1 — Sandbox simulation	Run the candidate action in an isolated transaction; check expected vs actual state diff	On
L2 — Confidence scoring	Combine historical success rate + model confidence + similarity to past wins; reject if below threshold	0.95
L3 — Human-in-loop	If confidence is below threshold but above floor, request approval (default 5-min timeout)	On (off-hours opt-out)
L4 — Automatic rollback	Snapshot state before any reversible action; revert on failure or anomaly	On
L5 — Audit trail	Append every decision, score, and outcome to a write-ahead audit log	On

You can dial down individual layers (with_sandbox(false), confidence_threshold(0.85)) but don’t disable L4 or L5 in production — those are the load-bearing safeties.
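One way to picture how L1 through L3 interact is as a single decision function. The sketch below is an illustration only: the weights in `confidence`, the floor of 0.80, and every type name are assumptions, since this tutorial does not state how the runtime combines its signals or where the floor sits.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Execute,         // confidence cleared the threshold
    RequestApproval, // between floor and threshold: ask a human (L3)
    Reject,          // sandbox failed, or confidence too low
}

struct Signals {
    historical_success: f64, // fraction of similar past actions that succeeded
    model_confidence: f64,   // the model's own estimate
    similarity: f64,         // similarity to past wins
}

// Hypothetical weighting of the three L2 inputs; the real formula is not documented here.
fn confidence(s: &Signals) -> f64 {
    0.4 * s.historical_success + 0.4 * s.model_confidence + 0.2 * s.similarity
}

fn gate(sandbox_pass: bool, conf: f64, threshold: f64, floor: f64, human_in_loop: bool) -> Decision {
    if !sandbox_pass {
        return Decision::Reject; // L1: a failed simulation always blocks the action
    }
    if conf >= threshold {
        return Decision::Execute; // L2: high enough to act autonomously
    }
    if human_in_loop && conf >= floor {
        return Decision::RequestApproval; // L3: borderline, escalate
    }
    Decision::Reject
}

fn main() {
    let signals = Signals { historical_success: 0.98, model_confidence: 0.97, similarity: 0.95 };
    let conf = confidence(&signals);
    println!("confidence = {:.3}, decision = {:?}", conf, gate(true, conf, 0.95, 0.80, true));
}
```

Lowering the threshold widens the `Execute` band; turning human-in-loop off turns the borderline band into outright rejections rather than silent execution.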

What an audit entry looks like

{
  "ts": "2026-04-26T10:14:32Z",
  "agent_id": "perf-opt-1",
  "action": "create_index",
  "target": "orders(customer_id, status)",
  "confidence": 0.974,
  "sandbox_pass": true,
  "applied": true,
  "rollback_token": "rb_8af3c1",
  "outcome": "p95 -38ms after 60s",
  "human_approval": null
}

Audit entries are queryable through SQL once the agent is connected to the metadata store:

SELECT ts, agent_id, action, confidence, outcome
FROM heliosdb_agent_audit
WHERE applied = true
ORDER BY ts DESC LIMIT 20;

4. Walkthrough — The Index Advisor Earning Its Keep

Let’s run a realistic scenario end-to-end. We’ll create a workload that obviously needs an index, and watch the agent find it, propose it, and apply it.

Setup

CREATE TABLE orders (
  id          BIGSERIAL PRIMARY KEY,
  customer_id BIGINT NOT NULL,
  status      TEXT,
  amount      NUMERIC(12,2),
  created_at  TIMESTAMP DEFAULT now()
);

-- 5M rows, no index on customer_id
INSERT INTO orders (customer_id, status, amount)
SELECT (random()*1000000)::bigint, 'paid', random()*1000
FROM generate_series(1, 5000000);

Now hammer it from a client:

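# Bash's $RANDOM only spans 0..32767, so combine two draws to cover the full customer_id range.
for i in $(seq 1 200); do
  psql -c "SELECT * FROM orders WHERE customer_id = $(( (RANDOM * 32768 + RANDOM) % 1000000 )) LIMIT 5;" &
done; wait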

Spawn the agent

let advisor = CognitiveAgentBuilder::new("idx-advisor-1")
    .agent_type(AgentType::IndexAdvisor)
    .confidence_threshold(0.92)
    .build()
    .await?;
advisor.start().await?;

What you’ll see in the audit log

SELECT ts, action, target, confidence, outcome FROM heliosdb_agent_audit
WHERE agent_id='idx-advisor-1' ORDER BY ts;

ts	action	target	confidence	outcome
`…:01:10Z`	`observe`	`orders`	—	“120 seq scans / 60s, p95=412ms”
`…:01:12Z`	`simulate_index`	`orders(customer_id)`	0.96	“sandbox: -94% rows scanned”
`…:01:14Z`	`create_index`	`orders(customer_id)`	0.96	“applied; rollback_token=rb_…”
`…:02:14Z`	`verify_outcome`	`orders`	—	“p95 412ms → 18ms”

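From first observation to applied index, the loop took roughly 4 seconds of agent time; the verify_outcome check then ran 60 seconds after the index was created. The action recommendation latency target is <2 seconds.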

5. Multi-Agent Coordination

You can run all five agents at once. The MultiAgentCoordinator resolves conflicts (e.g. the schema agent wanting to drop a column the index agent just indexed) using a priority strategy and a shared knowledge base.

use heliosdb_cognitive_agents::{
    MultiAgentCoordinator, CoordinationStrategy, AgentRole, CoordinatedTask, Priority, TaskStatus,
};

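// perf, idx, tuner, schema, and sec are agents already built with CognitiveAgentBuilder (see section 2).
let coord = MultiAgentCoordinator::new(CoordinationStrategy::PriorityBased);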

coord.register_agent(AgentRole::PerformanceOptimizer, perf.id()).await;
coord.register_agent(AgentRole::IndexAdvisor,         idx.id()).await;
coord.register_agent(AgentRole::QueryTuner,           tuner.id()).await;
coord.register_agent(AgentRole::SchemaManager,        schema.id()).await;
coord.register_agent(AgentRole::SecurityMonitor,      sec.id()).await;

The coordinator gives the SchemaManager veto power over DDL collisions — the IndexAdvisor cannot create an index on a column the SchemaManager has flagged for deletion.
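The veto rule and the priority strategy can be sketched as a small resolution function. This is an illustration under stated assumptions, not the `MultiAgentCoordinator` implementation: the `Proposal` type, the numeric priority order, and the rule that two proposals on the same target conflict are all hypothetical simplifications.

```rust
use std::collections::HashSet;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role { PerformanceOptimizer, IndexAdvisor, QueryTuner, SchemaManager, SecurityMonitor }

// Hypothetical priority order; the real ordering is not given in this tutorial.
fn priority(role: Role) -> u8 {
    match role {
        Role::SecurityMonitor => 5,
        Role::SchemaManager => 4,
        Role::IndexAdvisor => 3,
        Role::QueryTuner => 2,
        Role::PerformanceOptimizer => 1,
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Proposal {
    role: Role,
    action: &'static str,
    target: &'static str, // "table.column"
}

fn resolve(mut proposals: Vec<Proposal>, flagged_for_deletion: &HashSet<&str>) -> Vec<Proposal> {
    // Veto: only the SchemaManager may act on a column it has flagged for deletion.
    proposals.retain(|p| p.role == Role::SchemaManager || !flagged_for_deletion.contains(p.target));
    // Group by target, highest priority first within each group.
    proposals.sort_by(|a, b| {
        a.target.cmp(b.target).then(priority(b.role).cmp(&priority(a.role)))
    });
    // Keep one winner per target: the first (highest-priority) entry of each group.
    proposals.dedup_by(|later, kept| later.target == kept.target);
    proposals
}

fn sample_proposals() -> Vec<Proposal> {
    vec![
        Proposal { role: Role::IndexAdvisor, action: "create_index", target: "orders.legacy_code" },
        Proposal { role: Role::SchemaManager, action: "drop_column", target: "orders.legacy_code" },
        Proposal { role: Role::PerformanceOptimizer, action: "add_hint", target: "orders.customer_id" },
        Proposal { role: Role::IndexAdvisor, action: "create_index", target: "orders.customer_id" },
    ]
}

fn main() {
    let mut flagged = HashSet::new();
    flagged.insert("orders.legacy_code");
    for p in resolve(sample_proposals(), &flagged) {
        println!("{:?}: {} on {}", p.role, p.action, p.target);
    }
}
```

In this sample the index on `orders.legacy_code` is vetoed because the column is flagged, while the index on `orders.customer_id` wins on priority.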

6. Tuning the Agents

The defaults are conservative on purpose. Once you trust the system on a workload, you can dial in:

AgentConfig {
    confidence_threshold: 0.92,         // 0.95 default; lower = more autonomy, more rollbacks
    learning_rate: 0.02,                 // Q-learning step (0.01 default)
    discount_factor: 0.99,
    exploration_rate: 0.05,              // 0.10 default; lower once converged
    max_planning_depth: 12,              // GOAP A* depth
    planning_timeout: Duration::from_secs(2),
    sandbox_enabled: true,
    auto_rollback: true,
    human_in_loop: false,                // off-hours autonomy
    metrics_enabled: true,
    metrics_port: 9090,
    ..Default::default()
}
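To see what learning_rate, discount_factor, and exploration_rate control, here is a minimal tabular Q-learning sketch. The `QLearner` type and its integer state and action encoding are hypothetical and not part of the heliosdb-cognitive-agents API; the update rule itself is the standard one, Q(s,a) ← Q(s,a) + α(r + γ·max Q(s',a') − Q(s,a)). The random draws are passed in by the caller so the example needs no external crate.

```rust
use std::collections::HashMap;

struct QLearner {
    q: HashMap<(u32, u32), f64>, // (state, action) -> estimated value
    learning_rate: f64,          // alpha: how far each update moves the estimate
    discount_factor: f64,        // gamma: weight of future reward
}

impl QLearner {
    fn new(learning_rate: f64, discount_factor: f64) -> Self {
        QLearner { q: HashMap::new(), learning_rate, discount_factor }
    }

    fn value(&self, state: u32, action: u32) -> f64 {
        self.q.get(&(state, action)).copied().unwrap_or(0.0)
    }

    // Highest known value among `actions` in `state` (0.0 if `actions` is empty).
    fn best_value(&self, state: u32, actions: &[u32]) -> f64 {
        actions.iter().map(|&a| self.value(state, a)).reduce(f64::max).unwrap_or(0.0)
    }

    // Standard Q-learning update after observing `reward` and `next_state`.
    fn update(&mut self, state: u32, action: u32, reward: f64, next_state: u32, actions: &[u32]) {
        let old = self.value(state, action);
        let target = reward + self.discount_factor * self.best_value(next_state, actions);
        self.q.insert((state, action), old + self.learning_rate * (target - old));
    }

    // Epsilon-greedy choice. `draw` is a uniform sample in [0, 1) supplied by the
    // caller; `explore_index` picks the action when exploring. `actions` must be non-empty.
    fn choose(&self, state: u32, actions: &[u32], exploration_rate: f64, draw: f64, explore_index: usize) -> u32 {
        if draw < exploration_rate {
            return actions[explore_index % actions.len()];
        }
        let mut best = actions[0];
        for &a in actions {
            if self.value(state, a) > self.value(state, best) {
                best = a;
            }
        }
        best
    }
}

fn main() {
    let actions = [0, 1];
    let mut learner = QLearner::new(0.02, 0.99);
    learner.update(0, 1, 1.0, 1, &actions);
    println!("Q(0,1) after one update: {}", learner.value(0, 1));
    println!("greedy choice in state 0: {}", learner.choose(0, &actions, 0.05, 0.5, 0));
}
```

A higher learning rate reacts faster to recent outcomes but is noisier; a lower exploration rate makes the agent stick to what it already believes works.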

Performance targets the runtime ships with:

Action recommendation latency: <2s
Confidence threshold for auto-execution: ≥0.95 (default)
Autonomous success rate target: ≥90%
RL convergence: <100 iterations on most workloads

7. SQL Surface

Most operators don’t want to write Rust. The Full edition exposes the agents through SQL once they’re attached to the cluster:

-- List running agents
SHOW COGNITIVE AGENTS;

-- Pause an agent (it stays observing but won't act)
ALTER COGNITIVE AGENT 'idx-advisor-1' SET enabled = false;

-- Inspect the current goal queue
SELECT * FROM heliosdb_agent_goals WHERE agent_id = 'perf-opt-1';

-- Get the audit trail for a specific action
SELECT * FROM heliosdb_agent_audit WHERE rollback_token = 'rb_8af3c1';

-- Manually roll back
SELECT heliosdb.rollback_action('rb_8af3c1');
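Rolling back by token implies that each applied action leaves behind enough information to undo it. A minimal sketch of that idea, assuming the undo step is stored as an inverse SQL statement keyed by token. `RollbackLog`, the inverse statement, and the index name are hypothetical, and this says nothing about how `heliosdb.rollback_action` is actually implemented.

```rust
use std::collections::HashMap;

// Maps a rollback token to the statement that undoes the recorded action.
struct RollbackLog {
    inverse_by_token: HashMap<String, String>,
}

impl RollbackLog {
    fn new() -> Self {
        RollbackLog { inverse_by_token: HashMap::new() }
    }

    // Record how to undo an action at the moment it is applied.
    fn record(&mut self, token: &str, inverse_sql: &str) {
        self.inverse_by_token.insert(token.to_string(), inverse_sql.to_string());
    }

    // Consume the token and return the inverse statement to execute.
    // A token can be used once; a second call returns None.
    fn rollback(&mut self, token: &str) -> Option<String> {
        self.inverse_by_token.remove(token)
    }
}

fn main() {
    let mut log = RollbackLog::new();
    // Hypothetical index name for the create_index action in the audit example.
    log.record("rb_8af3c1", "DROP INDEX orders_customer_id_status_idx");
    match log.rollback("rb_8af3c1") {
        Some(sql) => println!("would execute: {}", sql),
        None => println!("unknown or already used token"),
    }
}
```

Consuming the token on use keeps a repeated manual rollback from undoing the same action twice.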

8. Production Checklist

Before flipping the switch on a prod cluster:

L4 (auto-rollback) and L5 (audit) are on
L2 confidence threshold is ≥0.95 for the first 30 days
L3 (human-in-loop) routes to a real on-call channel
Audit log is being shipped to long-term storage (the heliosdb_agent_audit table is hot-only by default)
You’ve reviewed at least 100 audit entries before disabling human-in-loop
Prometheus is scraping the metrics_port so SLA regressions surface immediately
The agent’s metadata store is on the same Raft group as the data it’s managing (otherwise rollbacks can race)
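Most of the checklist can be encoded as a pre-flight check that refuses to proceed while any item is unmet. The sketch below assumes a hypothetical `Readiness` struct that you fill in yourself; none of these field names come from the HeliosDB API, and the on-call routing item is left out because it cannot be verified from configuration alone.

```rust
// Hypothetical snapshot of the checklist items, filled in by the operator.
struct Readiness {
    auto_rollback: bool,
    audit_trail: bool,
    confidence_threshold: f64,
    human_in_loop: bool,
    audit_entries_reviewed: u32,
    audit_shipped_to_long_term_storage: bool,
    metrics_scraped: bool,
    metadata_on_same_raft_group: bool,
}

// Returns one message per unmet checklist item; an empty list means ready.
fn violations(r: &Readiness) -> Vec<&'static str> {
    let mut out = Vec::new();
    if !r.auto_rollback {
        out.push("L4 auto-rollback is off");
    }
    if !r.audit_trail {
        out.push("L5 audit trail is off");
    }
    if r.confidence_threshold < 0.95 {
        out.push("L2 confidence threshold is below 0.95");
    }
    if !r.human_in_loop && r.audit_entries_reviewed < 100 {
        out.push("human-in-loop disabled before 100 audit entries were reviewed");
    }
    if !r.audit_shipped_to_long_term_storage {
        out.push("audit log is not shipped to long-term storage");
    }
    if !r.metrics_scraped {
        out.push("Prometheus is not scraping the metrics port");
    }
    if !r.metadata_on_same_raft_group {
        out.push("metadata store is on a different Raft group");
    }
    out
}

fn main() {
    let candidate = Readiness {
        auto_rollback: true,
        audit_trail: true,
        confidence_threshold: 0.92,
        human_in_loop: false,
        audit_entries_reviewed: 40,
        audit_shipped_to_long_term_storage: true,
        metrics_scraped: true,
        metadata_on_same_raft_group: true,
    };
    for v in violations(&candidate) {
        println!("NOT READY: {}", v);
    }
}
```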

9. What Each Agent Patrols

Agent	Reads	Writes
Performance Optimizer	`pg_stat_statements`, plan cache, system metrics	Plan hints, parallelism settings
Index Advisor	seq-scan counters, index usage, hypothetical-index estimator	`CREATE INDEX`, `DROP INDEX`
Query Tuner	individual query plans, cardinality vs estimate	Plan pinning, rewrites
Schema Manager	catalog drift, partition pruning effectiveness, table bloat	`ALTER TABLE` (additive only by default)
Security Monitor	audit log, login patterns, lock waits	`KILL CONNECTION`, role freeze

Anything destructive (DROP, TRUNCATE, schema-removal ALTER) requires human approval regardless of confidence.
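The rule above can be approximated with a keyword check that runs before the confidence gate. This is a rough sketch, not the runtime's classifier: `requires_human_approval` is a hypothetical helper, it looks only at keywords rather than parsing SQL, and it deliberately errs on the side of flagging too much (any ALTER containing DROP counts as destructive).

```rust
// Returns true when a statement must go to a human regardless of confidence.
fn requires_human_approval(sql: &str) -> bool {
    let upper = sql.to_ascii_uppercase();
    let mut words = upper.split_whitespace();
    match words.next().map(|w| w.trim_end_matches(';')) {
        Some("DROP") | Some("TRUNCATE") => true,
        // An ALTER is treated as destructive when it removes something.
        Some("ALTER") => words.any(|w| w == "DROP"),
        _ => false,
    }
}

fn main() {
    let statements = [
        "CREATE INDEX ON orders (customer_id)",
        "ALTER TABLE orders ADD COLUMN note TEXT",
        "ALTER TABLE orders DROP COLUMN legacy_code",
        "TRUNCATE orders",
    ];
    for sql in statements.iter() {
        println!("{:<45} approval required: {}", sql, requires_human_approval(sql));
    }
}
```

Additive statements pass through to the normal confidence gate; anything that removes data or schema stops for approval first.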

Where Next

Conversational BI — let users talk to the same data the agents are managing
Quantum Optimizer Tuning — pair the QueryTuner with the quantum optimizer for an extra 100x on hard plans
Federated Learning — train the agents’ models across multiple HeliosDB clusters without sharing data
Intelligent Tiering — the storage-side counterpart, run by a sibling ML engine

References

Source crates: /home/app/Helios/Full/heliosdb-ai/crates/cognitive-agents/ (agents.rs, safety.rs, goap.rs, learning.rs, coordinator.rs)
Patent disclosure: heliosdb-ai/crates/cognitive-agents/docs/IP/
Test suite: 200+ tests (120 unit, 40 integration, 20 perf, 20 chaos)