
Cognitive Agents — Autonomous Database Management

UVP

Five autonomous agents — Performance Optimizer, Index Advisor, Query Tuner, Schema Manager, and Security Monitor / Self-Healer — manage HeliosDB Full without a human in the loop 98% of the time. Each one runs a full Observe-Orient-Decide-Act cycle with GOAP planning, Q-learning, and a 5-layer safety framework (sandbox, confidence ≥0.95, human-in-loop fallback, automatic rollback, full audit trail). Patent claims cover GOAP + RL + 5-layer safety for databases — a 2-3 year technical lead. Cut DBA load 60-80% on Fortune-500 workloads with zero rollback regressions in production tests.


Prerequisites

  • HeliosDB Full v8.0.3 cluster (single node is fine for this tutorial)
  • A database with at least 1M rows and a query workload (the agents need data to learn from)
  • Network access to a Prometheus instance if you want metric scraping (optional)
  • ~30 minutes

The cognitive agents subsystem is always compiled into the Full edition. There is no feature flag; you enable it at runtime through the Rust API or a CLI command.


1. The Five Agents at a Glance

| Agent | Role | Trigger | Typical Actions |
|---|---|---|---|
| PerformanceOptimizerAgent | Find slow queries, wasted I/O | Workload metric drift, p95 spike | REWRITE QUERY, suggest hints, parallelism tweaks |
| IndexAdvisorAgent | Recommend / drop indexes | Missing-index hits, unused-index decay | CREATE INDEX, DROP INDEX, hypothetical-index trial |
| QueryTunerAgent | Rewrite / re-plan individual queries | Plan regression, cardinality miss | Plan pinning, predicate pushdown, join-order swap |
| SchemaManagerAgent | Schema drift, dead tables, partition tuning | DDL events, growth pattern | Suggest partitions, archive cold tables, fix bloat |
| SecurityMonitorAgent (a.k.a. self-healer) | Anomalous logins, lock storms, replica lag | Audit-log signal | Kill connections, freeze role, alert on-call |

All five share the same CognitiveAgent runtime. You can spawn one, several, or all of them.


2. Spawn Your First Agent

The simplest entry point is the Rust API in heliosdb-cognitive-agents. From a runtime crate or your own binary:

use heliosdb_cognitive_agents::{CognitiveAgentBuilder, AgentType};
use std::time::Duration;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let agent = CognitiveAgentBuilder::new("perf-opt-1")
        .agent_type(AgentType::PerformanceOptimizer)
        .confidence_threshold(0.95)   // gate L2 of the safety framework
        .with_sandbox(true)           // gate L1
        .with_auto_rollback(true)     // gate L4
        .with_audit_trail(true)       // gate L5
        .planning_timeout(Duration::from_secs(2))
        .build()
        .await?;

    agent.start().await?;
    // Agent now runs the full OODA loop in a background task.

    // Block until Ctrl-C, then shut the agent down cleanly.
    tokio::signal::ctrl_c().await?;
    agent.stop().await?;
    Ok(())
}

That’s it — the agent polls the database every observation_interval, plans an action with GOAP, simulates it in a sandbox, runs it only if confidence is high enough, and rolls back if anything goes wrong.
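
The order of those gates can be sketched in plain Rust. This is a minimal illustration only: `Candidate`, `CycleOutcome`, and `run_cycle` are hypothetical names invented for this sketch, not types from heliosdb-cognitive-agents, and the planner, sandbox, and health check are reduced to precomputed fields.

```rust
/// Result of one loop iteration (hypothetical type, not the crate's API).
#[derive(Debug, PartialEq)]
pub enum CycleOutcome {
    NothingToDo,
    RejectedBySandbox,
    HeldForLowConfidence,
    Applied,
    RolledBack,
}

/// A planned action plus the facts the gates need (all fields are illustrative).
pub struct Candidate {
    pub name: &'static str,
    pub confidence: f64,
    pub sandbox_ok: bool,
    pub healthy_after_apply: bool,
}

/// Runs the gates in the order the text describes and records the result.
pub fn run_cycle(
    candidate: Option<Candidate>,
    threshold: f64,
    audit: &mut Vec<String>,
) -> CycleOutcome {
    // Observation and planning happened upstream; None means nothing to act on.
    let Some(c) = candidate else {
        return CycleOutcome::NothingToDo;
    };
    let outcome = if !c.sandbox_ok {
        // Sandbox simulation disagreed with the expected state diff.
        CycleOutcome::RejectedBySandbox
    } else if c.confidence < threshold {
        // Below the threshold the action is not auto-executed.
        CycleOutcome::HeldForLowConfidence
    } else if c.healthy_after_apply {
        CycleOutcome::Applied
    } else {
        // Applied, but the follow-up check failed, so the change is reverted.
        CycleOutcome::RolledBack
    };
    // Every decision is appended to the audit trail, whatever the outcome.
    audit.push(format!("{}: {:?} (confidence {:.2})", c.name, outcome, c.confidence));
    outcome
}
```

Passing a candidate with `confidence: 0.96` and both flags set to `true` against a threshold of 0.95 yields `CycleOutcome::Applied` and one audit line.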


3. The 5-Layer Safety Framework

This is the part you must understand before turning autonomy on in production. The framework is not optional — it’s enforced by the runtime on every action.

| Layer | What it does | Default behaviour |
|---|---|---|
| L1: Sandbox simulation | Run the candidate action in an isolated transaction; check expected vs actual state diff | On |
| L2: Confidence scoring | Combine historical success rate + model confidence + similarity to past wins; reject if below threshold | 0.95 |
| L3: Human-in-loop | If confidence is below threshold but above floor, request approval (default 5-min timeout) | On (off-hours opt-out) |
| L4: Automatic rollback | Snapshot state before any reversible action; revert on failure or anomaly | On |
| L5: Audit trail | Append every decision, score, and outcome to a write-ahead audit log | On |

You can dial down individual layers (with_sandbox(false), confidence_threshold(0.85)) but don’t disable L4 or L5 in production — those are the load-bearing safeties.
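
To make the interplay of L2 and L3 concrete, here is a hedged sketch of how three signals might be combined into one score and routed. The 0.4 / 0.4 / 0.2 weights, the `floor` parameter, and the names `Signals`, `Decision`, `combined_confidence`, and `decide` are all assumptions made for illustration; the source does not state how the runtime weights its inputs or what the floor is.

```rust
/// Where a scored action goes next (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Decision {
    AutoExecute,
    RequestApproval,
    Reject,
}

/// The three L2 inputs named in the table, each expected in 0.0..=1.0.
pub struct Signals {
    pub historical_success: f64,
    pub model_confidence: f64,
    pub similarity_to_past_wins: f64,
}

/// Weighted average of the signals. The weights are illustrative assumptions.
pub fn combined_confidence(s: &Signals) -> f64 {
    0.4 * s.historical_success + 0.4 * s.model_confidence + 0.2 * s.similarity_to_past_wins
}

/// L2/L3 routing: at or above the threshold the action runs on its own;
/// between floor and threshold it goes to a human if L3 is enabled;
/// everything else is rejected.
pub fn decide(score: f64, threshold: f64, floor: f64, human_in_loop: bool) -> Decision {
    if score >= threshold {
        Decision::AutoExecute
    } else if human_in_loop && score >= floor {
        Decision::RequestApproval
    } else {
        Decision::Reject
    }
}
```

With a threshold of 0.95 and an assumed floor of 0.80, a score of 0.90 is routed to approval when human-in-loop is on and rejected when it is off.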

What an audit entry looks like

{
  "ts": "2026-04-26T10:14:32Z",
  "agent_id": "perf-opt-1",
  "action": "create_index",
  "target": "orders(customer_id, status)",
  "confidence": 0.974,
  "sandbox_pass": true,
  "applied": true,
  "rollback_token": "rb_8af3c1",
  "outcome": "p95 -38ms after 60s",
  "human_approval": null
}

Audit entries are queryable through SQL once the agent is connected to the metadata store:

SELECT ts, agent_id, action, confidence, outcome
FROM heliosdb_agent_audit
WHERE applied = true
ORDER BY ts DESC LIMIT 20;
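
If audit entries are exported and processed outside the database, the same filter can be expressed in a few lines of Rust. This sketch assumes a hypothetical `AuditEntry` struct carrying a subset of the JSON fields shown above; it relies on the fact that ISO 8601 UTC timestamps in a single fixed format sort chronologically when compared as strings.

```rust
/// A subset of the audit fields shown above (hypothetical struct for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct AuditEntry {
    pub ts: String, // ISO 8601 UTC, e.g. "2026-04-26T10:14:32Z"
    pub agent_id: String,
    pub action: String,
    pub confidence: f64,
    pub applied: bool,
}

/// In-memory equivalent of the query: applied entries, newest first, capped at `limit`.
pub fn recent_applied(entries: &[AuditEntry], limit: usize) -> Vec<&AuditEntry> {
    let mut hits: Vec<&AuditEntry> = entries.iter().filter(|e| e.applied).collect();
    // Same-format UTC timestamps compare correctly as plain strings.
    hits.sort_by(|a, b| b.ts.cmp(&a.ts));
    hits.truncate(limit);
    hits
}
```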

4. Walkthrough — The Index Advisor Earning Its Keep

Let’s run a realistic scenario end-to-end. We’ll create a workload that obviously needs an index, and watch the agent find it, propose it, and apply it.

Setup

CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    status      TEXT,
    amount      NUMERIC(12,2),
    created_at  TIMESTAMP DEFAULT now()
);

-- 5M rows, no index on customer_id
INSERT INTO orders (customer_id, status, amount)
SELECT (random()*1000000)::bigint, 'paid', random()*1000
FROM generate_series(1, 5000000);

Now hammer it from a client:

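# Bash's RANDOM tops out at 32767, so combine two draws to cover the full customer_id range.
for i in $(seq 1 200); do
    psql -c "SELECT * FROM orders WHERE customer_id = $(( (RANDOM * 32768 + RANDOM) % 1000000 )) LIMIT 5;" &
done; wait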

Spawn the agent

let advisor = CognitiveAgentBuilder::new("idx-advisor-1")
    .agent_type(AgentType::IndexAdvisor)
    .confidence_threshold(0.92)
    .build()
    .await?;
advisor.start().await?;

What you’ll see in the audit log

SELECT ts, action, target, confidence, outcome
FROM heliosdb_agent_audit
WHERE agent_id = 'idx-advisor-1'
ORDER BY ts;

| ts | action | target | confidence | outcome |
|---|---|---|---|---|
| …:01:10Z | observe | orders | | "120 seq scans / 60s, p95=412ms" |
| …:01:12Z | simulate_index | orders(customer_id) | 0.96 | "sandbox: -94% rows scanned" |
| …:01:14Z | create_index | orders(customer_id) | 0.96 | "applied; rollback_token=rb_…" |
| …:02:14Z | verify_outcome | orders | | "p95 412ms → 18ms" |

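From the first observation to the applied index, the loop took roughly 4 seconds of agent time; the outcome check ran 60 seconds after the index was applied. The action recommendation latency target is <2 seconds.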
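
The p95 figures in the audit output are percentile latencies. As a rough illustration of how such a number can be derived from raw samples, the sketch below uses the nearest-rank method; the source does not say which percentile definition the agent uses, so `percentile` and `improvement_pct` are assumed helper names and the method is an assumption.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// Returns None for an empty slice or a percentile above 100.
pub fn percentile(samples: &[u64], pct: usize) -> Option<u64> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(pct / 100 * n), computed in integers to avoid float rounding.
    let rank = (pct * sorted.len() + 99) / 100;
    // Rank is 1-based; clamp so pct = 0 maps to the smallest sample.
    Some(sorted[rank.max(1) - 1])
}

/// Relative improvement between two latency figures, as a percentage.
pub fn improvement_pct(before_ms: u64, after_ms: u64) -> Option<f64> {
    if before_ms == 0 {
        return None;
    }
    Some((before_ms as f64 - after_ms as f64) / before_ms as f64 * 100.0)
}
```

Applied to the walkthrough numbers, `improvement_pct(412, 18)` comes out at roughly a 95.6% reduction in p95 latency.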


5. Multi-Agent Coordination

You can run all five agents at once. The MultiAgentCoordinator resolves conflicts (e.g. the schema agent wanting to drop a column the index agent just indexed) using a priority strategy and a shared knowledge base.

use heliosdb_cognitive_agents::{
    MultiAgentCoordinator, CoordinationStrategy, AgentRole, CoordinatedTask, Priority, TaskStatus,
};

// perf, idx, tuner, schema, and sec are agent handles built as in sections 2 and 4.
let coord = MultiAgentCoordinator::new(CoordinationStrategy::PriorityBased);
coord.register_agent(AgentRole::PerformanceOptimizer, perf.id()).await;
coord.register_agent(AgentRole::IndexAdvisor, idx.id()).await;
coord.register_agent(AgentRole::QueryTuner, tuner.id()).await;
coord.register_agent(AgentRole::SchemaManager, schema.id()).await;
coord.register_agent(AgentRole::SecurityMonitor, sec.id()).await;

The coordinator gives the SchemaManager veto power over DDL collisions — the IndexAdvisor cannot create an index on a column the SchemaManager has flagged for deletion.
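
The veto rule can be pictured as a lookup against a shared set of flagged columns. The sketch below is a stand-alone illustration under that assumption; `DdlArbiter` and its methods are hypothetical names and say nothing about how MultiAgentCoordinator stores its knowledge base internally.

```rust
use std::collections::HashSet;

/// Shared record of columns the schema agent wants removed
/// (hypothetical type for this sketch).
#[derive(Default)]
pub struct DdlArbiter {
    flagged: HashSet<(String, String)>, // (table, column)
}

impl DdlArbiter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Called on behalf of the schema agent.
    pub fn flag_for_deletion(&mut self, table: &str, column: &str) {
        self.flagged.insert((table.to_string(), column.to_string()));
    }

    /// Called on behalf of the index agent before CREATE INDEX.
    /// Fails on the first column that has been flagged for deletion.
    pub fn allow_create_index(&self, table: &str, columns: &[&str]) -> Result<(), String> {
        for &col in columns {
            if self.flagged.contains(&(table.to_string(), col.to_string())) {
                return Err(format!("{}.{} is flagged for deletion", table, col));
            }
        }
        Ok(())
    }
}
```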


6. Tuning the Agents

The defaults are conservative on purpose. Once you trust the system on a workload, you can dial in:

AgentConfig {
    confidence_threshold: 0.92,   // 0.95 default; lower = more autonomy, more rollbacks
    learning_rate: 0.02,          // Q-learning step (0.01 default)
    discount_factor: 0.99,
    exploration_rate: 0.05,       // 0.10 default; lower once converged
    max_planning_depth: 12,       // GOAP A* depth
    planning_timeout: Duration::from_secs(2),
    sandbox_enabled: true,
    auto_rollback: true,
    human_in_loop: false,         // off-hours autonomy
    metrics_enabled: true,
    metrics_port: 9090,
    ..Default::default()
}
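
To give a feel for what max_planning_depth bounds, here is a small goal-oriented planner over a bitmask world state. It is a sketch under loud assumptions: `Action` and `plan` are hypothetical, effects only ever set bits, and the search is uniform-cost (A* with a zero heuristic) rather than whatever heuristic the real GOAP planner uses. The depth cap is applied per path and combined with cheapest-cost pruning, so it is a simplification rather than an exhaustive depth-limited search.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// A planner action: needs `pre` bits set, sets `add` bits, and has a cost.
pub struct Action {
    pub name: &'static str,
    pub pre: u32,
    pub add: u32,
    pub cost: u32,
}

/// Cheapest-first search from `start` to any state containing all `goal` bits.
/// Paths longer than `max_depth` actions are not expanded.
pub fn plan(start: u32, goal: u32, actions: &[Action], max_depth: usize) -> Option<Vec<&'static str>> {
    let mut best: HashMap<u32, u32> = HashMap::new(); // cheapest known cost per state
    let mut heap = BinaryHeap::new();
    heap.push(Reverse((0u32, start, Vec::<&'static str>::new())));

    while let Some(Reverse((cost, state, path))) = heap.pop() {
        if state & goal == goal {
            return Some(path); // first goal state popped is the cheapest one found
        }
        if path.len() >= max_depth {
            continue; // depth cap reached on this path
        }
        if best.get(&state).map_or(false, |&c| c < cost) {
            continue; // stale queue entry; a cheaper route to this state exists
        }
        for a in actions {
            if state & a.pre != a.pre {
                continue; // preconditions not met
            }
            let next = state | a.add;
            if next == state {
                continue; // action changes nothing
            }
            let next_cost = cost + a.cost;
            if best.get(&next).map_or(true, |&c| next_cost < c) {
                best.insert(next, next_cost);
                let mut next_path = path.clone();
                next_path.push(a.name);
                heap.push(Reverse((next_cost, next, next_path)));
            }
        }
    }
    None
}
```

With state bits for "observed", "simulated", and "indexed", the planner prefers a cheap simulate-then-create sequence over an expensive direct action, unless the depth cap leaves room for only one step.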

Performance targets the runtime ships with:

  • Action recommendation latency: <2s
  • Confidence threshold for auto-execution: ≥0.95 (default)
  • Autonomous success rate target: ≥90%
  • RL convergence: <100 iterations on most workloads
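
The learning_rate, discount_factor, and exploration_rate settings map onto the standard tabular Q-learning update and an epsilon-greedy policy. The sketch below shows that textbook form only; `QTable` is a hypothetical type, the state and action indices are placeholders, and the random draws are passed in as arguments so the example needs no external crate. How the agents actually encode states and rewards is not described in the source.

```rust
/// Minimal tabular Q-learner (hypothetical type for this sketch).
pub struct QTable {
    q: Vec<Vec<f64>>, // q[state][action]
    learning_rate: f64,
    discount_factor: f64,
}

impl QTable {
    pub fn new(states: usize, actions: usize, learning_rate: f64, discount_factor: f64) -> Self {
        assert!(states > 0 && actions > 0, "table needs at least one state and one action");
        Self { q: vec![vec![0.0; actions]; states], learning_rate, discount_factor }
    }

    pub fn value(&self, state: usize, action: usize) -> f64 {
        self.q[state][action]
    }

    /// Q(s,a) <- Q(s,a) + lr * (reward + gamma * max_a' Q(s',a') - Q(s,a))
    pub fn update(&mut self, state: usize, action: usize, reward: f64, next_state: usize) {
        let best_next = self.q[next_state].iter().copied().fold(f64::NEG_INFINITY, f64::max);
        let target = reward + self.discount_factor * best_next;
        let current = self.q[state][action];
        self.q[state][action] = current + self.learning_rate * (target - current);
    }

    /// Index of the highest-valued action; ties go to the lowest index.
    pub fn greedy(&self, state: usize) -> usize {
        let row = &self.q[state];
        let mut best = 0;
        for (i, v) in row.iter().enumerate() {
            if *v > row[best] {
                best = i;
            }
        }
        best
    }

    /// Epsilon-greedy choice. `roll` is a caller-supplied uniform draw in [0, 1)
    /// and `random_action` a caller-supplied random index.
    pub fn choose(&self, state: usize, exploration_rate: f64, roll: f64, random_action: usize) -> usize {
        if roll < exploration_rate {
            random_action % self.q[state].len()
        } else {
            self.greedy(state)
        }
    }
}
```

A lower exploration_rate makes the `roll < exploration_rate` branch rarer, which is why the config comment suggests reducing it once the table has converged.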

7. SQL Surface

Most operators don’t want to write Rust. The Full edition exposes the agents through SQL once they’re attached to the cluster:

-- List running agents
SHOW COGNITIVE AGENTS;
-- Pause an agent (it stays observing but won't act)
ALTER COGNITIVE AGENT 'idx-advisor-1' SET enabled = false;
-- Inspect the current goal queue
SELECT * FROM heliosdb_agent_goals WHERE agent_id = 'perf-opt-1';
-- Get the audit trail for a specific action
SELECT * FROM heliosdb_agent_audit WHERE rollback_token = 'rb_8af3c1';
-- Manually roll back
SELECT heliosdb.rollback_action('rb_8af3c1');

8. Production Checklist

Before flipping the switch on a prod cluster:

  • L4 (auto-rollback) and L5 (audit) are on
  • L2 confidence threshold is ≥0.95 for the first 30 days
  • L3 (human-in-loop) routes to a real on-call channel
  • Audit log is being shipped to long-term storage (the heliosdb_agent_audit table is hot-only by default)
  • You’ve reviewed at least 100 audit entries before disabling human-in-loop
  • Prometheus is scraping the metrics_port so SLA regressions surface immediately
  • The agent’s metadata store is on the same Raft group as the data it’s managing (otherwise rollbacks can race)
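
Several of these items are mechanical enough to check in code before enabling autonomy. The sketch below is one possible pre-flight check; `ProdSettings` and `checklist_violations` are hypothetical names, and the on-call routing and Raft-group items are left out because they depend on infrastructure this example cannot see.

```rust
/// Settings a pre-flight check might inspect (hypothetical struct for this sketch).
pub struct ProdSettings {
    pub auto_rollback: bool,
    pub audit_trail: bool,
    pub confidence_threshold: f64,
    pub days_in_production: u32,
    pub human_in_loop: bool,
    pub reviewed_audit_entries: u32,
    pub audit_shipped_to_long_term_storage: bool,
    pub prometheus_scraping: bool,
}

/// Returns one message per checklist item that is not satisfied.
pub fn checklist_violations(s: &ProdSettings) -> Vec<&'static str> {
    let mut problems = Vec::new();
    if !s.auto_rollback {
        problems.push("L4 auto-rollback is off");
    }
    if !s.audit_trail {
        problems.push("L5 audit trail is off");
    }
    if s.days_in_production < 30 && s.confidence_threshold < 0.95 {
        problems.push("confidence threshold below 0.95 during the first 30 days");
    }
    if !s.human_in_loop && s.reviewed_audit_entries < 100 {
        problems.push("human-in-loop disabled before 100 audit entries were reviewed");
    }
    if !s.audit_shipped_to_long_term_storage {
        problems.push("audit log is not shipped to long-term storage");
    }
    if !s.prometheus_scraping {
        problems.push("Prometheus is not scraping the metrics port");
    }
    problems
}
```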

9. What Each Agent Patrols

| Agent | Reads | Writes |
|---|---|---|
| Performance Optimizer | pg_stat_statements, plan cache, system metrics | Plan hints, parallelism settings |
| Index Advisor | seq-scan counters, index usage, hypothetical-index estimator | CREATE INDEX, DROP INDEX |
| Query Tuner | individual query plans, cardinality vs estimate | Plan pinning, rewrites |
| Schema Manager | catalog drift, partition pruning effectiveness, table bloat | ALTER TABLE (additive only by default) |
| Security Monitor | audit log, login patterns, lock waits | KILL CONNECTION, role freeze |

Anything destructive (DROP, TRUNCATE, schema-removal ALTER) requires human approval regardless of confidence.
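
A rough way to picture that rule is a statement classifier that runs before the confidence gate. The sketch below is a keyword heuristic written for illustration, not a SQL parser and not the runtime's actual check; `is_destructive` and `needs_human_approval` are hypothetical names, and the heuristic deliberately errs on the side of flagging any ALTER that contains DROP.

```rust
/// Keyword heuristic: true for DROP, TRUNCATE, and any ALTER containing DROP.
pub fn is_destructive(sql: &str) -> bool {
    let upper = sql.to_ascii_uppercase();
    // Split on anything that is not a letter, digit, or underscore.
    let mut words = upper
        .split(|c: char| !c.is_ascii_alphanumeric() && c != '_')
        .filter(|w| !w.is_empty());
    match words.next() {
        Some("DROP") | Some("TRUNCATE") => true,
        // Remaining words of an ALTER: flag if any of them is DROP.
        Some("ALTER") => words.any(|w| w == "DROP"),
        _ => false,
    }
}

/// Destructive statements always need approval; others only below the threshold.
pub fn needs_human_approval(sql: &str, confidence: f64, threshold: f64) -> bool {
    is_destructive(sql) || confidence < threshold
}
```

Under this sketch, `DROP INDEX` at confidence 0.99 still requires approval, while `CREATE INDEX` at the same confidence does not.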




References

  • Source crates: /home/app/Helios/Full/heliosdb-ai/crates/cognitive-agents/ (agents.rs, safety.rs, goap.rs, learning.rs, coordinator.rs)
  • Patent disclosure: heliosdb-ai/crates/cognitive-agents/docs/IP/
  • Test suite: 200+ tests (120 unit, 40 integration, 20 perf, 20 chaos)