Three commands

Install, index, wire it in. Claude Code now has 11 tools that operate over your code as graph + vectors + symbols, not as raw text.

# 1. Install (one-time)
curl -L https://github.com/dimensigon/heliosdb-codekb-mcp/releases/download/v0.1.0/heliosdb-codekb-mcp-linux-x86_64 \
  -o /usr/local/bin/heliosdb-codekb-mcp && chmod +x /usr/local/bin/heliosdb-codekb-mcp

# 2. Index the repo (~30 s for ~1k files; resumable)
heliosdb-codekb-mcp init --source ~/code/myrepo --mode co-located --ingest

# 3. Wire into Claude Code (.mcp.json in the repo root)
{
  "mcpServers": {
    "helios": {
      "command": "/usr/local/bin/heliosdb-codekb-mcp",
      "args": ["serve", "--source", "/home/me/code/myrepo"]
    }
  }
}

11 tools for code-shaped queries

The MCP binary owns transport and per-source KB ergonomics. The tool surface comes from the embedded engine. So the catalog is keyed on the engine version the binary embeds.

LSP-shaped (10) — symbol-aware operations, branch- and time-aware

Tool Purpose
helios_lsp_definitionGo-to-definition for a symbol at (file, line, col)
helios_lsp_referencesFind all references with caller/callee context
helios_lsp_hoverType signature + doc-comment for a symbol
helios_lsp_document_symbolsOutline of a single file (functions, types, modules)
helios_lsp_call_hierarchyUp/down call graph from a symbol, N hops
helios_lsp_rename_previewDry-run rename with diff across all references
helios_lsp_rename_applyCommit a rename atomically across the KB
helios_lsp_body_diffBody-text diff for a symbol between two KB snapshots
helios_lsp_references_diffReference-set diff between two snapshots (find new callers)
helios_ast_diffTree-sitter AST diff between two file versions

Graph-RAG (1)

Tool Purpose
helios_graphrag_searchHybrid retrieval: BM25 + (optional) vector rerank + N-hop graph expansion

Coming in v0.2.0 (engine target: Nano v3.23.x): helios_lsp_workspace_symbols, helios_lsp_implementations, helios_graphrag_explain (citation chain). Tracked in the milestone.

Three KB modes

Mode KB location Use when
co-located<source>/.helios-kb (auto-.gitignored)Single repo; KB travels with the code.
global${XDG_DATA_HOME}/helios-kb/<slug>Many independent projects on one machine.
hybridExplicit --kb <PATH>; multiple sources register against one KBMulti-repo aggregate (e.g. monorepo + sister repos).

Three ingest tiers

Picked at init --ingest or ingest time. Pilot fixture: 666 files / 18 k symbols / 115 k refs (Helios Nano source).

Tier Flag Wall time What lights up
fast (default)(none)26 sBM25 + hop-distance ranking on helios_graphrag_search
quality (blocking)--with-embeddings3 m 15 sAdds in-process FastEmbedder body vectors → paraphrase queries hit
background-quality--background-quality26 s parent + ~2 m 50 s detached childUser-wait stays at 26 s; paraphrase quality lifts when child finishes

Recommended for repos > ~1k files: --background-quality.

Code, prose, sheets — one walk

Default tier — pure-Rust, single static binary

FormatBackend
.rs .py .ts .tsx .js .go .sqltree-sitter (engine code-graph)
.md .txttree-sitter Markdown / generic text
.pdf (born-digital)pdf-extract
.docxdocx-rs
.xlsxcalamine

--features docling — requires a docling-serve endpoint

FormatBackend
Scanned / OCR PDFsDocling layout + OCR
Scientific PDFsDocling table & formula extraction
.pptx / complex OfficeDocling Office pipeline
Audio (.mp3 .wav .m4a)Docling Whisper pipeline
Images (.png .jpg .tiff)Docling OCR

One binary, five clients

The same binary serves all of them; transport is a flag on serve.

Agent Config file Transport
Claude Code.mcp.json (project) or ~/.claude/mcp_servers.json (user)stdio
Codex CLI~/.codex/config.toml [mcp_servers.helios]stdio
CursorSettings → MCP Servers (UI)HTTP (--http 127.0.0.1:8765)
Continue~/.continue/config.jsonHTTP
Aideraider --mcp-server <cmd>stdio

Why a separate binary

HeliosDB-Nano is a generic database. Baking one specific MCP transport into its binary would adapt the engine to one consumer. This crate keeps the boundary clean.

Engine stays generic

Publishable to crates.io, consumable by any downstream. No PostgreSQL wire protocol baggage, no MCP coupling, no transport opinions.

codekb-mcp owns the agent surface

MCP packaging, transport choice, per-source KB-location ergonomics, ingest scheduling. You get the engine's tool surface without dragging in vector store DDL or distributed coordination — none of which a coding agent needs.

Resume on interrupt · generated-file skip

Resume on interrupt

ingest writes <kb_dir>/.ingest-state.json at each phase transition (walk → code_index → graph_rag). Killed mid-flight (Ctrl-C, OOM, reboot)? Next ingest reads the file at startup and skips already-completed phases. Per-file resume inside code_index is the engine's content-hash gate. Surfaced by status --source X as ingest resume : interrupted at phase = … until cleared.

Generated-file skip

Two layers, both honored during the walk:

  • @generated content-marker scan — first 4 KiB (Facebook / Google / Bazel / Go convention).
  • <root>/.gitattributes linguist-generated globs — patterns flagged with linguist-generated, linguist-generated=true, or linguist-generated=set. Long-tail coverage of *.pb.rs, codegen/**, vendored bundles.

Get the binary, query your repo

Apache 2.0, single-file install, embeds HeliosDB-Nano v3.22.2. Linux x86_64 today, macOS x86_64 next.