Conversational BI — Ask Your Database in English, Get SQL + Answers
Crate: heliosdb-nl/crates/conversational-bi
Status: Foundation Complete — core engine, session manager, NL2SQL pipeline, semantic cache, BIRD benchmark integration are implemented. LLM API integrations (OpenAI / Anthropic / local) and production hardening are listed as in-flight in the crate README.
ARR: World-first $60M (per audit)
UVP
The fastest BI tool in the world is the one your sales lead doesn’t have to learn. The Full edition ships a Conversational BI engine — multi-turn natural language → SQL → results → English explanation, with reference resolution (“which region had the highest?”), semantic caching for repeated questions, schema augmentation, self-correction on bad SQL, and a target of 95% accuracy on the BIRD benchmark vs. SOTA’s 68–80%. The engine itself is production-grade; the specific LLM provider wiring is still in progress per the source crate.
Prerequisites
- HeliosDB Full v7.x+ with the `heliosdb-nl` workspace built in.
- An LLM endpoint — OpenAI, Anthropic, Cohere, or a local model (Ollama / ONNX).
- A schema worth asking questions about.
- About 20 minutes.
1. The One-Engine Surface
From the conversational-bi README:
```rust
use heliosdb_conversational_bi::{ConversationalBiEngine, ConversationalConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ConversationalConfig::default();
    let engine = ConversationalBiEngine::new(config).await?;

    let session_id = engine.create_session("user123", "sales_db").await?;

    let response = engine
        .process_query(session_id, "Show me top 10 customers by revenue in 2024")
        .await?;

    println!("SQL: {}", response.sql.unwrap());
    println!("Explanation: {}", response.explanation);
    println!("Confidence: {:.1}%", response.confidence * 100.0);

    // Follow-up uses session context
    let response = engine
        .process_query(session_id, "Which region had the highest?")
        .await?;

    Ok(())
}
```
The whole pipeline — session, context tracking, NL→SQL, validation, execution, explanation — is fronted by `ConversationalBiEngine::process_query`. Everything else is configuration.
2. The Eight-Stage Pipeline
From the README’s architecture diagram, every process_query call walks:
- Session Management — load conversation history.
- Reference Resolution — resolve “that”, “it”, pronouns, prior entities.
- Semantic Cache Lookup — embedding similarity vs. recent queries.
- Schema Augmentation — inject relevant table/column descriptions.
- SQL Generation — LLM call with few-shot examples.
- Validation & Self-Correction — parse, check, retry on syntax error.
- Query Execution — run against HeliosDB.
- Explanation Generation — natural language summary + suggestions.
Each stage is a public module — see Section 7 for direct module access.
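To make the flow concrete, here is a heavily simplified, synchronous sketch of the eight stages — the names (`process_query` free function, `Turn`, the string-keyed cache) are illustrative stand-ins, not the crate's API, and the real stages are async modules:

```rust
use std::collections::HashMap;

struct Turn {
    user_query: String,
    sql: Option<String>,
}

fn process_query(
    history: &mut Vec<Turn>,
    cache: &mut HashMap<String, String>,
    query: &str,
) -> String {
    // Stage 1: session management — the caller supplies this session's history.
    // Stage 2: reference resolution (stub) — fold the previous question in.
    let resolved = match history.last() {
        Some(prev) if query.contains("that") || query.contains("highest") => {
            format!("{} [context: {}]", query, prev.user_query)
        }
        _ => query.to_string(),
    };
    // Stage 3: cache lookup (string-keyed here; the real cache is embedding-based).
    if let Some(sql) = cache.get(&resolved) {
        return sql.clone();
    }
    // Stages 4–5: schema augmentation + LLM SQL generation (stubbed).
    let mut sql = format!("/* for: {} */ 1", resolved);
    // Stage 6: validation & self-correction — one retry when the check fails.
    if !sql.trim_start().to_uppercase().starts_with("SELECT") {
        sql = format!("SELECT {}", sql);
    }
    // Stages 7–8: execution and explanation are omitted from this sketch.
    cache.insert(resolved, sql.clone());
    history.push(Turn {
        user_query: query.to_string(),
        sql: Some(sql.clone()),
    });
    sql
}
```

The key structural points survive even at this scale: a cache hit short-circuits before any LLM call, and validation retries before anything reaches the database.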
3. Configure Your Model
```rust
use heliosdb_conversational_bi::{ConversationalConfig, ModelConfig, ModelProvider};

let config = ConversationalConfig {
    max_turns: 20,
    context_window_tokens: 8000,
    primary_model: ModelConfig {
        provider: ModelProvider::OpenAI,
        model_name: "gpt-4".to_string(),
        api_key: Some(std::env::var("OPENAI_API_KEY")?),
        max_tokens: 2000,
        temperature: 0.1,
        ..Default::default()
    },
    enable_query_cache: true,
    enable_schema_cache: true,
    max_concurrent_sessions: 1000,
    session_timeout_minutes: 30,
    ..Default::default()
};
```
`temperature: 0.1` is deliberate — you want SQL, not creativity. `max_concurrent_sessions: 1000` is a hard cap; the manager evicts on an LRU basis when you hit it.
4. A Three-Turn Dialog
```rust
let session = engine.create_session("alice", "sales_db").await?;

// Turn 1
let r1 = engine.process_query(
    session,
    "Show me top 10 customers by revenue in 2024"
).await?;
// SQL: SELECT customer_id, SUM(amount) AS revenue
// FROM orders
// WHERE EXTRACT(YEAR FROM order_date) = 2024
// GROUP BY customer_id
// ORDER BY revenue DESC LIMIT 10;

// Turn 2 — "highest" refers to Turn 1's customers
let r2 = engine.process_query(
    session,
    "Which region had the highest?"
).await?;
// SQL: SELECT c.region, SUM(o.amount) AS revenue
// FROM orders o JOIN customers c ON c.id = o.customer_id
// WHERE EXTRACT(YEAR FROM o.order_date) = 2024
// GROUP BY c.region
// ORDER BY revenue DESC LIMIT 1;

// Turn 3 — "that" refers to the winning region
let r3 = engine.process_query(
    session,
    "Break that down by product category"
).await?;
// SQL: SELECT p.category, SUM(o.amount) AS revenue
// FROM orders o
// JOIN customers c ON c.id = o.customer_id
// JOIN products p ON p.id = o.product_id
// WHERE c.region = '<resolved-region>'
//   AND EXTRACT(YEAR FROM o.order_date) = 2024
// GROUP BY p.category
// ORDER BY revenue DESC;
```
Reference resolution is what makes turns 2 and 3 work. The context_tracker module records entities mentioned in each response (column lists, filter values, results) and resolves “highest”, “that”, “it”, “those” against them.
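As a toy illustration of that mechanism (the crate's context_tracker is far richer — `ContextTracker`, `record`, and `resolve` here are invented names): remember entities from prior turns, then substitute them for demonstratives before SQL generation.

```rust
use std::collections::HashMap;

// Illustrative sketch only — not the crate's context_tracker API.
struct ContextTracker {
    entities: HashMap<String, String>,
}

impl ContextTracker {
    fn new() -> Self {
        Self { entities: HashMap::new() }
    }

    // Record an entity surfaced by a previous turn's result, e.g. the
    // winning region from Turn 2.
    fn record(&mut self, kind: &str, value: &str) {
        self.entities.insert(kind.to_string(), value.to_string());
    }

    // Replace a demonstrative phrase with the most recently recorded entity.
    fn resolve(&self, query: &str) -> String {
        match self.entities.get("region") {
            Some(region) => query.replace("that region", region),
            None => query.to_string(),
        }
    }
}
```

A resolver like this is why Turn 3's generated SQL can contain a concrete `WHERE c.region = '<resolved-region>'` filter even though the user never typed the region's name.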
5. Inspect Conversation State
```rust
let history = engine.get_history(session_id).await?;
println!("Conversation has {} turns", history.len());

for turn in history {
    println!("Q: {}", turn.user_query);
    println!("A: {}", turn.sql.as_deref().unwrap_or("(error)"));
}
```
End the session when you’re done — sessions are bounded by `max_concurrent_sessions`:
```rust
engine.delete_session(session_id).await?;
```
6. Performance Targets
From the README:
| Metric | Target |
|---|---|
| Accuracy (BIRD) | 95%+ execution accuracy |
| Latency p50 | <2s |
| Latency p99 | <5s |
| Throughput | 100+ QPS |
| Cache hit rate | 80%+ semantic similarity |
The cache hit rate is interesting: similar (not identical) queries hit because the cache is embedding-based, not string-based. “Top 10 customers by revenue” and “Show me my biggest customers by money spent” map close in embedding space and reuse the same plan.
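A minimal sketch of how an embedding-keyed cache differs from a string-keyed one — this is an assumption about the general technique, not the crate's semantic_cache API; `SemanticCache` and its methods are invented for illustration:

```rust
// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Illustrative embedding-keyed cache: a lookup hits when the best cosine
// similarity against any stored query embedding clears a threshold.
struct SemanticCache {
    entries: Vec<(Vec<f32>, String)>,
    threshold: f32,
}

impl SemanticCache {
    fn new(threshold: f32) -> Self {
        Self { entries: Vec::new(), threshold }
    }

    fn lookup(&self, query_emb: &[f32]) -> Option<&String> {
        self.entries
            .iter()
            .map(|(emb, sql)| (cosine(emb, query_emb), sql))
            .filter(|(sim, _)| *sim >= self.threshold)
            .max_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
            .map(|(_, sql)| sql)
    }

    fn insert(&mut self, emb: Vec<f32>, sql: String) {
        self.entries.push((emb, sql));
    }
}
```

Two differently worded questions whose embeddings point in nearly the same direction hit the same entry, which is exactly what an exact-match string key can never give you.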
7. The Modules (If You Want to Dig)
| Module | Purpose |
|---|---|
| session_manager | conversation lifecycle |
| context_tracker | entity tracking + reference resolution |
| nl2sql_engine | NL → SQL with validation |
| schema_augmenter | annotates schema with semantic info |
| model_router | routes to OpenAI / Anthropic / local |
| explanation_engine | SQL → English explanation + suggestions |
| semantic_cache | embedding-based similarity cache |
You can use any of them standalone — e.g. nl2sql_engine without the conversational shell — for one-shot text-to-SQL.
8. Benchmark Yourself on BIRD
The crate ships full integration with BIRD (BIg bench for laRge-scale Database grounded text-to-SQL evaluation):
```bash
# Inside the conversational-bi crate
./scripts/download_bird.sh
./scripts/validate_bird.py ./data/bird

# Quick run (10 examples)
cargo test test_bird_quick_benchmark -- --ignored --nocapture

# Full
cargo test test_bird_full_benchmark -- --ignored --nocapture

# By difficulty
BIRD_DIFFICULTY=simple cargo test test_bird_by_difficulty_simple -- --ignored --nocapture

# Subset
BIRD_SUBSET=50 cargo run --example bird_benchmark_demo
```
See BENCHMARKING.md for full instructions.
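If you score results yourself, the headline BIRD metric is execution accuracy. A sketch of the usual definition — an assumption about the metric, not the crate's benchmark code: a prediction counts as correct when the rows its SQL returns match the gold query's rows, order-insensitively.

```rust
// Order-insensitive comparison of two result-row sets.
fn rows_match(pred: &[String], gold: &[String]) -> bool {
    let mut p = pred.to_vec();
    let mut g = gold.to_vec();
    p.sort();
    g.sort();
    p == g
}

// Fraction of (predicted, gold) result pairs that match.
fn execution_accuracy(pairs: &[(Vec<String>, Vec<String>)]) -> f64 {
    if pairs.is_empty() {
        return 0.0;
    }
    let correct = pairs.iter().filter(|(p, g)| rows_match(p, g)).count();
    correct as f64 / pairs.len() as f64
}
```

Note this compares executed results, not SQL text — two syntactically different queries that return the same rows both count as correct.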
9. Honest Status
The README’s “Development Status” section says it plainly:
- Done: core architecture and types, session management, context tracking with reference resolution, NL2SQL engine with validation, schema augmentation, model routing scaffolding, explanation engine, semantic caching, comprehensive error handling, unit tests, BIRD dataset integration (complete).
- TODO (per the crate README): LLM API integrations (OpenAI / Anthropic / Cohere) wired through, local model support (Ollama / ONNX), actual schema introspection, embedding models for semantic cache, few-shot example store, Spider dataset integration, performance optimizations, production hardening.
In practice that means: you can build the engine and run the pipeline today, but bring your own LLM client adapter until the upstream wiring is completed in a future release. The trait surface is stable.
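The adapter shape might look like the following — a hypothetical sketch, since the crate's real trait name and signature are not documented here; `LlmClient`, `complete`, and `StubClient` are invented names. The idea is simply to hide your provider (OpenAI, Anthropic, Ollama, ...) behind one small trait so the pipeline can call it without knowing the transport.

```rust
// Hypothetical adapter trait — illustrative only, not the crate's API.
trait LlmClient {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

// Stand-in for an HTTP-backed client; a real adapter would POST the prompt
// to the provider's completions endpoint and return the generated text.
struct StubClient;

impl LlmClient for StubClient {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("SELECT 1 -- generated for: {}", prompt))
    }
}

// The pipeline only ever sees the trait object, never the concrete client.
fn generate_sql(client: &dyn LlmClient, question: &str) -> Result<String, String> {
    client.complete(question)
}
```

Swapping providers then becomes a one-line change at construction time, and a stub like `StubClient` doubles as a test fixture for the rest of the pipeline.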
Where Next
- federated-learning.md — train models across organizations without sharing data.
- Sister crates in `heliosdb-nl/crates/`: nl2sql, nl2graph (Cypher/GQL), nl-explorer, nl-schema, semantic.
- Source: README.md, README_LLM_INTEGRATION.md, BENCHMARKING.md.