

Conversational BI — Ask Your Database in English, Get SQL + Answers

Crate: heliosdb-nl/crates/conversational-bi
Status: Foundation Complete. The core engine, session manager, NL2SQL pipeline, semantic cache, and BIRD benchmark integration are implemented; LLM API integrations (OpenAI / Anthropic / local) and production hardening are listed as in-flight in the crate README.
ARR: World-first, $60M (per audit)


UVP

The fastest BI tool in the world is the one your sales lead doesn’t have to learn. The Full edition ships a Conversational BI engine: multi-turn natural language → SQL → results → English explanation, with reference resolution (“which region had the highest?”), semantic caching for repeated questions, schema augmentation, self-correction on bad SQL, and a target of 95% accuracy on the BIRD benchmark vs. SOTA’s 68-80%. The engine itself is production code; the specific LLM provider wiring is still in progress per the source crate.


Prerequisites

  • HeliosDB Full v7.x+ with the heliosdb-nl workspace built in.
  • An LLM endpoint — OpenAI, Anthropic, Cohere, or a local model (Ollama / ONNX).
  • A schema worth asking questions about.
  • About 20 minutes.

1. The One-Engine Surface

From the conversational-bi README:

use heliosdb_conversational_bi::{ConversationalBiEngine, ConversationalConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ConversationalConfig::default();
    let engine = ConversationalBiEngine::new(config).await?;

    let session_id = engine.create_session("user123", "sales_db").await?;

    let response = engine
        .process_query(session_id, "Show me top 10 customers by revenue in 2024")
        .await?;

    println!("SQL: {}", response.sql.unwrap());
    println!("Explanation: {}", response.explanation);
    println!("Confidence: {:.1}%", response.confidence * 100.0);

    // Follow-up uses session context
    let response = engine
        .process_query(session_id, "Which region had the highest?")
        .await?;

    Ok(())
}

The whole pipeline — session, context tracking, NL→SQL, validation, execution, explanation — is fronted by ConversationalBiEngine::process_query. Everything else is configuration.


2. The Eight-Stage Pipeline

From the README’s architecture diagram, every process_query call walks:

  1. Session Management — load conversation history.
  2. Reference Resolution — resolve “that”, “it”, pronouns, prior entities.
  3. Semantic Cache Lookup — embedding similarity vs. recent queries.
  4. Schema Augmentation — inject relevant table/column descriptions.
  5. SQL Generation — LLM call with few-shot examples.
  6. Validation & Self-Correction — parse, check, retry on syntax error.
  7. Query Execution — run against HeliosDB.
  8. Explanation Generation — natural language summary + suggestions.

Each stage is a public module — see Section 7 for direct module access.
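
If you want a feel for stage 6, here is a minimal sketch of a validate-and-retry loop in plain Rust. It is not the crate's internal code; the generate and validate closures stand in for the real LLM call and SQL parser.

// Sketch only: a generic validate-and-retry loop, not the crate's own stage 6.
fn generate_with_correction<G, V>(
    question: &str,
    mut generate: G,
    validate: V,
    max_attempts: usize,
) -> Result<String, String>
where
    G: FnMut(&str, Option<&str>) -> String, // (question, previous error) -> candidate SQL
    V: Fn(&str) -> Result<(), String>,      // parse / sanity-check the candidate
{
    let mut last_error: Option<String> = None;
    for _ in 0..max_attempts {
        // Re-prompt with the previous error message so the model can self-correct.
        let candidate = generate(question, last_error.as_deref());
        match validate(&candidate) {
            Ok(()) => return Ok(candidate),
            Err(e) => last_error = Some(e),
        }
    }
    Err(last_error.unwrap_or_else(|| "no attempts made".to_string()))
}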


3. Configure Your Model

use heliosdb_conversational_bi::{ConversationalConfig, ModelConfig, ModelProvider};

let config = ConversationalConfig {
    max_turns: 20,
    context_window_tokens: 8000,
    primary_model: ModelConfig {
        provider: ModelProvider::OpenAI,
        model_name: "gpt-4".to_string(),
        api_key: Some(std::env::var("OPENAI_API_KEY")?),
        max_tokens: 2000,
        temperature: 0.1,
        ..Default::default()
    },
    enable_query_cache: true,
    enable_schema_cache: true,
    max_concurrent_sessions: 1000,
    session_timeout_minutes: 30,
    ..Default::default()
};

temperature: 0.1 is deliberate: you want SQL, not creativity. max_concurrent_sessions: 1000 is a hard cap; when you hit it, the session manager evicts the least recently used session.
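
Pointing the same config at a local model keeps the shape, only the model block changes. The sketch below is assumption-heavy: the ModelProvider::Local variant and the base_url field are guesses for illustration, so check the crate's actual ModelProvider and ModelConfig definitions before copying it.

// Sketch only: variant and field names (ModelProvider::Local, base_url) are
// assumptions, not confirmed by the crate.
let local_model = ModelConfig {
    provider: ModelProvider::Local,
    model_name: "sqlcoder".to_string(),
    base_url: Some("http://localhost:11434".to_string()), // e.g. an Ollama endpoint
    api_key: None,
    max_tokens: 2000,
    temperature: 0.1,
    ..Default::default()
};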


4. A Three-Turn Dialog

let session = engine.create_session("alice", "sales_db").await?;

// Turn 1
let r1 = engine.process_query(
    session,
    "Show me top 10 customers by revenue in 2024",
).await?;
// SQL: SELECT customer_id, SUM(amount) AS revenue
//      FROM orders
//      WHERE EXTRACT(YEAR FROM order_date) = 2024
//      GROUP BY customer_id
//      ORDER BY revenue DESC LIMIT 10;

// Turn 2 — "highest" refers to Turn 1's customers
let r2 = engine.process_query(
    session,
    "Which region had the highest?",
).await?;
// SQL: SELECT c.region, SUM(o.amount) AS revenue
//      FROM orders o JOIN customers c ON c.id = o.customer_id
//      WHERE EXTRACT(YEAR FROM o.order_date) = 2024
//      GROUP BY c.region
//      ORDER BY revenue DESC LIMIT 1;

// Turn 3 — "that" refers to the winning region
let r3 = engine.process_query(
    session,
    "Break that down by product category",
).await?;
// SQL: SELECT p.category, SUM(o.amount) AS revenue
//      FROM orders o
//      JOIN customers c ON c.id = o.customer_id
//      JOIN products p ON p.id = o.product_id
//      WHERE c.region = '<resolved-region>'
//        AND EXTRACT(YEAR FROM o.order_date) = 2024
//      GROUP BY p.category
//      ORDER BY revenue DESC;

Reference resolution is what makes turns 2 and 3 work. The context_tracker module records entities mentioned in each response (column lists, filter values, results) and resolves “highest”, “that”, “it”, “those” against them.
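
Conceptually, the bookkeeping looks something like the sketch below. These are illustrative types, not the real context_tracker API; a real resolver also has to decide which prior entity a word like "highest" actually points at.

// Sketch only: simplified stand-ins for the context_tracker's per-turn state.
#[derive(Debug, Default)]
struct TurnContext {
    entities: Vec<String>,          // e.g. ["customer_id", "revenue", "2024"]
    result_summary: Option<String>, // e.g. "top region: EMEA"
}

#[derive(Debug, Default)]
struct ConversationContext {
    turns: Vec<TurnContext>,
}

impl ConversationContext {
    // Walk backwards through the turns and return the most recent thing the
    // reference could plausibly point at; real resolution is smarter than this.
    fn resolve(&self, reference: &str) -> Option<&str> {
        let _ = reference; // a real resolver inspects the word ("that", "highest", ...)
        self.turns.iter().rev().find_map(|t| {
            t.result_summary
                .as_deref()
                .or_else(|| t.entities.first().map(String::as_str))
        })
    }
}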


5. Inspect Conversation State

let history = engine.get_history(session_id).await?;
println!("Conversation has {} turns", history.len());

for turn in history {
    println!("Q: {}", turn.user_query);
    println!("A: {}", turn.sql.as_deref().unwrap_or("(error)"));
}

End the session when you’re done — sessions are bounded by max_concurrent_sessions:

engine.delete_session(session_id).await?;

6. Performance Targets

From the README:

Metric            Target
Accuracy (BIRD)   95%+ execution accuracy
Latency p50       < 2 s
Latency p99       < 5 s
Throughput        100+ QPS
Cache hit rate    80%+ semantic similarity

The cache hit rate is interesting: similar (not identical) queries hit because the cache is embedding-based, not string-based. “Top 10 customers by revenue” and “Show me my biggest customers by money spent” map close in embedding space and reuse the same plan.
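
Under the hood that means similarity over query embeddings rather than string equality. Here is a minimal sketch of the idea, independent of the crate's own semantic_cache types; the embeddings themselves come from whatever embedding model the cache is configured with.

// Sketch only: a cosine-similarity lookup over cached (embedding, SQL) pairs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b)).max(f32::EPSILON)
}

fn lookup<'a>(
    query_embedding: &[f32],
    cache: &'a [(Vec<f32>, String)], // (embedding, cached SQL)
    threshold: f32,                  // e.g. 0.9: "close enough to reuse"
) -> Option<&'a str> {
    cache
        .iter()
        .map(|(emb, sql)| (cosine_similarity(query_embedding, emb), sql.as_str()))
        .filter(|(score, _)| *score >= threshold)
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .map(|(_, sql)| sql)
}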


7. The Modules (If You Want to Dig)

session_manager       conversation lifecycle
context_tracker       entity tracking + reference resolution
nl2sql_engine         NL → SQL with validation
schema_augmenter      annotates schema with semantic info
model_router          routes to OpenAI / Anthropic / local
explanation_engine    SQL → English explanation + suggestions
semantic_cache        embedding-based similarity cache

You can use any of them standalone — e.g. nl2sql_engine without the conversational shell — for one-shot text-to-SQL.
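
For example, one-shot text-to-SQL without a session might look roughly like this. The constructor and method names (Nl2SqlEngine::new, translate) are guesses for illustration, so check the nl2sql_engine module's actual public API.

// Hypothetical names for illustration only; the real module's API may differ.
use heliosdb_conversational_bi::nl2sql_engine::Nl2SqlEngine;

let nl2sql = Nl2SqlEngine::new(config.primary_model.clone()).await?;
let sql = nl2sql.translate("average order value by month in 2024").await?;
println!("{sql}");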


8. Benchmark Yourself on BIRD

The crate ships full integration with BIRD, the large-scale, database-grounded text-to-SQL benchmark:

# Inside the conversational-bi crate
./scripts/download_bird.sh
./scripts/validate_bird.py ./data/bird
# Quick run (10 examples)
cargo test test_bird_quick_benchmark -- --ignored --nocapture
# Full
cargo test test_bird_full_benchmark -- --ignored --nocapture
# By difficulty
BIRD_DIFFICULTY=simple cargo test test_bird_by_difficulty_simple -- --ignored --nocapture
# Subset
BIRD_SUBSET=50 cargo run --example bird_benchmark_demo

See BENCHMARKING.md for full instructions.


9. Honest Status

The README’s “Development Status” section says it plainly:

  • Done: core architecture and types, session management, context tracking with reference resolution, NL2SQL engine with validation, schema augmentation, model routing scaffolding, explanation engine, semantic caching, comprehensive error handling, unit tests, BIRD dataset integration (complete).
  • TODO (per the crate README): LLM API integrations (OpenAI / Anthropic / Cohere) wired through, local model support (Ollama / ONNX), actual schema introspection, embedding models for semantic cache, few-shot example store, Spider dataset integration, performance optimizations, production hardening.

In practice that means you can build the engine and run the pipeline today, but bring your own LLM client adapter until the upstream wiring lands in a future release. The trait surface is stable.
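
As a rough picture of what "bring your own adapter" could look like: the trait below is hypothetical (and assumes the async-trait crate), not the crate's actual trait surface, so swap in whatever trait model_router really expects.

// Sketch only: a hypothetical adapter trait, not the crate's real one.
use async_trait::async_trait;

#[async_trait]
trait LlmClient: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String, String>;
}

struct MyHttpLlm {
    endpoint: String,
}

#[async_trait]
impl LlmClient for MyHttpLlm {
    async fn complete(&self, prompt: &str) -> Result<String, String> {
        // POST `prompt` to self.endpoint with your HTTP client of choice and
        // return the raw completion text; left as a stub here.
        let _ = (prompt, &self.endpoint);
        Err("wire in your HTTP client here".to_string())
    }
}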


Where Next