Schema Derivation Quick Reference
Schema Derivation Quick Reference
Quick Overview
Status: COMPLETE Files Modified: 3 (/home/claude/HeliosDB Nano/src/types.rs, /home/claude/HeliosDB Nano/src/sql/type_inference.rs, /home/claude/HeliosDB Nano/src/sql/logical_plan.rs) Tests Added: 25+ Priority: P1 (High)
API Usage
Using Type Inference
use crate::sql::type_inference::TypeInference;
// 1. Infer data type from expressionlet data_type = expr.infer_type(&schema)?;
// 2. Infer nullabilitylet nullable = expr.infer_nullable(&schema);
// 3. Create complete Column definition (type + nullability)let column = expr.to_column("result_name".to_string(), &schema);Schema Helpers
// Merge two schemas (for JOIN)let combined_schema = left_schema.merge(&right_schema);
// Project to subset of columnslet projected_schema = schema.project(&[0, 2, 4]);Type Inference Cheat Sheet
Aggregate Functions
COUNT(*) → Int8, nullable: falseCOUNT(col) → Int8, nullable: trueSUM(col) → <same>, nullable: trueAVG(col) → Float8, nullable: trueMIN(col) → <same>, nullable: trueMAX(col) → <same>, nullable: trueScalar Functions
LENGTH(str) → Int8UPPER/LOWER(s) → TextCONCAT(...) → TextABS(n) → <same>ROUND(n) → Float8NOW() → TimestampCURRENT_DATE → DateArithmetic Type Coercion
Int2 + Int4 → Int4Int4 + Int8 → Int8Int4 + Float8 → Float8Float4 + Int8 → Float8Any + Numeric → NumericBinary Operators
=, <, >, <=, >=, != → Boolean (never nullable)AND, OR → Boolean (never nullable)+, -, *, /, % → Numeric (nullable if any arg nullable)LIKE, NOT LIKE → Boolean (never nullable)<-> (L2 distance) → Float8-> (JSON get) → JSONB->> (JSON text) → Text@> (JSON contains) → BooleanSpecial Cases
IS NULL / IS NOT NULL → Boolean (never nullable)BETWEEN → Boolean (never nullable)IN / NOT IN → Boolean (never nullable)CAST → Target type (preserves nullability)CASE (no ELSE) → Result type (nullable)CASE (with ELSE) → Result type (nullable if any branch nullable)Nullability Rules
Propagation
- Binary expressions: nullable if either operand is nullable
- Unary expressions: preserve operand nullability
- Functions: nullable if any argument is nullable
Exceptions
- Comparisons always return Boolean (never nullable)
- COUNT(*) always returns Int8 (never nullable)
- IS NULL / IS NOT NULL always return Boolean (never nullable)
- BETWEEN / IN always return Boolean (never nullable)
Examples
Simple SELECT
SELECT id, price * quantity AS total FROM products;Schema:
id: Int4, nullable from table schematotal: Float8 (Int4 * Float8 → Float8), nullable if either is nullable
Aggregates
SELECT category, COUNT(*), AVG(price) FROM products GROUP BY category;Schema:
category: Text, nullable from tablecount: Int8, not nullable (COUNT(*))avg: Float8, nullable (AVG always nullable)
CASE Expression
SELECT CASE WHEN x > 0 THEN x END AS positive_x FROM t;Schema:
positive_x: Int4, nullable (no ELSE = implicit NULL)
Testing
Run Tests
# All type inference testscargo test --lib sql::type_inference
# Specific categoriescargo test test_nullable_ # Nullability testscargo test test_to_column_ # Column creation testscargo test test_aggregate_ # Aggregate testsTest Coverage
- ✓ Column type lookup
- ✓ Literal type inference
- ✓ Aggregate functions (all types)
- ✓ Arithmetic coercion
- ✓ Nullability propagation
- ✓ to_column() helper
- ✓ Complex nested expressions
- ✓ JSON operators
- ✓ Vector operators
- ✓ CASE expressions
Common Patterns
Pattern 1: Project with Type Inference
let columns = aliases.iter() .zip(exprs.iter()) .map(|(alias, expr)| expr.to_column(alias.clone(), &input_schema)) .collect();Pattern 2: Aggregate Schema
let mut columns = Vec::new();for (i, expr) in group_by.iter().enumerate() { columns.push(expr.to_column(format!("group_{}", i), &input_schema));}for (i, expr) in aggr_exprs.iter().enumerate() { columns.push(expr.to_column(format!("agg_{}", i), &input_schema));}Pattern 3: Manual Type + Nullability
let data_type = expr.infer_type(&schema)?;let nullable = expr.infer_nullable(&schema);let column = Column { name: "result".to_string(), data_type, nullable, primary_key: false,};Migration from Old Code
Before
let data_type = expr.infer_type(&schema).unwrap_or(DataType::Text);Column { name: alias.clone(), data_type, nullable: true, // Always true! primary_key: false,}After
expr.to_column(alias.clone(), &schema)Benefits:
- Proper type inference (no fallback to Text)
- Accurate nullability
- Less boilerplate
- Centralized logic
Troubleshooting
Issue: Type inference fails
Solution: Expression might reference non-existent column
match expr.infer_type(&schema) { Ok(t) => t, Err(e) => { eprintln!("Type inference failed: {}", e); DataType::Text // Fallback }}Issue: Nullability too conservative
Reason: NULL propagation is conservative by design (safe but not always optimal) Solution: Use NOT NULL constraints in table schema
Issue: Aggregate nullability unexpected
Check: COUNT(*) vs COUNT(col)
- COUNT(*) → never nullable
- COUNT(col) → nullable
Key Takeaways
- Use
to_column()for most cases - it handles both type and nullability - COUNT(*) special case - only aggregate that’s never nullable
- Comparisons never nullable - always return boolean
- CASE without ELSE - always nullable (implicit NULL)
- Type coercion - follows PostgreSQL rules (Int → Float → Numeric)
Reference Links
- Full documentation: /home/claude/HeliosDB Nano/SCHEMA_DERIVATION_IMPLEMENTATION.md
- Type inference module: /home/claude/HeliosDB Nano/src/sql/type_inference.rs
- Logical plan: /home/claude/HeliosDB Nano/src/sql/logical_plan.rs
- Types module: /home/claude/HeliosDB Nano/src/types.rs