Skip to content

Schema Derivation Quick Reference

Schema Derivation Quick Reference

Quick Overview

Status: COMPLETE Files Modified: 3 (/home/claude/HeliosDB Nano/src/types.rs, /home/claude/HeliosDB Nano/src/sql/type_inference.rs, /home/claude/HeliosDB Nano/src/sql/logical_plan.rs) Tests Added: 25+ Priority: P1 (High)


API Usage

Using Type Inference

use crate::sql::type_inference::TypeInference;
// 1. Infer data type from expression
let data_type = expr.infer_type(&schema)?;
// 2. Infer nullability
let nullable = expr.infer_nullable(&schema);
// 3. Create complete Column definition (type + nullability)
let column = expr.to_column("result_name".to_string(), &schema);

Schema Helpers

// Merge two schemas (for JOIN)
let combined_schema = left_schema.merge(&right_schema);
// Project to subset of columns
let projected_schema = schema.project(&[0, 2, 4]);

Type Inference Cheat Sheet

Aggregate Functions

COUNT(*) → Int8, nullable: false
COUNT(col) → Int8, nullable: true
SUM(col) → <same>, nullable: true
AVG(col) → Float8, nullable: true
MIN(col) → <same>, nullable: true
MAX(col) → <same>, nullable: true

Scalar Functions

LENGTH(str) → Int8
UPPER/LOWER(s) → Text
CONCAT(...) → Text
ABS(n) → <same>
ROUND(n) → Float8
NOW() → Timestamp
CURRENT_DATE → Date

Arithmetic Type Coercion

Int2 + Int4 → Int4
Int4 + Int8 → Int8
Int4 + Float8 → Float8
Float4 + Int8 → Float8
Any + Numeric → Numeric

Binary Operators

=, <, >, <=, >=, != → Boolean (never nullable)
AND, OR → Boolean (never nullable)
+, -, *, /, % → Numeric (nullable if any arg nullable)
LIKE, NOT LIKE → Boolean (never nullable)
<-> (L2 distance) → Float8
-> (JSON get) → JSONB
->> (JSON text) → Text
@> (JSON contains) → Boolean

Special Cases

IS NULL / IS NOT NULL → Boolean (never nullable)
BETWEEN → Boolean (never nullable)
IN / NOT IN → Boolean (never nullable)
CAST → Target type (preserves nullability)
CASE (no ELSE) → Result type (nullable)
CASE (with ELSE) → Result type (nullable if any branch nullable)

Nullability Rules

Propagation

  • Binary expressions: nullable if either operand is nullable
  • Unary expressions: preserve operand nullability
  • Functions: nullable if any argument is nullable

Exceptions

  • Comparisons always return Boolean (never nullable)
  • COUNT(*) always returns Int8 (never nullable)
  • IS NULL / IS NOT NULL always return Boolean (never nullable)
  • BETWEEN / IN always return Boolean (never nullable)

Examples

Simple SELECT

SELECT id, price * quantity AS total FROM products;

Schema:

  • id: Int4, nullable from table schema
  • total: Float8 (Int4 * Float8 → Float8), nullable if either is nullable

Aggregates

SELECT category, COUNT(*), AVG(price) FROM products GROUP BY category;

Schema:

  • category: Text, nullable from table
  • count: Int8, not nullable (COUNT(*))
  • avg: Float8, nullable (AVG always nullable)

CASE Expression

SELECT CASE WHEN x > 0 THEN x END AS positive_x FROM t;

Schema:

  • positive_x: Int4, nullable (no ELSE = implicit NULL)

Testing

Run Tests

Terminal window
# All type inference tests
cargo test --lib sql::type_inference
# Specific categories
cargo test test_nullable_ # Nullability tests
cargo test test_to_column_ # Column creation tests
cargo test test_aggregate_ # Aggregate tests

Test Coverage

  • ✓ Column type lookup
  • ✓ Literal type inference
  • ✓ Aggregate functions (all types)
  • ✓ Arithmetic coercion
  • ✓ Nullability propagation
  • ✓ to_column() helper
  • ✓ Complex nested expressions
  • ✓ JSON operators
  • ✓ Vector operators
  • ✓ CASE expressions

Common Patterns

Pattern 1: Project with Type Inference

let columns = aliases.iter()
.zip(exprs.iter())
.map(|(alias, expr)| expr.to_column(alias.clone(), &input_schema))
.collect();

Pattern 2: Aggregate Schema

let mut columns = Vec::new();
for (i, expr) in group_by.iter().enumerate() {
columns.push(expr.to_column(format!("group_{}", i), &input_schema));
}
for (i, expr) in aggr_exprs.iter().enumerate() {
columns.push(expr.to_column(format!("agg_{}", i), &input_schema));
}

Pattern 3: Manual Type + Nullability

let data_type = expr.infer_type(&schema)?;
let nullable = expr.infer_nullable(&schema);
let column = Column {
name: "result".to_string(),
data_type,
nullable,
primary_key: false,
};

Migration from Old Code

Before

let data_type = expr.infer_type(&schema).unwrap_or(DataType::Text);
Column {
name: alias.clone(),
data_type,
nullable: true, // Always true!
primary_key: false,
}

After

expr.to_column(alias.clone(), &schema)

Benefits:

  • Proper type inference (no fallback to Text)
  • Accurate nullability
  • Less boilerplate
  • Centralized logic

Troubleshooting

Issue: Type inference fails

Solution: Expression might reference non-existent column

match expr.infer_type(&schema) {
Ok(t) => t,
Err(e) => {
eprintln!("Type inference failed: {}", e);
DataType::Text // Fallback
}
}

Issue: Nullability too conservative

Reason: NULL propagation is conservative by design (safe but not always optimal) Solution: Use NOT NULL constraints in table schema

Issue: Aggregate nullability unexpected

Check: COUNT(*) vs COUNT(col)

  • COUNT(*) → never nullable
  • COUNT(col) → nullable

Key Takeaways

  1. Use to_column() for most cases - it handles both type and nullability
  2. COUNT(*) special case - only aggregate that’s never nullable
  3. Comparisons never nullable - always return boolean
  4. CASE without ELSE - always nullable (implicit NULL)
  5. Type coercion - follows PostgreSQL rules (Int → Float → Numeric)

  • Full documentation: /home/claude/HeliosDB Nano/SCHEMA_DERIVATION_IMPLEMENTATION.md
  • Type inference module: /home/claude/HeliosDB Nano/src/sql/type_inference.rs
  • Logical plan: /home/claude/HeliosDB Nano/src/sql/logical_plan.rs
  • Types module: /home/claude/HeliosDB Nano/src/types.rs