Schema Generation API

Auto-generate database schemas from JSON samples - Perfect for AI agents and rapid prototyping

Overview

The Schema Generation API analyzes your JSON data samples and automatically generates:

SQL DDL statements ready for execution
Column types with smart inference (VARCHAR, INTEGER, VECTOR, JSONB, etc.)
Constraints (PRIMARY KEY, UNIQUE, NOT NULL)
Indexes recommendations
Confidence scores for each inference

Available on Free Tier - No credit card required.

Quick Start

1. Get Your Token

# Sign up for free
curl -X POST http://localhost:6543/auth/v1/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "dev@example.com", "password": "secure123"}'

# Get token
curl -X POST http://localhost:6543/auth/v1/token \
  -H "Content-Type: application/json" \
  -d '{"email": "dev@example.com", "password": "secure123"}'

2. Infer Schema from JSON

curl -X POST http://localhost:6543/api/v1/schema/infer \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "samples": [
      {
        "id": 1,
        "name": "Widget",
        "price": 29.99,
        "in_stock": true,
        "tags": ["electronics", "gadget"]
      },
      {
        "id": 2,
        "name": "Gadget Pro",
        "price": 49.99,
        "in_stock": false,
        "tags": ["electronics", "premium"]
      }
    ],
    "table_name": "products"
  }'

3. Get Production-Ready DDL

{
  "success": true,
  "data": {
    "table_name": "products",
    "columns": [
      {"name": "id", "sql_type": "INTEGER", "nullable": false, "confidence": 0.95},
      {"name": "name", "sql_type": "VARCHAR(20)", "nullable": false, "confidence": 0.95},
      {"name": "price", "sql_type": "NUMERIC", "nullable": false, "confidence": 0.95},
      {"name": "in_stock", "sql_type": "BOOLEAN", "nullable": false, "confidence": 0.95},
      {"name": "tags", "sql_type": "JSON", "nullable": false, "confidence": 0.90}
    ],
    "primary_key": ["id"],
    "ddl": "CREATE TABLE products (\n    id INTEGER NOT NULL,\n    name VARCHAR(20) NOT NULL,\n    price NUMERIC NOT NULL,\n    in_stock BOOLEAN NOT NULL,\n    tags JSON NOT NULL,\n    PRIMARY KEY (id)\n);",
    "confidence": 0.94,
    "warnings": []
  }
}

4. Execute the DDL

curl -X POST http://localhost:6543/api/v1/query \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"sql": "CREATE TABLE products (\n    id INTEGER NOT NULL,\n    name VARCHAR(20) NOT NULL,\n    price NUMERIC NOT NULL,\n    in_stock BOOLEAN NOT NULL,\n    tags JSON NOT NULL,\n    PRIMARY KEY (id)\n);"}'

API Endpoints

Infer Schema

POST /api/v1/schema/infer

Infer schema from JSON samples.

Request Body:

Field	Type	Required	Description
`samples`	array	Yes	Array of JSON objects (1-1000)
`table_name`	string	No	Table name (default: “inferred_table”)
`options`	object	No	Inference options

Options:

Option	Type	Default	Description
`detect_nullable`	boolean	true	Detect nullable columns
`detect_unique`	boolean	true	Detect unique columns
`detect_primary_key`	boolean	true	Suggest primary key
`detect_foreign_keys`	boolean	false	Detect foreign key patterns
`suggest_indexes`	boolean	true	Suggest indexes
`prefer_narrow_types`	boolean	true	Prefer INT over BIGINT
`max_varchar_length`	number	255	Max VARCHAR before TEXT
`detect_vectors`	boolean	true	Detect vector columns
`detect_json`	boolean	true	Detect JSON columns

Example with Options:

{
  "samples": [...],
  "table_name": "events",
  "options": {
    "prefer_narrow_types": false,
    "max_varchar_length": 500,
    "detect_vectors": true
  }
}

Batch Inference

POST /api/v1/schema/infer/batch

Infer schemas for multiple tables at once.

Request:

{
  "tables": [
    {
      "name": "users",
      "samples": [
        {"id": 1, "name": "Alice", "email": "alice@example.com"},
        {"id": 2, "name": "Bob", "email": "bob@example.com"}
      ]
    },
    {
      "name": "orders",
      "samples": [
        {"id": 1, "user_id": 1, "total": 99.99},
        {"id": 2, "user_id": 2, "total": 149.99}
      ]
    }
  ],
  "detect_relationships": true
}

Response:

{
  "success": true,
  "data": {
    "schemas": [
      {"table_name": "users", "ddl": "...", "confidence": 0.9},
      {"table_name": "orders", "ddl": "...", "confidence": 0.9}
    ],
    "relationships": [
      {
        "from_table": "orders",
        "from_column": "user_id",
        "to_table": "users",
        "to_column": "id",
        "relationship_type": "many-to-one",
        "confidence": 0.85
      }
    ],
    "combined_ddl": "CREATE TABLE users (...);\n\nCREATE TABLE orders (...);\n\nALTER TABLE orders ADD CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id);"
  }
}

Validate Schema

POST /api/v1/schema/validate

Validate DDL syntax and best practices.

{
  "ddl": "CREATE TABLE users (id INT, name TEXT)"
}

Response:

{
  "success": true,
  "data": {
    "valid": true,
    "errors": [],
    "warnings": [
      {"code": "MISSING_PK", "message": "No PRIMARY KEY defined"},
      {"code": "MISSING_NOT_NULL", "message": "Consider adding NOT NULL constraints"}
    ],
    "suggestions": [
      "Add PRIMARY KEY constraint to 'id' column",
      "Consider VARCHAR(n) instead of TEXT for bounded strings"
    ]
  }
}

Compare Schemas

POST /api/v1/schema/compare

Compare two schema versions and generate migration.

{
  "source": "CREATE TABLE users (id INT, name TEXT)",
  "target": "CREATE TABLE users (id INT, name VARCHAR(100), email TEXT)",
  "generate_migration": true
}

Response:

{
  "success": true,
  "data": {
    "differences": [
      {"diff_type": "column_modified", "object": "name", "source_state": "TEXT", "target_state": "VARCHAR(100)"},
      {"diff_type": "column_added", "object": "email", "target_state": "TEXT"}
    ],
    "forward_migration": "ALTER TABLE users ALTER COLUMN name TYPE VARCHAR(100);\nALTER TABLE users ADD COLUMN email TEXT;",
    "backward_migration": "ALTER TABLE users DROP COLUMN email;\nALTER TABLE users ALTER COLUMN name TYPE TEXT;",
    "is_compatible": true
  }
}

Type Detection

Supported Types

JSON Value	Detected SQL Type
`true/false`	BOOLEAN
`123` (integer)	INTEGER or BIGINT
`123.45` (decimal)	NUMERIC
`"hello"` (short)	VARCHAR(n)
`"very long..."`	TEXT
`"2024-01-15"`	DATE*
`"2024-01-15T10:00:00Z"`	TIMESTAMP*
`[0.1, 0.2, 0.3]` (floats)	VECTOR(n)
`["a", "b"]` (mixed)	JSON
`{"key": "value"}`	JSONB
`null`	marks column nullable

*Date/timestamp detection based on pattern matching

Vector Detection

Arrays of numbers are detected as vectors:

{
  "samples": [
    {
      "id": 1,
      "content": "Hello world",
      "embedding": [0.1, 0.2, 0.3, 0.4, 0.5]
    }
  ]
}

Result:

CREATE TABLE documents (
    id INTEGER NOT NULL,
    content VARCHAR(22) NOT NULL,
    embedding VECTOR(5) NOT NULL,
    PRIMARY KEY (id)
);

For AI Agents

Structured Output for LLMs

Schema inference is designed for AI coding assistants:

# AI Agent workflow
import requests

def create_table_from_data(data_samples, table_name):
    """Let AI infer schema from example data."""

    # 1. Infer schema
    response = requests.post(
        "http://localhost:6543/api/v1/schema/infer",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "samples": data_samples,
            "table_name": table_name
        }
    )
    schema = response.json()["data"]

    # 2. Review confidence
    if schema["confidence"] < 0.7:
        print(f"Low confidence: {schema['warnings']}")
        return None

    # 3. Execute DDL
    requests.post(
        "http://localhost:6543/api/v1/query",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"sql": schema["ddl"]}
    )

    return schema

# Usage by AI agent
samples = [
    {"user_id": 1, "action": "click", "timestamp": "2024-01-15T10:00:00Z"},
    {"user_id": 2, "action": "purchase", "amount": 99.99}
]
create_table_from_data(samples, "user_events")

Agent-Friendly Response Format

{
  "success": true,
  "data": {
    "table_name": "user_events",
    "ddl": "...",
    "columns": [...],
    "confidence": 0.85,
    "warnings": [],

    "_agent_hints": {
      "ready_to_execute": true,
      "suggested_indexes": ["user_id"],
      "potential_issues": []
    }
  }
}

Free Tier Limits

Feature	Free Tier	Starter	Pro
Schema inferences/day	100	1,000	Unlimited
Samples per request	100	500	1,000
Batch tables	5	20	50
Relationship detection	Yes	Yes	Yes
Migration generation	Yes	Yes	Yes

Best Practices

1. Provide Diverse Samples

// Good: Shows variety
{
  "samples": [
    {"price": 29.99, "discount": null},
    {"price": 0.99, "discount": 0.10},
    {"price": 999.99, "discount": 0.50}
  ]
}

// Bad: Single sample
{
  "samples": [
    {"price": 29.99}
  ]
}

2. Include Edge Cases

{
  "samples": [
    {"name": "A", "tags": ["a"]},           // Short string, single tag
    {"name": "Very Long Product Name Here", "tags": ["a", "b", "c"]},  // Long string, multiple tags
    {"name": null, "tags": []}              // Null case
  ]
}

3. Use Meaningful Table Names

// Good
{"table_name": "user_sessions"}

// Bad
{"table_name": "table1"}

Examples

E-commerce Product Catalog

curl -X POST http://localhost:6543/api/v1/schema/infer \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "samples": [
      {
        "sku": "PROD-001",
        "name": "Wireless Headphones",
        "description": "High-quality wireless headphones with noise cancellation",
        "price": 149.99,
        "currency": "USD",
        "in_stock": true,
        "stock_count": 250,
        "categories": ["electronics", "audio"],
        "embedding": [0.1, 0.2, 0.3, 0.4, 0.5],
        "created_at": "2024-01-15T00:00:00Z"
      }
    ],
    "table_name": "products"
  }'

User Analytics Events

curl -X POST http://localhost:6543/api/v1/schema/infer \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "samples": [
      {
        "event_id": "evt_123abc",
        "user_id": "usr_456def",
        "event_type": "page_view",
        "properties": {"page": "/home", "referrer": "google.com"},
        "timestamp": "2024-01-15T10:30:00Z",
        "session_id": "sess_789ghi"
      }
    ],
    "table_name": "analytics_events"
  }'

RAG Document Store

curl -X POST http://localhost:6543/api/v1/schema/infer \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "samples": [
      {
        "doc_id": "doc-001",
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a subset of artificial intelligence...",
        "embedding": [/* 1536 dimensions for OpenAI embeddings */],
        "metadata": {
          "source": "textbook",
          "chapter": 1,
          "author": "Dr. Smith"
        },
        "chunk_index": 0,
        "total_chunks": 15
      }
    ],
    "table_name": "documents",
    "options": {
      "detect_vectors": true
    }
  }'

Error Handling

Common Errors

Error	Cause	Solution
`EMPTY_SAMPLES`	No samples provided	Add at least 1 sample
`INVALID_JSON`	Malformed JSON	Validate JSON syntax
`TYPE_CONFLICT`	Inconsistent types	Ensure consistent types across samples
`TOO_MANY_SAMPLES`	Exceeded limit	Reduce to plan limit

Example Error Response

{
  "success": false,
  "error": {
    "code": "TYPE_CONFLICT",
    "message": "Column 'price' has conflicting types: INTEGER, TEXT",
    "details": {
      "column": "price",
      "sample_1_type": "INTEGER",
      "sample_3_type": "TEXT"
    }
  }
}

Next Steps

Questions? API Reference | Join Discord