Schema Generation API
Schema Generation API
Auto-generate database schemas from JSON samples - Perfect for AI agents and rapid prototyping
Overview
The Schema Generation API analyzes your JSON data samples and automatically generates:
- SQL DDL statements ready for execution
- Column types with smart inference (VARCHAR, INTEGER, VECTOR, JSONB, etc.)
- Constraints (PRIMARY KEY, UNIQUE, NOT NULL)
- Indexes recommendations
- Confidence scores for each inference
Available on Free Tier - No credit card required.
Quick Start
1. Get Your Token
# Sign up for freecurl -X POST http://localhost:6543/auth/v1/signup \ -H "Content-Type: application/json" \ -d '{"email": "dev@example.com", "password": "secure123"}'
# Get tokencurl -X POST http://localhost:6543/auth/v1/token \ -H "Content-Type: application/json" \ -d '{"email": "dev@example.com", "password": "secure123"}'2. Infer Schema from JSON
curl -X POST http://localhost:6543/api/v1/schema/infer \ -H "Authorization: Bearer YOUR_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "samples": [ { "id": 1, "name": "Widget", "price": 29.99, "in_stock": true, "tags": ["electronics", "gadget"] }, { "id": 2, "name": "Gadget Pro", "price": 49.99, "in_stock": false, "tags": ["electronics", "premium"] } ], "table_name": "products" }'3. Get Production-Ready DDL
{ "success": true, "data": { "table_name": "products", "columns": [ {"name": "id", "sql_type": "INTEGER", "nullable": false, "confidence": 0.95}, {"name": "name", "sql_type": "VARCHAR(20)", "nullable": false, "confidence": 0.95}, {"name": "price", "sql_type": "NUMERIC", "nullable": false, "confidence": 0.95}, {"name": "in_stock", "sql_type": "BOOLEAN", "nullable": false, "confidence": 0.95}, {"name": "tags", "sql_type": "JSON", "nullable": false, "confidence": 0.90} ], "primary_key": ["id"], "ddl": "CREATE TABLE products (\n id INTEGER NOT NULL,\n name VARCHAR(20) NOT NULL,\n price NUMERIC NOT NULL,\n in_stock BOOLEAN NOT NULL,\n tags JSON NOT NULL,\n PRIMARY KEY (id)\n);", "confidence": 0.94, "warnings": [] }}4. Execute the DDL
curl -X POST http://localhost:6543/api/v1/query \ -H "Authorization: Bearer YOUR_TOKEN" \ -H "Content-Type: application/json" \ -d '{"sql": "CREATE TABLE products (\n id INTEGER NOT NULL,\n name VARCHAR(20) NOT NULL,\n price NUMERIC NOT NULL,\n in_stock BOOLEAN NOT NULL,\n tags JSON NOT NULL,\n PRIMARY KEY (id)\n);"}'API Endpoints
Infer Schema
POST /api/v1/schema/inferInfer schema from JSON samples.
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
samples | array | Yes | Array of JSON objects (1-1000) |
table_name | string | No | Table name (default: “inferred_table”) |
options | object | No | Inference options |
Options:
| Option | Type | Default | Description |
|---|---|---|---|
detect_nullable | boolean | true | Detect nullable columns |
detect_unique | boolean | true | Detect unique columns |
detect_primary_key | boolean | true | Suggest primary key |
detect_foreign_keys | boolean | false | Detect foreign key patterns |
suggest_indexes | boolean | true | Suggest indexes |
prefer_narrow_types | boolean | true | Prefer INT over BIGINT |
max_varchar_length | number | 255 | Max VARCHAR before TEXT |
detect_vectors | boolean | true | Detect vector columns |
detect_json | boolean | true | Detect JSON columns |
Example with Options:
{ "samples": [...], "table_name": "events", "options": { "prefer_narrow_types": false, "max_varchar_length": 500, "detect_vectors": true }}Batch Inference
POST /api/v1/schema/infer/batchInfer schemas for multiple tables at once.
Request:
{ "tables": [ { "name": "users", "samples": [ {"id": 1, "name": "Alice", "email": "alice@example.com"}, {"id": 2, "name": "Bob", "email": "bob@example.com"} ] }, { "name": "orders", "samples": [ {"id": 1, "user_id": 1, "total": 99.99}, {"id": 2, "user_id": 2, "total": 149.99} ] } ], "detect_relationships": true}Response:
{ "success": true, "data": { "schemas": [ {"table_name": "users", "ddl": "...", "confidence": 0.9}, {"table_name": "orders", "ddl": "...", "confidence": 0.9} ], "relationships": [ { "from_table": "orders", "from_column": "user_id", "to_table": "users", "to_column": "id", "relationship_type": "many-to-one", "confidence": 0.85 } ], "combined_ddl": "CREATE TABLE users (...);\n\nCREATE TABLE orders (...);\n\nALTER TABLE orders ADD CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id);" }}Validate Schema
POST /api/v1/schema/validateValidate DDL syntax and best practices.
{ "ddl": "CREATE TABLE users (id INT, name TEXT)"}Response:
{ "success": true, "data": { "valid": true, "errors": [], "warnings": [ {"code": "MISSING_PK", "message": "No PRIMARY KEY defined"}, {"code": "MISSING_NOT_NULL", "message": "Consider adding NOT NULL constraints"} ], "suggestions": [ "Add PRIMARY KEY constraint to 'id' column", "Consider VARCHAR(n) instead of TEXT for bounded strings" ] }}Compare Schemas
POST /api/v1/schema/compareCompare two schema versions and generate migration.
{ "source": "CREATE TABLE users (id INT, name TEXT)", "target": "CREATE TABLE users (id INT, name VARCHAR(100), email TEXT)", "generate_migration": true}Response:
{ "success": true, "data": { "differences": [ {"diff_type": "column_modified", "object": "name", "source_state": "TEXT", "target_state": "VARCHAR(100)"}, {"diff_type": "column_added", "object": "email", "target_state": "TEXT"} ], "forward_migration": "ALTER TABLE users ALTER COLUMN name TYPE VARCHAR(100);\nALTER TABLE users ADD COLUMN email TEXT;", "backward_migration": "ALTER TABLE users DROP COLUMN email;\nALTER TABLE users ALTER COLUMN name TYPE TEXT;", "is_compatible": true }}Type Detection
Supported Types
| JSON Value | Detected SQL Type |
|---|---|
true/false | BOOLEAN |
123 (integer) | INTEGER or BIGINT |
123.45 (decimal) | NUMERIC |
"hello" (short) | VARCHAR(n) |
"very long..." | TEXT |
"2024-01-15" | DATE* |
"2024-01-15T10:00:00Z" | TIMESTAMP* |
[0.1, 0.2, 0.3] (floats) | VECTOR(n) |
["a", "b"] (mixed) | JSON |
{"key": "value"} | JSONB |
null | marks column nullable |
*Date/timestamp detection based on pattern matching
Vector Detection
Arrays of numbers are detected as vectors:
{ "samples": [ { "id": 1, "content": "Hello world", "embedding": [0.1, 0.2, 0.3, 0.4, 0.5] } ]}Result:
CREATE TABLE documents ( id INTEGER NOT NULL, content VARCHAR(22) NOT NULL, embedding VECTOR(5) NOT NULL, PRIMARY KEY (id));For AI Agents
Structured Output for LLMs
Schema inference is designed for AI coding assistants:
# AI Agent workflowimport requests
def create_table_from_data(data_samples, table_name): """Let AI infer schema from example data."""
# 1. Infer schema response = requests.post( "http://localhost:6543/api/v1/schema/infer", headers={"Authorization": f"Bearer {TOKEN}"}, json={ "samples": data_samples, "table_name": table_name } ) schema = response.json()["data"]
# 2. Review confidence if schema["confidence"] < 0.7: print(f"Low confidence: {schema['warnings']}") return None
# 3. Execute DDL requests.post( "http://localhost:6543/api/v1/query", headers={"Authorization": f"Bearer {TOKEN}"}, json={"sql": schema["ddl"]} )
return schema
# Usage by AI agentsamples = [ {"user_id": 1, "action": "click", "timestamp": "2024-01-15T10:00:00Z"}, {"user_id": 2, "action": "purchase", "amount": 99.99}]create_table_from_data(samples, "user_events")Agent-Friendly Response Format
{ "success": true, "data": { "table_name": "user_events", "ddl": "...", "columns": [...], "confidence": 0.85, "warnings": [],
"_agent_hints": { "ready_to_execute": true, "suggested_indexes": ["user_id"], "potential_issues": [] } }}Free Tier Limits
| Feature | Free Tier | Starter | Pro |
|---|---|---|---|
| Schema inferences/day | 100 | 1,000 | Unlimited |
| Samples per request | 100 | 500 | 1,000 |
| Batch tables | 5 | 20 | 50 |
| Relationship detection | Yes | Yes | Yes |
| Migration generation | Yes | Yes | Yes |
Best Practices
1. Provide Diverse Samples
// Good: Shows variety{ "samples": [ {"price": 29.99, "discount": null}, {"price": 0.99, "discount": 0.10}, {"price": 999.99, "discount": 0.50} ]}
// Bad: Single sample{ "samples": [ {"price": 29.99} ]}2. Include Edge Cases
{ "samples": [ {"name": "A", "tags": ["a"]}, // Short string, single tag {"name": "Very Long Product Name Here", "tags": ["a", "b", "c"]}, // Long string, multiple tags {"name": null, "tags": []} // Null case ]}3. Use Meaningful Table Names
// Good{"table_name": "user_sessions"}
// Bad{"table_name": "table1"}Examples
E-commerce Product Catalog
curl -X POST http://localhost:6543/api/v1/schema/infer \ -H "Authorization: Bearer $TOKEN" \ -d '{ "samples": [ { "sku": "PROD-001", "name": "Wireless Headphones", "description": "High-quality wireless headphones with noise cancellation", "price": 149.99, "currency": "USD", "in_stock": true, "stock_count": 250, "categories": ["electronics", "audio"], "embedding": [0.1, 0.2, 0.3, 0.4, 0.5], "created_at": "2024-01-15T00:00:00Z" } ], "table_name": "products" }'User Analytics Events
curl -X POST http://localhost:6543/api/v1/schema/infer \ -H "Authorization: Bearer $TOKEN" \ -d '{ "samples": [ { "event_id": "evt_123abc", "user_id": "usr_456def", "event_type": "page_view", "properties": {"page": "/home", "referrer": "google.com"}, "timestamp": "2024-01-15T10:30:00Z", "session_id": "sess_789ghi" } ], "table_name": "analytics_events" }'RAG Document Store
curl -X POST http://localhost:6543/api/v1/schema/infer \ -H "Authorization: Bearer $TOKEN" \ -d '{ "samples": [ { "doc_id": "doc-001", "title": "Introduction to Machine Learning", "content": "Machine learning is a subset of artificial intelligence...", "embedding": [/* 1536 dimensions for OpenAI embeddings */], "metadata": { "source": "textbook", "chapter": 1, "author": "Dr. Smith" }, "chunk_index": 0, "total_chunks": 15 } ], "table_name": "documents", "options": { "detect_vectors": true } }'Error Handling
Common Errors
| Error | Cause | Solution |
|---|---|---|
EMPTY_SAMPLES | No samples provided | Add at least 1 sample |
INVALID_JSON | Malformed JSON | Validate JSON syntax |
TYPE_CONFLICT | Inconsistent types | Ensure consistent types across samples |
TOO_MANY_SAMPLES | Exceeded limit | Reduce to plan limit |
Example Error Response
{ "success": false, "error": { "code": "TYPE_CONFLICT", "message": "Column 'price' has conflicting types: INTEGER, TEXT", "details": { "column": "price", "sample_1_type": "INTEGER", "sample_3_type": "TEXT" } }}Next Steps
Questions? API Reference | Join Discord