🚨 IMPORTANT — Compute-Layer Client Protocol Compatibility

Non-negotiable goal: The compute server MUST accept connections from mainstream client ecosystems without custom drivers, starting with Python clients. It must mimic the wire protocols and minimal SQL surface of:

PostgreSQL (libpq wire) — Python: psycopg2, asyncpg, SQLAlchemy (pg dialect)

MySQL (Protocol v10) — Python: mysql-connector-python, PyMySQL, SQLAlchemy (mysql dialect)

Oracle/Tibero (Oracle Net/TTC subset for Thin) — Python: oracledb (Thin mode)

SQL Server (TDS 7.4 subset) — Python: pyodbc, pymssql

DB2 (DRDA subset) — Python: ibm_db

Snowflake (REST/HTTP JSON API subset) — Python: snowflake-connector-python

Databricks SQL (HTTP/Thrift subset) — Python: databricks-sql-connector

Pinecone (HTTP/gRPC API subset) — Python: pinecone-client

Rationale: zero-friction adoption. Users must be able to “point existing apps to a new host/port” and go.

Scope & Acceptance Criteria

Must-have (Phase 1)

Protocol autodetect & routing at the compute ingress:

Detect libpq (PostgreSQL), MySQL v10, TDS 7.4 prelogin, Oracle Net connect packet, DRDA EXCSAT, HTTP(S) with Snowflake/Databricks/Pinecone paths, plus TLS ALPN hints.

Negotiate auth (see matrix) and spawn the correct protocol handler.

Minimal SQL surface per protocol sufficient for CRUD + prepared statements + server-side cursors (where applicable).

Datatype mappings covering numerics, text, timestamps, bytea/blob, decimals; plus VECTOR(N) exposed through each ecosystem (e.g., as BYTEA/VARBINARY or UDT until native support is added).

Transactional semantics: BEGIN/COMMIT/ROLLBACK; isolation at least READ COMMITTED; parameter binding; error codes mapped to each ecosystem’s conventions.

Python client interop works without code changes beyond DSN/host.
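The VECTOR(N) pass-through described above can be sketched with a simple packing scheme. The little-endian float32 layout below is an illustrative assumption for the BYTEA/VARBINARY transport, not part of any spec:

```python
import struct

def vector_to_bytes(vec):
    """Pack a VECTOR(N) value as little-endian float32 for BYTEA/VARBINARY
    transport. The float32/little-endian layout is an assumption."""
    return struct.pack(f"<{len(vec)}f", *vec)

def bytes_to_vector(buf):
    """Inverse of vector_to_bytes: unpack little-endian float32 bytes."""
    n = len(buf) // 4
    return list(struct.unpack(f"<{n}f", buf))

raw = vector_to_bytes([0.0, 1.0, -2.5])
assert bytes_to_vector(raw) == [0.0, 1.0, -2.5]
```

Any drift-free scheme works as long as both directions live in the dialect adapter; the engine only ever sees the canonical vector type.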

Nice-to-have (Phase 1.5)

SCRAM-SHA-256 for Postgres; caching_sha2_password for MySQL; TLS-first and ALPN for fast dispatch; HTTP/2 for Snowflake/Databricks/Pinecone surfaces.

Basic cursor pagination semantics consistent with each protocol.
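The SCRAM-SHA-256 verifier material mentioned above is small enough to sketch from RFC 5802/7677: the server stores StoredKey/ServerKey, never the password. Salt and iteration values below are illustrative:

```python
import hashlib
import hmac

def scram_sha256_keys(password: str, salt: bytes, iterations: int):
    """Derive SCRAM-SHA-256 verifier material (RFC 5802 / RFC 7677).

    SaltedPassword = PBKDF2-HMAC-SHA256(password, salt, iterations)
    StoredKey      = H(HMAC(SaltedPassword, "Client Key"))
    ServerKey      = HMAC(SaltedPassword, "Server Key")
    """
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    return stored_key, server_key

# Illustrative values only; Postgres defaults to 4096 iterations.
stored, server = scram_sha256_keys("demo", b"somesalt", 4096)
```

The same derivation backs both the Postgres handler and any future SCRAM surface, so it belongs in a shared auth-adapter module.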

Out of scope (initially)

Full proprietary extensions (e.g., Oracle advanced features, SQL Server MARS, DB2 packages). Provide Bronze-level compatibility (see Matrix).

Compatibility Levels (per ecosystem)

Gold: Native wire-protocol parity for connection, auth, simple queries, prepared statements, result sets, cursors, errors.

Silver: As Gold, plus bulk-load path and common DDL.

Bronze: Connect, auth, simple queries, basic types; some features stubbed.

Target for Phase 1:

PostgreSQL: Gold

MySQL: Gold

Snowflake/Databricks/Pinecone: Silver (HTTP API subsets)

SQL Server, DB2, Oracle/Tibero: Bronze (pragmatic subsets)

Ports & Endpoints (suggested defaults)

5432/tcp → PostgreSQL handler

3306/tcp → MySQL handler

1521/tcp → Oracle/Tibero handler (subset)

1433/tcp → TDS handler (subset)

50000/tcp → DRDA handler (subset)

443/tcp (TLS) → HTTP gateway:

/snowflake/* → Snowflake-compatible REST

/dbsql/* → Databricks SQL-compatible

/pinecone/* → Pinecone-compatible (index/query/vectors)

A unified autodetect port (e.g., 6543/tcp) is allowed, but dedicated ports reduce surprises.

Design Rules

Protocol gateway ≠ query engine. The gateway normalizes incoming SQL/requests into the internal AST/plan API; the storage tier remains unchanged.

Dialect adapter layer in the compute node translates vendor quirks (identifiers, limit/offset vs TOP, dual table, date literals, parameter styles) into the internal logical plan.

Error mapping: return ecosystem-native codes/messages (e.g., SQLSTATE for Postgres, MySQL error numbers, TDS errors).
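The error mapping can stay a thin lookup at the edge. In the sketch below the internal error names are hypothetical, while the target codes are the real ecosystem conventions (PostgreSQL SQLSTATE, MySQL error numbers):

```python
# Hypothetical internal error taxonomy -> ecosystem-native codes.
# The internal names are illustrative; the PostgreSQL SQLSTATEs and
# MySQL error numbers are the real documented values.
ERROR_MAP = {
    "unique_violation": {"postgres": "23505", "mysql": 1062},
    "syntax_error":     {"postgres": "42601", "mysql": 1064},
    "undefined_table":  {"postgres": "42P01", "mysql": 1146},
}

def map_error(internal_code: str, dialect: str):
    """Return the ecosystem-native code, falling back to a generic error."""
    generic = {"postgres": "XX000", "mysql": 1105}  # internal / unknown error
    return ERROR_MAP.get(internal_code, {}).get(dialect, generic[dialect])
```

Keeping the table data-driven makes it easy to audit coverage per dialect and to add TDS/DRDA columns later.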

Auth adapters: implement minimal, secure variants (SCRAM for PG, native MySQL, JWT/OAuth for HTTP APIs). Always support TLS.

No proprietary blobs stored at rest for compatibility—translation happens at the edge; internal engine stays canonical.

Definition of Done (DoD)

Connect/run SELECT 1 + CRUD from each Python client listed (see Test Matrix).

Prepared statements with parameters function in each driver.

Transactions work (commit/rollback) and are visible cluster-wide.

Vector search endpoint usable from Pinecone client; vector SQL usable from PG/MySQL clients.

Errors & datatypes round-trip as expected in each ecosystem’s conventions.

Docs include DSN examples and “expected differences” per ecosystem.

Risks & Mitigations

Licensing/IP: Avoid copying proprietary code/spec prose; implement from public specs/interop behavior. Expose subset features clearly.

Dialect drift: Keep translation small; steer users to ANSI SQL where possible.

Auth complexity: Start with password+TLS; add SSO later on HTTP surfaces.

Rollout Plan (opinionated)

Phase 1: PostgreSQL + MySQL (Gold) — hit 80% of workloads immediately.

Phase 1.5: Snowflake/Databricks/Pinecone (Silver) via HTTP gateway.

Phase 2: SQL Server/DB2 (Bronze) for basic connectivity; improve iteratively.

Phase 3: Oracle/Tibero (Bronze→Silver) as demand requires.

DSN Examples (copy/paste for CI)

PostgreSQL

import psycopg2

conn = psycopg2.connect("host=db.example port=5432 dbname=demo user=demo password=demo sslmode=require")
cur = conn.cursor()
cur.execute("select 1")
print(cur.fetchone())

MySQL

import pymysql

conn = pymysql.connect(host="db.example", port=3306, user="demo", password="demo", database="demo", ssl={"ssl": {}})
cur = conn.cursor()
cur.execute("select 1")
print(cur.fetchone())

Snowflake (HTTP subset)

import snowflake.connector as sf

conn = sf.connect(account="local", user="demo", password="demo", host="db.example", protocol="https", port=443)
cur = conn.cursor()
cur.execute("select 1")
print(cur.fetchone())

Databricks SQL (HTTP subset)

from databricks import sql

with sql.connect(server_hostname="db.example", http_path="/dbsql", access_token="demo") as c:
    with c.cursor() as cur:
        cur.execute("select 1")
        print(cur.fetchall())

Pinecone-like (HTTP)

from pinecone import Pinecone

pc = Pinecone(api_key="demo", host="https://db.example/pinecone")
ix = pc.Index("demo")
ix.query(vector=[0.1] * 384, top_k=5, filter={"category": "books"})

SQL Server (TDS subset) via pyodbc

import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=db.example,1433;UID=demo;PWD=demo;Encrypt=yes;TrustServerCertificate=yes")
cur = conn.cursor()
cur.execute("select 1")
print(cur.fetchone())

DB2 (DRDA subset)

import ibm_db

conn = ibm_db.connect("DATABASE=DEMO;HOSTNAME=db.example;PORT=50000;PROTOCOL=TCPIP;UID=demo;PWD=demo;SECURITY=SSL;", "", "")
stmt = ibm_db.exec_immediate(conn, "select 1 from sysibm.sysdummy1")
row = ibm_db.fetch_tuple(stmt)
print(row)

Oracle/Tibero (Thin subset)

import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="db.example:1521/DEMO", config_dir=None)
cur = conn.cursor()
cur.execute("select 1 from dual")
print(cur.fetchone())

Implementation Notes (for the engineer)

Protocol Autodetect: peek at the initial bytes; if they form a TLS ClientHello, use ALPN (pgsql, mysql, tds, drda, or route to HTTP). Otherwise match magic/packet shapes (PG startup length+protocol, TDS prelogin, Oracle CONNECT_DATA, DRDA EXCSAT). Note that MySQL cannot be sniffed this way: the server sends its handshake first, so MySQL traffic needs a dedicated port or ALPN.
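The first-bytes dispatch can be sketched as a pure function over the initial client bytes. The magic values come from the public protocol docs (PG StartupMessage/SSLRequest codes, TDS PRELOGIN type 0x12, DRDA DSS magic 0xD0); treat the exact rule set, and especially the Oracle heuristic, as a sketch:

```python
import struct

HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ", b"PATCH ")

def sniff_protocol(first_bytes: bytes) -> str:
    """Classify a connection from its first client bytes (sketch).

    MySQL is absent on purpose: its server speaks first, so MySQL traffic
    must arrive on a dedicated port or be selected via TLS ALPN.
    """
    if len(first_bytes) < 8:
        return "need-more-bytes"
    if first_bytes[0] == 0x16 and first_bytes[1] == 0x03:
        return "tls"            # TLS ClientHello: finish handshake, read ALPN
    if first_bytes.startswith(HTTP_METHODS):
        return "http"           # Snowflake / Databricks / Pinecone gateway
    if first_bytes[0] == 0x12:
        return "tds"            # TDS PRELOGIN packet type 0x12
    if first_bytes[2] == 0xD0:
        return "drda"           # DDM DSS header magic at offset 2 (EXCSAT)
    code = struct.unpack(">I", first_bytes[4:8])[0]
    if code in (196608, 80877103):  # v3.0 StartupMessage or SSLRequest
        return "postgres"
    if first_bytes[4] == 0x01:
        return "oracle-net"     # TNS packet type 1 = CONNECT (weak heuristic)
    return "unknown"
```

Ordering matters: the cheap, high-confidence checks (TLS, HTTP, TDS) run before the weaker positional heuristics, and anything ambiguous falls through to "unknown" rather than guessing.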

Handlers: keep handlers thin; normalize to internal AST and parameter model; return rows via a common rowset API; map errors at the edge.
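The common rowset API can be as small as a columns-plus-rows record that every handler returns; the names below are illustrative, not a fixed interface:

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass
class RowSet:
    """Protocol-neutral result set returned by every handler (illustrative).

    Each protocol handler serializes this into its own wire format:
    DataRow messages for PG, text/binary rows for MySQL, JSON for HTTP.
    """
    columns: List[Tuple[str, str]]   # (name, internal type name) pairs
    rows: List[Tuple[Any, ...]]

    def fetchone(self):
        return self.rows[0] if self.rows else None

rs = RowSet(columns=[("?column?", "int4")], rows=[(1,)])
```

Keeping the rowset protocol-neutral is what lets error mapping and type mapping stay edge-only concerns.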

Dialect: implement small rewriters (e.g., LIMIT/OFFSET↔TOP, DUAL, CURRENT_TIMESTAMP variants).
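A rewriter in this layer can stay regex-small. The sketch below converts a simple `SELECT ... LIMIT n` into T-SQL `TOP` form and is deliberately naive: no OFFSET, subqueries, or quoting edge cases:

```python
import re

_LIMIT_RE = re.compile(r"^\s*select\s+(.*?)\s+limit\s+(\d+)\s*$",
                       re.IGNORECASE | re.DOTALL)

def limit_to_top(sql: str) -> str:
    """Rewrite `SELECT ... LIMIT n` to `SELECT TOP n ...` (naive sketch).

    Statements that don't match the simple shape pass through unchanged;
    a real rewriter would operate on the parsed AST, not the SQL text.
    """
    m = _LIMIT_RE.match(sql)
    if not m:
        return sql
    return f"SELECT TOP {m.group(2)} {m.group(1)}"

print(limit_to_top("select id, name from users limit 10"))
# -> SELECT TOP 10 id, name from users
```

In practice these rewrites should run on the internal AST after parsing, so the text-level version here is only a shape illustration.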

Security: default TLS; refuse cleartext creds unless explicitly allowed in dev.