Protocol Specifications Summary
HeliosDB Multi-Protocol Database Support Research
Researcher Agent Analysis Date: 2025-10-10 Status: Phase 1 Research Complete
Executive Summary
This document provides comprehensive protocol specifications for HeliosDB’s multi-protocol database support. Research covers wire protocols, authentication flows, packet formats, and client compatibility requirements for PostgreSQL, MySQL, SQL Server (TDS), DB2 (DRDA), Oracle/Tibero, Snowflake, Databricks, and Pinecone.
Key Finding: All target protocols have well-documented specifications with clear autodetection patterns. Implementation feasibility is HIGH for Phase 1 (PostgreSQL, MySQL, HTTP APIs) and MEDIUM for Phase 2 (TDS, DRDA, Oracle).
1. PostgreSQL Protocol (libpq) - GOLD Target
Wire Protocol Overview
- Protocol Version: 3.2 (PostgreSQL 18+)
- Transport: TCP/IP, Unix domain sockets
- Message-Based: Frontend/Backend message exchange
- Default Port: 5432
Packet Structure
Message Format:
- Message Type (1 byte): Single character identifier
- Length (4 bytes): Message length including the length field, excluding the type byte
- Payload (variable): Message-specific data

Authentication Flow (SCRAM-SHA-256)
Specification: RFC 7677 (SCRAM-SHA-256), RFC 5802 (SASL)
Flow Sequence:
1. Client → Server: Startup Message
   - Protocol version (3.0)
   - Parameters (user, database, etc.)
2. Server → Client: AuthenticationSASL
   - List of supported SASL mechanisms
   - Prioritized order (SCRAM-SHA-256-PLUS, SCRAM-SHA-256)
3. Client → Server: SASLInitialResponse
   - Chosen mechanism
   - SCRAM client-first-message
4. Server → Client: AuthenticationSASLContinue
   - SCRAM server-first-message
5. Client → Server: SASLResponse
   - SCRAM client-final-message
6. Server → Client: AuthenticationSASLFinal, then AuthenticationOk
Important Notes:
- Server ignores username in SCRAM messages; uses startup message username
- Channel binding (SCRAM-SHA-256-PLUS) provides MITM protection with TLS
- Password stored as SCRAM-SHA-256 hash on server
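The client-side proof computation from RFC 5802/7677 can be sketched with the standard library; this is a minimal illustration of the key derivation only, not a full SCRAM exchange (nonce handling and message formatting are omitted):

```python
import hashlib
import hmac

def scram_sha256_client_proof(password: str, salt: bytes, iterations: int,
                              auth_message: bytes) -> bytes:
    """Compute the SCRAM-SHA-256 ClientProof per RFC 5802 / RFC 7677.

    SaltedPassword = PBKDF2-HMAC-SHA256(password, salt, i)
    ClientKey      = HMAC(SaltedPassword, "Client Key")
    StoredKey      = SHA256(ClientKey)
    ClientProof    = ClientKey XOR HMAC(StoredKey, AuthMessage)
    """
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    client_sig = hmac.new(stored_key, auth_message, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(client_key, client_sig))
```

The server verifies the proof by XOR-ing it with its own ClientSignature to recover ClientKey, hashing it, and comparing against the stored StoredKey.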
TLS Integration
- ALPN Identifier: “postgresql” (RFC 7301, IANA registered)
- Required: ALPN extension mandatory for SSL connections (protocol confusion protection)
- Versions: TLS 1.2+ recommended
Key Message Types
Frontend (client → server):
- ‘Q’: Simple query
- ‘P’: Parse (prepared statement)
- ‘B’: Bind (parameters to prepared statement)
- ‘E’: Execute
- ‘D’: Describe
- ‘C’: Close
- ‘X’: Terminate
Backend (server → client):
- ‘T’: Row description
- ‘D’: Data row (the type byte is reused; message direction disambiguates)
- ‘Z’: ReadyForQuery
Implementation Requirements
- Message framing with type + length prefix
- SCRAM-SHA-256 authentication handler
- TLS with ALPN support
- Prepared statement caching
- Extended query protocol support
- Cursor support (DECLARE, FETCH, CLOSE)
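The type-plus-length framing above is simple enough to sketch directly; this is an illustrative helper pair (names are our own), following the rule that the 4-byte length counts itself and the payload but not the type byte:

```python
import struct

def encode_message(msg_type: bytes, payload: bytes) -> bytes:
    """Frame a PostgreSQL message: 1-byte type + 4-byte big-endian length.

    The length field covers itself plus the payload, excluding the type byte.
    """
    return msg_type + struct.pack("!I", 4 + len(payload)) + payload

def decode_message(buf: bytes):
    """Split one framed message off the front of buf.

    Returns (msg_type, payload, remaining_bytes).
    """
    msg_type = buf[0:1]
    (length,) = struct.unpack("!I", buf[1:5])
    payload = buf[5:1 + length]
    return msg_type, payload, buf[1 + length:]
```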
2. MySQL Protocol v10 - GOLD Target
Wire Protocol Overview
- Protocol Version: 10 (since MySQL 3.21.0)
- Transport: TCP/IP
- Handshake: Server-initiated
- Default Port: 3306
Packet Structure
Packet Format:
- Payload Length (3 bytes): Packet size (excluding header)
- Sequence ID (1 byte): Incremental packet counter
- Payload (variable): Protocol-specific data

Handshake Flow (Protocol::HandshakeV10)
Server → Client (Initial Handshake):
- Protocol Version: 0x0a (1 byte)
- Server Version: NULL-terminated string
- Connection ID: 4 bytes
- Auth Plugin Data Part 1: 8 bytes
- Filler: 0x00 (1 byte)
- Capability Flags (lower): 2 bytes
- Character Set: 1 byte
- Status Flags: 2 bytes
- Capability Flags (upper): 2 bytes
- Auth Plugin Data Length: 1 byte
- Reserved: 10 bytes (0x00)
- Auth Plugin Data Part 2: variable
- Auth Plugin Name: NULL-terminated (if CLIENT_PLUGIN_AUTH)

Client → Server (Handshake Response):
- Capability flags
- Max packet size
- Character set
- Username (NULL-terminated)
- Auth response
- Database name (if CLIENT_CONNECT_WITH_DB)
- Auth plugin name (if CLIENT_PLUGIN_AUTH)
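The 3-byte little-endian length plus sequence-ID header can be sketched as follows (helper names are our own):

```python
def write_mysql_packet(seq: int, payload: bytes) -> bytes:
    """Frame a MySQL packet: 3-byte little-endian payload length + 1-byte sequence id."""
    return len(payload).to_bytes(3, "little") + bytes([seq]) + payload

def read_mysql_packet(buf: bytes):
    """Split one MySQL packet off the front of buf.

    Returns (sequence_id, payload, remaining_bytes).
    """
    length = int.from_bytes(buf[0:3], "little")
    seq = buf[3]
    payload = buf[4:4 + length]
    return seq, payload, buf[4 + length:]
```

Note that payloads larger than 0xFFFFFF bytes must be split across multiple packets with incrementing sequence IDs; that case is not handled here.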
Authentication Methods
caching_sha2_password (MySQL 8.0+ default):
- SHA-256 based authentication
- In-memory server-side cache for performance
- RSA key exchange for password transmission
- Supports Unix socket and shared-memory protocols
- Fast reauthentication for cached users
mysql_native_password (legacy):
- SHA-1 based (less secure)
- Still widely supported for compatibility
Authentication Flow:
- Server sends nonce in auth-plugin-data
- Client responds with hash(password) XOR hash(nonce + hash(hash(password)))
- For caching_sha2_password: RSA encryption if not cached + no TLS
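The mysql_native_password scramble described above is straightforward with hashlib; a minimal sketch (the server verifies by XOR-ing the response back to recover SHA1(password)):

```python
import hashlib

def native_password_scramble(password: bytes, nonce: bytes) -> bytes:
    """mysql_native_password auth response:

    SHA1(password) XOR SHA1(nonce + SHA1(SHA1(password)))
    """
    h1 = hashlib.sha1(password).digest()           # SHA1(password)
    h2 = hashlib.sha1(h1).digest()                 # SHA1(SHA1(password)) -- what the server stores
    rhs = hashlib.sha1(nonce + h2).digest()
    return bytes(a ^ b for a, b in zip(h1, rhs))
```

caching_sha2_password uses an analogous SHA-256 construction plus the RSA/cache fast path, and is not shown here.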
Capability Flags (Critical)
- CLIENT_PROTOCOL_41: 4.1+ protocol support
- CLIENT_SECURE_CONNECTION: New authentication method
- CLIENT_PLUGIN_AUTH: Pluggable authentication
- CLIENT_CONNECT_WITH_DB: Database in handshake
- CLIENT_SSL: SSL support
- CLIENT_TRANSACTIONS: Transaction status tracking
- CLIENT_SESSION_TRACK: Session state tracking
- CLIENT_DEPRECATE_EOF: Deprecate EOF packets
Key Packet Types
- 0x00: OK_Packet
- 0xff: ERR_Packet
- 0xfe: EOF_Packet (deprecated with CLIENT_DEPRECATE_EOF)
- 0x01: AUTH_SWITCH_REQUEST
- 0x03: COM_QUERY
- 0x16: COM_STMT_PREPARE
- 0x17: COM_STMT_EXECUTE
- 0x19: COM_STMT_CLOSE
Implementation Requirements
- Server-first handshake (client connects, server sends handshake)
- caching_sha2_password support for MySQL 8.0+ compatibility
- Capability flag negotiation
- Prepared statement protocol (binary protocol)
- Character set handling (utf8mb4 default)
- Transaction status tracking
3. TDS 7.4 Protocol (SQL Server) - BRONZE Target
Wire Protocol Overview
- Protocol Versions: TDS 7.3, 7.4, 8.0
- Transport: TCP/IP
- Default Port: 1433
- Specification: [MS-TDS] (Microsoft Open Specifications)
Packet Structure
TDS Packet Header (8 bytes):
- Type (1 byte): Packet type indicator
  - 0x12: PRELOGIN (client)
  - 0x04: TABULAR RESULT (server response to PRELOGIN)
  - 0x10: LOGIN7
  - 0x01: SQL BATCH
  - 0x03: RPC
- Status (1 byte): Packet status flags
- Length (2 bytes): Packet length (header + data)
- SPID (2 bytes): Server process ID
- Packet ID (1 byte): Packet sequence number
- Window (1 byte): Currently unused (0x00)

PRELOGIN Flow (Connection Negotiation)
Structure:
- Packet Header (type 0x12)
- Token List (option pairs: type + offset + length)
- Token Data (actual values)
PRELOGIN Options:
- VERSION (UL_VERSION, US_SUBBUILD): Client/server version
- ENCRYPTION (B_FENCRYPTION): Encryption mode
- ENCRYPT_OFF (0x00): Encryption available but off
- ENCRYPT_ON (0x01): Encryption available and on
- ENCRYPT_NOT_SUP (0x02): Encryption not supported
- ENCRYPT_REQ (0x03): Encryption required
- ENCRYPT_CLIENT_CERT (0x80): Certificate-based auth
- INSTANCE (B_INSTVALIDITY): SQL Server instance name
- THREADID (UL_THREADID): Client thread ID
- MARS (B_MARS): Multiple Active Result Sets support
- TRACEID (GUID_CONNID, GUID_ActivityID): Connection tracking (TDS 7.4+)
- FEDAUTHREQUIRED (B_FEDAUTHREQUIRED): Federated auth
TLS Integration
- PRELOGIN wraps SSL/TLS handshake payload
- For TDS 7.2-7.4: SSL handshake in PRELOGIN packets
- After successful TLS negotiation: LOGIN7 over encrypted channel
- TDS 8.0: Mandatory encryption by default
LOGIN7 Authentication
LOGIN7 Structure:
- Length (4 bytes)
- TDS version (4 bytes)
- Packet size (4 bytes)
- Client version (4 bytes)
- Client PID (4 bytes)
- Connection ID (4 bytes)
- Option flags 1 (1 byte)
- Option flags 2 (1 byte)
- Type flags (1 byte)
- Option flags 3 (1 byte)
- Client timezone (4 bytes)
- Client LCID (4 bytes)
- Offset/length pairs for:
  - Hostname
  - Username
  - Password (obfuscated)
  - App name
  - Server name
  - Library name
  - Language
  - Database

Password Obfuscation: swap the nibbles of each byte, then XOR with 0xA5
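Per [MS-TDS], the LOGIN7 password is encoded as UTF-16LE, then each byte has its high and low nibbles swapped and is XOR-ed with 0xA5. A minimal sketch (the transform is its own inverse in reverse order, so it is trivially reversible):

```python
def obfuscate_tds_password(password: str) -> bytes:
    """LOGIN7 password obfuscation: UTF-16LE encode, then per byte
    swap the high/low nibbles and XOR with 0xA5 (per [MS-TDS])."""
    out = bytearray()
    for b in password.encode("utf-16-le"):
        swapped = ((b << 4) & 0xF0) | (b >> 4)  # exchange the two nibbles
        out.append(swapped ^ 0xA5)
    return bytes(out)
```

Note this is obfuscation, not encryption; it only protects against casual inspection, which is why TLS via PRELOGIN wrapping matters.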
Implementation Requirements
- PRELOGIN negotiation (encryption, version, MARS)
- LOGIN7 authentication
- TLS support (PRELOGIN-wrapped handshake)
- Basic SQL BATCH execution
- Error token parsing (SQLSTATE codes)
- Parameter binding (RPC procedures, sp_executesql)
4. DRDA Protocol (DB2) - BRONZE Target
Wire Protocol Overview
- Standard: Open Group DRDA v2
- Architecture: Distributed Data Management (DDM)
- Default Port: 50000
- Character Encoding: EBCDIC (unless negotiated)
Packet Structure
DDM Packet Format:
- Length (2 bytes): Total length in bytes
- Magic ID (1 byte): 0xD0 (fixed)
- Format (1 byte): Formatting flags
- Code Point (2 bytes): Command/reply identifier
- Data (variable): Code point specific data

Connection Flow (EXCSAT → ACCSEC → ACCRDB)
1. EXCSAT (Exchange Server Attributes)
- Purpose: Initiate connection, exchange server info
- Code Point: 0x1041
- Key Objects:
- EXTNAM (0x115E): External name (28 bytes: 20-byte app ID + 8-byte process/thread ID)
- SRVCLSNM: Server class name
- SRVNAM: Server name
- SRVRLSLV: Server release level
Server Response: EXCSATRD
- Server attributes acknowledgment
- SRVCLSNM, SRVNAM, SRVRLSLV
2. ACCSEC (Access Security)
- Purpose: Negotiate authentication mechanism
- Code Point: 0x106D
- Key Objects:
- SECMEC (Security Mechanism):
- 0x0003: USRIDPWD (User ID + Password)
- 0x0004: USRIDONL (User ID only)
- 0x0007: USRENCPWD (Encrypted password)
- 0x0009: EUSRIDPWD (External user + password)
- 0x000B: DCE (Kerberos)
Server Response: ACCSECRD
- Selected security mechanism
- Security token (if applicable)
3. ACCRDB (Access Relational Database)
- Purpose: Connect to specific database
- Code Point: 0x2001
- Key Objects:
- RDBNAM: Database name (padded to 18 bytes)
- RDBACCCL: RDB access class
- PRDID: Product ID
- TYPDEFNAM: Type definition name (QTDSQLASC for ASCII, QTDSQLX for mixed)
Server Response: ACCRDBRM
- Access confirmation
- Product-specific information
Implementation Requirements
- DDM packet framing (length + 0xD0 magic + code point)
- EXCSAT/EXCSATRD exchange
- ACCSEC/ACCSECRD authentication (USRIDPWD minimum)
- ACCRDB/ACCRDBRM database access
- EBCDIC ↔ UTF-8 conversion (unless ASCII negotiated)
- Basic SQL execution (PRPSQLSTT, OPNQRY, CNTQRY, CLSQRY)
- Error handling (SQLCARD response)
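The header layout described above and the EBCDIC requirement can both be sketched with the standard library; this follows the document's simplified DDM framing (the full DRDA DSS header has additional fields), and uses code page 037 (EBCDIC US/Canada) as a representative EBCDIC codec:

```python
import struct

def parse_ddm_header(buf: bytes):
    """Parse the simplified DDM framing: 2-byte length, 0xD0 magic,
    format byte, 2-byte code point. Returns (length, fmt, code_point, data)."""
    length, magic, fmt, code_point = struct.unpack("!HBBH", buf[:6])
    if magic != 0xD0:
        raise ValueError("not a DDM packet")
    return length, fmt, code_point, buf[6:length]

def utf8_to_ebcdic(text: str) -> bytes:
    """Convert to EBCDIC using Python's built-in cp037 codec."""
    return text.encode("cp037")

def ebcdic_to_utf8(data: bytes) -> str:
    return data.decode("cp037")
```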
5. Oracle Net/TTC Protocol - BRONZE Target
Wire Protocol Overview
- Protocol: TNS (Transparent Network Substrate) + TTC (Two-Task Common)
- Default Port: 1521
- Architecture: Layered (TNS → TTC → SQL execution)
- Byte Order: Big-endian
TNS Packet Structure
TNS Header (8 bytes):
- Packet Length (2 bytes): Total length including header
- Packet Checksum (2 bytes): Typically 0x0000 (disabled)
- Packet Type (1 byte):
  - 0x01: CONNECT
  - 0x02: ACCEPT
  - 0x03: ACK
  - 0x04: REFUSE
  - 0x05: REDIRECT
  - 0x06: DATA
  - 0x07: NULL
  - 0x09: ABORT
  - 0x0B: RESEND
  - 0x0C: MARKER
  - 0x0D: ATTENTION
  - 0x0E: CONTROL
- Reserved (1 byte): 0x00
- Header Checksum (2 bytes): Typically 0x0000

CONNECT Packet
TNS CONNECT Structure:
- TNS Header (type 0x01)
- Version (2 bytes): Protocol version (0x0138 = 312)
- Version Compatible (2 bytes): Min compatible version
- Service Options (2 bytes): Service flags
- Session Data Unit (2 bytes): Max SDU size
- Max Transmission Unit (2 bytes): Max TDU size
- NT Protocol Characteristics (2 bytes)
- Line Turnaround (2 bytes): 0x0000
- Value of 1 in Hardware (2 bytes): Byte order test (0x0001)
- Data Length (2 bytes): Connect data length
- Data Offset (2 bytes): Connect data offset
- Max Receivable Connect Data (4 bytes)
- Connect Flags 0 (1 byte)
- Connect Flags 1 (1 byte)
- Trace Cross Facility Items (variable)
- Connect Data (variable): (DESCRIPTION=...) string

Connect Data Format (ASCII):

(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=DEMO)(CID=(PROGRAM=python)(HOST=client)(USER=demo)))(ADDRESS=(PROTOCOL=TCP)(HOST=server)(PORT=1521)))

TTC (Two-Task Common) Protocol
DATA Packet (type 0x06) Structure:
- TNS Header
- Data Flags (2 bytes)
- Data Type (1 byte):
  - 0x01: Protocol negotiation
  - 0x02: Data type negotiation
  - 0x03: TTI function call
- TTC Payload (variable)

Protocol Negotiation (Data Type 0x01):
- Client sends supported versions: [6, 5, 4, 3, 2, 1, 0]
- Server responds with common version
- Character set information exchange
Data Type Negotiation (Data Type 0x02):
- Internal data representation exchange
- Endianness, date format, number format
TTI Function Calls (Data Type 0x03):
- TTI opcode (1 byte): Function identifier
- 0x03: OSQL (SQL execution)
- 0x05: OALL7/OALL8 (bundled execution)
- 0x11: OAUTH (authentication)
- 0x68: O3LOGON (3-phase logon)
- Function-specific parameters
Authentication Flow
- Client → CONNECT packet with connect data
- Server → ACCEPT or REFUSE
- Client → DATA (protocol negotiation)
- Server → DATA (protocol ack)
- Client → DATA (TTI O3LOGON phase 1)
- Server → DATA (auth challenge)
- Client → DATA (TTI O3LOGON phase 2 with credentials)
- Server → DATA (auth result)
Implementation Requirements
- TNS packet framing (length + checksum + type)
- CONNECT/ACCEPT handshake
- Connect data parsing (DESCRIPTION format)
- Protocol and data type negotiation
- TTC layer implementation
- TTI function calls (minimum: OSQL, O3LOGON)
- Basic authentication (password-based)
- “SELECT 1 FROM DUAL” support
- Character set handling (AL32UTF8)
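The big-endian TNS header above parses cleanly with struct; a minimal sketch of the framing layer only (TTC and TTI payloads are out of scope here):

```python
import struct

def parse_tns_header(buf: bytes):
    """Parse the 8-byte big-endian TNS header: packet length, packet
    checksum, type, reserved, header checksum.

    Returns (length, packet_type, payload)."""
    length, pkt_checksum, ptype, _reserved, hdr_checksum = struct.unpack("!HHBBH", buf[:8])
    return length, ptype, buf[8:length]
```

Note: in recent Oracle versions the length field may instead be a 4-byte value (with the checksum field absorbed); this sketch follows the classic layout described above.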
6. Snowflake REST API - SILVER Target
API Overview
- Protocol: HTTPS/REST
- Format: JSON
- Default Port: 443
- Specification: OpenAPI compliant
- Repository: snowflake-rest-api-specs (GitHub)
Authentication Methods
1. Key Pair Authentication (JWT)
JWT Token Generation:
- Header: { "alg": "RS256", "typ": "JWT" }
- Payload: { "iss": "account.user.fingerprint", "sub": "account.user", "iat": <issued_at_timestamp>, "exp": <expiration_timestamp> }
- Signature: RS256(header.payload, private_key)

Authorization Header:

Authorization: Bearer <jwt_token>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT

2. OAuth Token

Authorization: Bearer <oauth_access_token>

3. Password-based (for compatibility)

{ "username": "demo", "password": "demo", "account": "local" }

SQL API Endpoints
Submit Statement (POST /api/v2/statements)
Request:{ "statement": "SELECT * FROM table WHERE id = ?", "timeout": 60, "database": "demo", "schema": "public", "warehouse": "compute_wh", "role": "accountadmin", "bindings": { "1": {"type": "FIXED", "value": "123"} }, "parameters": { "MULTI_STATEMENT_COUNT": "0" }}
Response (synchronous):{ "statementHandle": "uuid", "statementStatusUrl": "/api/v2/statements/uuid", "resultSetMetaData": { "numRows": 10, "rowType": [ {"name": "ID", "type": "FIXED", "nullable": false}, {"name": "NAME", "type": "TEXT", "nullable": true} ] }, "data": [ ["1", "Alice"], ["2", "Bob"] ], "code": "090001", "message": "Statement executed successfully."}
Response (asynchronous):{ "statementHandle": "uuid", "statementStatusUrl": "/api/v2/statements/uuid", "message": "Statement execution in progress."}Check Statement Status (GET /api/v2/statements/{statementHandle})
Response:{ "statementHandle": "uuid", "statementStatusUrl": "/api/v2/statements/uuid", "status": "success|running|failed", "resultSetMetaData": {...}, "data": [...], "stats": { "numRowsInserted": 0, "numRowsUpdated": 0, "numRowsDeleted": 0, "numDuplicateRowsUpdated": 0 }}Cancel Statement (POST /api/v2/statements/{statementHandle}/cancel)
Required Headers
- Authorization: Bearer token
- Content-Type: application/json
- Accept: application/json
- User-Agent: Application identifier
- X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT | OAUTH
Implementation Requirements
- HTTPS/TLS 1.2+ only
- JWT generation with RS256
- Asynchronous statement execution handling
- Statement handle tracking
- Result set pagination
- Session parameter management
- Error code mapping (Snowflake → generic)
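The key-pair JWT claims above can be assembled with the standard library alone. This sketch builds only the unsigned `header.payload` portion; a real token must be RS256-signed with the account's private key (e.g. via PyJWT or the cryptography package), which is omitted here:

```python
import base64
import json
import time

def b64url(data: bytes) -> str:
    """Base64url encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def build_unsigned_jwt(account: str, user: str, fingerprint: str,
                       lifetime: int = 3600) -> str:
    """Assemble the unsigned header.payload of a Snowflake key-pair JWT.

    `fingerprint` is the SHA-256 fingerprint of the public key; the RS256
    signature over this string must still be appended to form a valid token.
    """
    now = int(time.time())
    header = {"alg": "RS256", "typ": "JWT"}
    payload = {
        "iss": f"{account}.{user}.{fingerprint}",
        "sub": f"{account}.{user}",
        "iat": now,
        "exp": now + lifetime,
    }
    return b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
```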
7. Databricks SQL API - SILVER Target
API Overview
- Protocol: HTTP/Thrift over TLS
- Format: JSON (HTTP API), Thrift binary (driver protocol)
- Default Port: 443
- Driver: Thrift-based (no ODBC/JDBC dependency)
Authentication
Personal Access Token (PAT)
Authorization: Bearer <databricks_pat>

OAuth Access Token (Azure)

Authorization: Bearer <oauth_token>

Microsoft Entra ID Token (Azure)

Authorization: Bearer <entra_id_token>

Statement Execution API
Submit Statement (POST /api/2.0/sql/statements)
Request:{ "warehouse_id": "abc123", "statement": "SELECT * FROM table WHERE id = :id", "parameters": [ {"name": "id", "value": "123", "type": "STRING"} ], "catalog": "main", "schema": "default", "disposition": "INLINE|EXTERNAL_LINKS", "format": "JSON_ARRAY|ARROW_STREAM", "wait_timeout": "30s", "on_wait_timeout": "CONTINUE|CANCEL"}
Response (quick execution):{ "statement_id": "uuid", "status": { "state": "SUCCEEDED|RUNNING|FAILED" }, "manifest": { "schema": { "columns": [ {"name": "id", "type_name": "INT", "position": 0}, {"name": "name", "type_name": "STRING", "position": 1} ] }, "total_row_count": 100, "total_chunk_count": 1, "chunks": [ { "chunk_index": 0, "row_count": 100, "byte_count": 5000 } ] }, "result": { "data_array": [ [123, "Alice"], [124, "Bob"] ] }}
Response (async execution):{ "statement_id": "uuid", "status": { "state": "RUNNING" }}Get Statement Status (GET /api/2.0/sql/statements/{statement_id})
Cancel Statement (POST /api/2.0/sql/statements/{statement_id}/cancel)
Thrift Protocol (Driver Internal)
- TCLIService.Client: Thrift service client
- Methods:
- OpenSession
- ExecuteStatement
- FetchResults
- CloseSession
- GetResultSetMetadata
- Format: Thrift binary protocol over TLS
Implementation Requirements
- HTTPS/TLS 1.2+ mandatory
- PAT or OAuth authentication
- Warehouse ID resolution
- Async statement execution
- Result chunking and pagination
- Arrow format support (optional)
- Thrift protocol for driver compatibility
- Unity Catalog awareness (catalog.schema.table)
8. Pinecone Vector API - SILVER Target
API Overview
- Protocols: HTTP/REST, gRPC
- Format: JSON (HTTP), Protocol Buffers (gRPC)
- Default Port: 443 (HTTPS), various (gRPC)
- Specification: 2025-01 API version
Authentication
Api-Key: <pinecone_api_key>

Data Plane API (2025-01)
Upsert Vectors (POST /vectors/upsert)
Request:{ "vectors": [ { "id": "vec1", "values": [0.1, 0.2, 0.3, ..., 0.768], "metadata": { "category": "books", "title": "Example" }, "sparse_values": { "indices": [10, 45, 123], "values": [0.5, 0.3, 0.8] } } ], "namespace": "default"}
Response:{ "upsertedCount": 1}Query Vectors (POST /query)
Request:{ "vector": [0.1, 0.2, 0.3, ..., 0.768], "topK": 10, "namespace": "default", "filter": { "category": {"$eq": "books"}, "year": {"$gte": 2020} }, "includeValues": true, "includeMetadata": true}
Response:{ "matches": [ { "id": "vec1", "score": 0.95, "values": [...], "metadata": {...} } ], "namespace": "default"}Delete Vectors (POST /vectors/delete)
Request:{ "ids": ["vec1", "vec2"], "deleteAll": false, "namespace": "default", "filter": {"category": {"$eq": "books"}}}Fetch Vectors (GET /vectors/fetch)
Query params: ?ids=vec1&ids=vec2&namespace=defaultUpdate Vectors (POST /vectors/update)
{ "id": "vec1", "values": [...], "setMetadata": {"updated": true}, "namespace": "default"}Metadata Filtering
Supported Operators:
- $eq: Equals
- $ne: Not equals
- $gt, $gte: Greater than (or equal)
- $lt, $lte: Less than (or equal)
- $in: In array
- $nin: Not in array
Complex Filters:

{
  "$and": [
    {"category": {"$eq": "books"}},
    {"year": {"$gte": 2020}},
    {"$or": [
      {"author": {"$eq": "Smith"}},
      {"author": {"$eq": "Jones"}}
    ]}
  ]
}

gRPC Protocol
- Package: pinecone.grpc
- Install: pip install "pinecone[grpc]"
- Performance: Modest improvement for upsert/query operations
- Methods:
- Upsert
- Query
- Delete
- Fetch
- Update
- DescribeIndexStats
Index Operations (Control Plane)
- Create Index (POST /indexes)
- List Indexes (GET /indexes)
- Describe Index (GET /indexes/{name})
- Delete Index (DELETE /indexes/{name})
Implementation Requirements
- HTTPS/TLS for HTTP API
- gRPC with TLS for gRPC API
- API key authentication (header-based)
- Namespace isolation
- Metadata filtering (complex boolean expressions)
- Sparse vector support
- Index management (create, list, describe, delete)
- Vector dimension validation
- Up-to-the-second consistency (upsert/query reflect immediately)
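The query body and filter grammar above compose naturally as plain dicts; this illustrative helper (names are our own, not part of the pinecone client) shows the request shape a translation layer would need to emit:

```python
def build_query_request(vector, top_k=10, namespace="default",
                        filter_=None, include_metadata=True):
    """Assemble a /query request body matching the shapes shown above."""
    body = {
        "vector": vector,
        "topK": top_k,
        "namespace": namespace,
        "includeValues": False,
        "includeMetadata": include_metadata,
    }
    if filter_ is not None:
        body["filter"] = filter_
    return body

# Operators compose with $and/$or exactly as in the filter grammar above
f = {"$and": [{"category": {"$eq": "books"}}, {"year": {"$gte": 2020}}]}
req = build_query_request([0.1, 0.2, 0.3], top_k=5, filter_=f)
```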
Protocol Autodetection Summary
Detection Strategy Matrix
| Protocol | Detection Method | First Packet Signature | Port | TLS/ALPN |
|---|---|---|---|---|
| PostgreSQL | Startup length + version | Length(4) + 0x00030000 | 5432 | ALPN: “postgresql” |
| MySQL | Server sends first | Server sends HandshakeV10 (0x0a) | 3306 | No ALPN |
| TDS | PRELOGIN packet | Type 0x12 + PRELOGIN structure | 1433 | No specific ALPN |
| DRDA | EXCSAT code point | Length + 0xD0 + 0x1041 (EXCSAT) | 50000 | No ALPN |
| Oracle TNS | CONNECT packet | Type 0x01 + version 0x0138 | 1521 | No ALPN |
| HTTP/HTTPS | HTTP request line | “GET”, “POST”, “PUT” or TLS ClientHello | 443 | ALPN: “h2”, “http/1.1” |
Implementation Pseudocode
```python
import struct

def detect_protocol(sock):
    # Step 1: Check for a TLS ClientHello
    peek = sock.peek(16)
    if is_tls_client_hello(peek):
        # Wrap in TLS and check ALPN
        tls_sock = wrap_tls(sock)
        alpn = tls_sock.selected_alpn_protocol()
        if alpn == "postgresql":
            return PostgreSQLHandler(tls_sock)
        elif alpn in ("h2", "http/1.1"):
            return HTTPHandler(tls_sock)
        else:
            # No ALPN match: continue detection inside the TLS tunnel
            sock = tls_sock
            peek = sock.peek(16)

    # Step 2: Protocol-specific detection
    # PostgreSQL: 4-byte length + 4-byte protocol version
    if len(peek) >= 8:
        protocol = struct.unpack("!I", peek[4:8])[0]
        if protocol in (0x00030000, 0x00020000):
            return PostgreSQLHandler(sock)

    # TDS: type byte 0x12 (PRELOGIN)
    if peek[0] == 0x12:
        return TDSHandler(sock)

    # DRDA: magic byte 0xD0 at offset 2, EXCSAT code point at offset 4
    if len(peek) >= 6 and peek[2] == 0xD0:
        code_point = struct.unpack("!H", peek[4:6])[0]
        if code_point == 0x1041:  # EXCSAT
            return DRDAHandler(sock)

    # Oracle TNS: type byte 0x01 (CONNECT) at offset 4
    if len(peek) >= 8 and peek[4] == 0x01:
        return OracleTNSHandler(sock)

    # HTTP: plain-text request line
    if peek.startswith((b"GET ", b"POST ", b"PUT ")):
        return HTTPHandler(sock)

    # MySQL: the server speaks first, so there is nothing to peek.
    # Default to MySQL and send a HandshakeV10.
    return MySQLHandler(sock)
```

Magic Bytes Reference
PostgreSQL Startup:
  00 00 00 28   [Length: 40 bytes]
  00 03 00 00   [Protocol: 3.0]
  75 73 65 72   [user\0...]

MySQL HandshakeV10 (Server → Client):
  1a 00 00      [Payload length: 26]
  00            [Sequence: 0]
  0a            [Protocol version: 10]
  38 2e 30      [Server version "8.0..."]

TDS PRELOGIN (Client → Server):
  12            [Type: PRELOGIN]
  01            [Status: Normal]
  00 34         [Length: 52]
  00 00         [SPID: 0]
  01            [Packet: 1]
  00            [Window: 0]

DRDA EXCSAT:
  00 2a         [Length: 42]
  d0            [Magic: 0xD0]
  01            [Format]
  10 41         [Code point: EXCSAT 0x1041]

Oracle TNS CONNECT:
  00 f4         [Length: 244]
  00 00         [Checksum: disabled]
  01            [Type: CONNECT]
  00            [Reserved]
  00 00         [Header checksum]
  01 38         [Version: 312]

HTTP/1.1:
  47 45 54 20   [GET ]
  50 4f 53 54   [POST]

TLS ClientHello:
  16            [Content type: Handshake]
  03 01         [Version: TLS 1.0 record]
  ...
  00 10         [ALPN extension]

Python Client Driver Compatibility Matrix
PostgreSQL Drivers
psycopg2 (sync)
- Type: libpq wrapper (C extension)
- Protocol: PostgreSQL wire protocol
- Connection Pool: External (psycopg2.pool)
- Prepared Statements: Server-side via cursor.execute() with named cursors
- Parameter Style: %s (format), %(name)s (pyformat)
- Async Support: No (use psycopg3 or asyncpg)
- Key Behaviors:
- Automatic type conversion (PostgreSQL → Python)
- Server-side cursors with cursor(name='...')
- Two-phase commit support
asyncpg (async)
- Type: Pure Python + Cython (protocol implementation)
- Protocol: PostgreSQL wire protocol (native)
- Connection Pool: Built-in (asyncpg.create_pool())
- Prepared Statements: Automatic LRU cache for fetch*() methods
- Parameter Style: $1, $2, $3 (PostgreSQL native)
- Async Support: asyncio native
- Key Behaviors:
- High performance (no libpq overhead)
- Automatic prepared statement caching
- Type codec system (custom types)
- Does NOT work with PgBouncer in transaction/statement mode
- Binary protocol for efficiency
- Cursors and prepared statements invalid after connection release
Critical Differences:
# psycopg2
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# asyncpg
await connection.fetch("SELECT * FROM users WHERE id = $1", user_id)

MySQL Drivers
PyMySQL (sync)
- Type: Pure Python
- Protocol: MySQL wire protocol (Python implementation)
- Connection Pool: External (DBUtils, SQLAlchemy)
- Prepared Statements: Client-side emulation (string escaping)
- Parameter Style: %s (format)
- Async Support: No
- Key Behaviors:
- Pure Python (no C dependencies)
- Client-side parameter escaping (not true prepared statements)
- Character set handling (utf8mb4)
- Cursor classes (Cursor, SSCursor, DictCursor)
mysql-connector-python (sync)
- Type: Pure Python OR C extension
- Protocol: MySQL wire protocol
- Connection Pool: Built-in (pooling.MySQLConnectionPool)
- Prepared Statements: Server-side via cursor(prepared=True)
- Parameter Style:
  - Regular: %s or %(name)s
  - Prepared: %s or ? only
- Async Support: No (sync only)
- Key Behaviors:
- Official Oracle driver
- True server-side prepared statements (MySQLCursorPrepared)
- Automatic statement caching
- Prepared statement limits: no named parameters
- Binary protocol for prepared statements
- Character set negotiation
Critical Differences:
# PyMySQL (client-side escaping)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# mysql-connector-python (server-side prepared)
cursor = conn.cursor(prepared=True)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Statement prepared on first execute, cached for subsequent calls

SQL Server Drivers
pyodbc
- Type: ODBC wrapper (C extension)
- Protocol: TDS via ODBC driver
- Connection Pool: External (SQLAlchemy)
- Prepared Statements: Via ODBC (driver-dependent)
- Parameter Style: ? (qmark)
- Key Behaviors:
- Requires ODBC driver installation (ODBC Driver 18 for SQL Server)
- Supports multiple database backends (not just SQL Server)
- Fast execution for parameterized queries
- Encryption settings in connection string
pymssql
- Type: FreeTDS wrapper (C extension)
- Protocol: TDS via FreeTDS
- Connection Pool: External
- Prepared Statements: Via FreeTDS
- Parameter Style: %s (format) or %(name)s (pyformat)
- Key Behaviors:
- Uses FreeTDS library
- Simpler API than pyodbc
- Named parameters supported
DB2 Drivers
ibm_db
- Type: IBM Data Server Driver wrapper (C extension)
- Protocol: DRDA via IBM driver
- Connection Pool: Manual
- Prepared Statements: ibm_db.prepare() + ibm_db.execute()
- Parameter Style: ? (qmark)
- Key Behaviors:
- Requires IBM Data Server Driver installation
- Two APIs: ibm_db (native), ibm_db_dbi (DB-API 2.0)
- SSL/TLS support
- “SELECT 1 FROM sysibm.sysdummy1” for connection test
Oracle Drivers
oracledb (Thin mode)
- Type: Pure Python (Thin) OR C extension (Thick)
- Protocol: Oracle Net/TTC (Thin mode: Python implementation)
- Connection Pool: Built-in (oracledb.create_pool())
- Prepared Statements: Automatic via cursor.execute()
- Parameter Style: :name (named) or :1, :2 (numeric)
- Key Behaviors:
- Thin mode: no Oracle Client installation required
- Thick mode: uses Oracle Client libraries
- “SELECT 1 FROM DUAL” for connection test
- Bind variables with :variable syntax
- Advanced features: LOBs, object types, collections
HTTP API Clients
snowflake-connector-python
- Type: Pure Python
- Protocol: HTTPS/REST + Arrow (optional)
- Connection Pool: Built-in
- Prepared Statements: Via execute() with bindings
- Parameter Style: ? (qmark) or :name (named)
- Key Behaviors:
- JWT or password authentication
- Result caching
- Arrow format for performance
- Session parameter management
- Multi-statement execution
databricks-sql-connector
- Type: Thrift-based (no ODBC/JDBC)
- Protocol: HTTP/Thrift over TLS
- Connection Pool: Built-in
- Prepared Statements: Via execute() with parameters
- Parameter Style: :name (named)
- Key Behaviors:
- PAT or OAuth authentication
- Async execution support
- Result chunking
- Unity Catalog support (catalog.schema.table)
- Thrift TCLIService client
pinecone-client
- Type: Pure Python
- Protocol: HTTP/REST or gRPC
- Connection Pool: requests library (HTTP) or grpcio (gRPC)
- API Style: Vector operations (upsert, query, delete, fetch, update)
- Key Behaviors:
- API key authentication
- Namespace isolation
- Metadata filtering
- Sparse vector support
- gRPC for performance (optional)
Implementation Priority & Recommendations
Phase 1: PostgreSQL + MySQL (GOLD)
Timeline: 4-6 weeks
PostgreSQL:
- Well-documented wire protocol (v3.2)
- SCRAM-SHA-256 specification (RFC 7677)
- TLS ALPN support (“postgresql”)
- Extended query protocol for prepared statements
- Rich client ecosystem (psycopg2, asyncpg)
- ⚠ Parameter style differences (psycopg2: %s, asyncpg: $1)
MySQL:
- Clear HandshakeV10 specification
- Server-first handshake (easy autodetection)
- caching_sha2_password documented
- Prepared statement protocol (binary)
- ⚠ Client-side vs server-side prepared statements (PyMySQL vs mysql-connector-python)
- ⚠ Capability flag negotiation critical
Recommended Implementation:
- Implement PostgreSQL SCRAM-SHA-256 first (more complex auth)
- Implement MySQL caching_sha2_password
- Build unified parameter binding layer ($1/? → internal format)
- Create dialect adapters (LIMIT/OFFSET, identifier quoting)
- Test with both sync and async clients
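The unified parameter binding layer mentioned above reduces to placeholder translation between driver styles. A deliberately naive sketch (it does not skip placeholders inside string literals or comments, which a production translator must handle):

```python
import re

def qmark_to_dollar(sql: str) -> str:
    """Translate '?' (qmark) placeholders to PostgreSQL-style $1, $2, ...

    Naive sketch: does not skip string literals or comments.
    """
    counter = iter(range(1, 10_000))
    return re.sub(r"\?", lambda m: f"${next(counter)}", sql)

def format_to_qmark(sql: str) -> str:
    """Translate '%s' (format-style) placeholders to '?' (qmark)."""
    return sql.replace("%s", "?")
```

With the internal format fixed (e.g. qmark), each protocol handler only needs one translation in and one out, rather than a pairwise matrix across %s, ?, $n, and :name styles.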
Phase 1.5: HTTP APIs (SILVER)
Timeline: 3-4 weeks
Snowflake REST:
- OpenAPI specification available
- JWT authentication well-documented
- Async execution model
- ⚠ Result set pagination handling
Databricks SQL:
- Thrift protocol (established)
- Statement execution API clear
- OAuth/PAT authentication
- ⚠ Unity Catalog namespace handling
Pinecone:
- Simple REST API (2025-01)
- gRPC specification available
- Metadata filtering well-defined
- ⚠ Vector dimension validation
- ⚠ SQL translation for vector operations
Recommended Implementation:
- HTTP gateway with path-based routing (/snowflake/*, /dbsql/*, /pinecone/*)
- JWT/OAuth token validation
- Async statement execution handling
- Vector operation translation (SQL → Pinecone API)
Phase 2: SQL Server + DB2 (BRONZE)
Timeline: 4-5 weeks
TDS 7.4:
- Microsoft open specification [MS-TDS]
- PRELOGIN negotiation documented
- LOGIN7 structure clear
- ⚠ TLS wrapping in PRELOGIN packets (complex)
- ⚠ Password obfuscation (XOR + nibble swap)
DRDA:
- Open Group standard (freely available)
- EXCSAT/ACCSEC/ACCRDB flow documented
- ⚠ EBCDIC encoding handling
- ⚠ Limited client testing (ibm_db dependency)
Recommended Implementation:
- Focus on basic connectivity + simple queries
- PRELOGIN/LOGIN7 for TDS
- EXCSAT/ACCSEC/ACCRDB for DRDA
- Minimal feature set (no cursors, advanced features)
Phase 3: Oracle/Tibero (BRONZE → SILVER)
Timeline: 5-6 weeks
Oracle Net/TTC:
- ⚠ Reverse-engineered specifications (community docs)
- ⚠ Complex layered protocol (TNS → TTC → TTI)
- ⚠ Connect data parsing ((DESCRIPTION=…))
- ⚠ Limited official documentation
- oracledb Thin mode provides Python reference
Recommended Implementation:
- Start with TNS CONNECT/ACCEPT
- Basic TTC protocol negotiation
- Simple TTI function calls (OSQL)
- “SELECT 1 FROM DUAL” support
- Expand based on demand
Research Findings Summary
Key Insights
1. PostgreSQL has the most robust specification with IANA-registered ALPN, RFC-based authentication, and comprehensive protocol documentation. GOLD compatibility is highly achievable.
2. MySQL’s server-first handshake makes autodetection straightforward. The distinction between client-side (PyMySQL) and server-side (mysql-connector-python) prepared statements requires careful handling.
3. TDS and DRDA have open specifications (Microsoft Open Specs, Open Group), making Bronze-level compatibility feasible without reverse engineering.
4. Oracle Net/TTC relies on community documentation, making it the highest-risk protocol. Thin mode in oracledb provides a Python reference implementation.
5. HTTP APIs are well-specified and suitable for Silver compatibility. JWT/OAuth authentication is standard across Snowflake, Databricks, and Pinecone.
6. Protocol autodetection is feasible using magic bytes and TLS ALPN, with PostgreSQL and MySQL being the easiest to detect.
Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| SCRAM-SHA-256 implementation complexity | Medium | Use existing crypto libraries; RFC 5802/7677 compliance |
| asyncpg parameter style ($1) vs psycopg2 (%s) | Medium | Unified parameter binding layer with dialect translation |
| TLS-wrapped protocols (TDS PRELOGIN) | Medium | TLS peek + unwrap; handle PRELOGIN as TLS payload |
| EBCDIC encoding (DRDA) | Low | Use codecs library; negotiate ASCII mode early |
| Oracle TNS complexity (layered protocol) | High | Limit to Bronze; use oracledb Thin as reference |
| Client driver behavioral differences | Medium | Comprehensive integration testing with each driver |
Testing Requirements
Must-Have Tests (per protocol):
- Connection establishment
- Authentication (all supported methods)
- Simple query execution (SELECT 1)
- Prepared statement with parameters
- Transaction (BEGIN/COMMIT/ROLLBACK)
- Cursor operations (where applicable)
- Error handling (invalid query, auth failure)
- Data type round-trip (INT, TEXT, TIMESTAMP, DECIMAL, BYTEA/BLOB)
- Vector operations (Pinecone, PostgreSQL pgvector extension)
Python Client Matrix Testing:
- PostgreSQL: psycopg2, asyncpg, SQLAlchemy
- MySQL: PyMySQL, mysql-connector-python, SQLAlchemy
- SQL Server: pyodbc, pymssql
- DB2: ibm_db
- Oracle: oracledb (Thin mode)
- Snowflake: snowflake-connector-python
- Databricks: databricks-sql-connector
- Pinecone: pinecone-client (HTTP + gRPC)
Next Steps
- Architecture Design: Coordinate with architect agent on protocol router, handler interfaces, and dialect adapter architecture
- Implementation Plan: Coordinate with coder agent on module structure, dependencies, and code organization
- Testing Strategy: Define integration test suite with CI/CD pipeline
- Documentation: Create DSN examples, compatibility matrix, and “expected differences” documentation
References
Official Specifications
- PostgreSQL: https://www.postgresql.org/docs/current/protocol.html
- MySQL: https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol.html
- TDS: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/
- DRDA: https://pubs.opengroup.org/onlinepubs/009608699/ (Open Group)
- Oracle: Community-documented (TNS/TTC reverse engineering)
- Snowflake: https://docs.snowflake.com/en/developer-guide/sql-api/
- Databricks: https://docs.databricks.com/api/workspace/statementexecution
- Pinecone: https://docs.pinecone.io/reference/api/2025-01/
RFCs
- RFC 7677: SCRAM-SHA-256
- RFC 5802: SASL SCRAM
- RFC 7301: TLS ALPN
Community Resources
- FreeTDS (TDS protocol): https://www.freetds.org/tds.html
- Wireshark protocol dissectors: Database protocol identification
- GitHub: redwood-wire-protocol/oracle-database-wire-protocol-unofficial-specification
Research Status: Complete
Confidence Level: HIGH (PostgreSQL, MySQL, HTTP APIs), MEDIUM (TDS, DRDA), LOW (Oracle TNS)
Recommended Next Action: Begin architecture design with findings coordination