Skip to content

Protocol Specifications Summary

Protocol Specifications Summary

HeliosDB Multi-Protocol Database Support Research

Researcher Agent Analysis Date: 2025-10-10 Status: Phase 1 Research Complete


Executive Summary

This document provides comprehensive protocol specifications for HeliosDB’s multi-protocol database support. Research covers wire protocols, authentication flows, packet formats, and client compatibility requirements for PostgreSQL, MySQL, SQL Server (TDS), DB2 (DRDA), Oracle/Tibero, Snowflake, Databricks, and Pinecone.

Key Finding: All target protocols have well-documented specifications with clear autodetection patterns. Implementation feasibility is HIGH for Phase 1 (PostgreSQL, MySQL, HTTP APIs) and MEDIUM for Phase 2 (TDS, DRDA, Oracle).


1. PostgreSQL Protocol (libpq) - GOLD Target

Wire Protocol Overview

  • Protocol Version: 3.2 (PostgreSQL 18+)
  • Transport: TCP/IP, Unix domain sockets
  • Message-Based: Frontend/Backend message exchange
  • Default Port: 5432

Packet Structure

Message Format:
- Message Type (1 byte): Single character identifier
- Length (4 bytes): Message length including length field, excluding type byte
- Payload (variable): Message-specific data

Authentication Flow (SCRAM-SHA-256)

Specification: RFC 7677 (SCRAM-SHA-256), RFC 5802 (SASL)

Flow Sequence:

  1. Client → Server: Startup Message

    • Protocol version (3.0)
    • Parameters (user, database, etc.)
  2. Server → Client: AuthenticationSASL

    • List of supported SASL mechanisms
    • Prioritized order (SCRAM-SHA-256-PLUS, SCRAM-SHA-256)
  3. Client → Server: SASLInitialResponse

    • Chosen mechanism
    • SCRAM client-first-message
  4. Server → Client: AuthenticationSASLContinue

    • SCRAM server-first-message
  5. Client → Server: SASLResponse

    • SCRAM client-final-message
  6. Server → Client: AuthenticationSASLFinal or AuthenticationOk

Important Notes:

  • Server ignores username in SCRAM messages; uses startup message username
  • Channel binding (SCRAM-SHA-256-PLUS) provides MITM protection with TLS
  • Password stored as SCRAM-SHA-256 hash on server

TLS Integration

  • ALPN Identifier: “postgresql” (RFC 7301, IANA registered)
  • Required: ALPN extension mandatory for SSL connections (protocol confusion protection)
  • Versions: TLS 1.2+ recommended

Key Message Types

  • ‘Q’: Simple query
  • ‘P’: Parse (prepared statement)
  • ‘B’: Bind (parameters to prepared statement)
  • ‘E’: Execute
  • ‘D’: Describe
  • ‘C’: Close
  • ‘X’: Terminate
  • ‘T’: Row description
  • ‘D’: Data row
  • ‘Z’: ReadyForQuery

Implementation Requirements

  • Message framing with type + length prefix
  • SCRAM-SHA-256 authentication handler
  • TLS with ALPN support
  • Prepared statement caching
  • Extended query protocol support
  • Cursor support (DECLARE, FETCH, CLOSE)

2. MySQL Protocol v10 - GOLD Target

Wire Protocol Overview

  • Protocol Version: 10 (since MySQL 3.21.0)
  • Transport: TCP/IP
  • Handshake: Server-initiated
  • Default Port: 3306

Packet Structure

Packet Format:
- Payload Length (3 bytes): Packet size (excluding header)
- Sequence ID (1 byte): Incremental packet counter
- Payload (variable): Protocol-specific data

Handshake Flow (Protocol::HandshakeV10)

Server → Client (Initial Handshake):

- Protocol Version: 0x0a (1 byte)
- Server Version: NULL-terminated string
- Connection ID: 4 bytes
- Auth Plugin Data Part 1: 8 bytes
- Filler: 0x00 (1 byte)
- Capability Flags (lower): 2 bytes
- Character Set: 1 byte
- Status Flags: 2 bytes
- Capability Flags (upper): 2 bytes
- Auth Plugin Data Length: 1 byte
- Reserved: 10 bytes (0x00)
- Auth Plugin Data Part 2: variable
- Auth Plugin Name: NULL-terminated (if CLIENT_PLUGIN_AUTH)

Client → Server (Handshake Response):

  • Capability flags
  • Max packet size
  • Character set
  • Username (NULL-terminated)
  • Auth response
  • Database name (if CLIENT_CONNECT_WITH_DB)
  • Auth plugin name (if CLIENT_PLUGIN_AUTH)

Authentication Methods

caching_sha2_password (MySQL 8.0+ default):

  • SHA-256 based authentication
  • In-memory server-side cache for performance
  • RSA key exchange for password transmission
  • Supports Unix socket and shared-memory protocols
  • Fast reauthentication for cached users

mysql_native_password (legacy):

  • SHA-1 based (less secure)
  • Still widely supported for compatibility

Authentication Flow:

  1. Server sends nonce in auth-plugin-data
  2. Client responds with hash(password) XOR hash(nonce + hash(hash(password)))
  3. For caching_sha2_password: RSA encryption if not cached + no TLS

Capability Flags (Critical)

  • CLIENT_PROTOCOL_41: 4.1+ protocol support
  • CLIENT_SECURE_CONNECTION: New authentication method
  • CLIENT_PLUGIN_AUTH: Pluggable authentication
  • CLIENT_CONNECT_WITH_DB: Database in handshake
  • CLIENT_SSL: SSL support
  • CLIENT_TRANSACTIONS: Transaction status tracking
  • CLIENT_SESSION_TRACK: Session state tracking
  • CLIENT_DEPRECATE_EOF: Deprecate EOF packets

Key Packet Types

  • 0x00: OK_Packet
  • 0xff: ERR_Packet
  • 0xfe: EOF_Packet (deprecated with CLIENT_DEPRECATE_EOF)
  • 0x01: AUTH_SWITCH_REQUEST
  • 0x03: COM_QUERY
  • 0x16: COM_STMT_PREPARE
  • 0x17: COM_STMT_EXECUTE
  • 0x19: COM_STMT_CLOSE

Implementation Requirements

  • Server-first handshake (client connects, server sends handshake)
  • caching_sha2_password support for MySQL 8.0+ compatibility
  • Capability flag negotiation
  • Prepared statement protocol (binary protocol)
  • Character set handling (utf8mb4 default)
  • Transaction status tracking

3. TDS 7.4 Protocol (SQL Server) - BRONZE Target

Wire Protocol Overview

  • Protocol Versions: TDS 7.3, 7.4, 8.0
  • Transport: TCP/IP
  • Default Port: 1433
  • Specification: [MS-TDS] (Microsoft Open Specifications)

Packet Structure

TDS Packet Header (8 bytes):
- Type (1 byte): Packet type indicator
* 0x12: PRELOGIN (client)
* 0x04: TABULAR RESULT (server response to PRELOGIN)
* 0x10: LOGIN7
* 0x01: SQL BATCH
* 0x03: RPC
- Status (1 byte): Packet status flags
- Length (2 bytes): Packet length (header + data)
- SPID (2 bytes): Server process ID
- Packet ID (1 byte): Packet sequence number
- Window (1 byte): Currently unused (0x00)

PRELOGIN Flow (Connection Negotiation)

Structure:

  1. Packet Header (type 0x12)
  2. Token List (option pairs: type + offset + length)
  3. Token Data (actual values)

PRELOGIN Options:

  • VERSION (UL_VERSION, US_SUBBUILD): Client/server version
  • ENCRYPTION (B_FENCRYPTION): Encryption mode
    • ENCRYPT_OFF (0x00): No encryption
    • ENCRYPT_ON (0x01): Encryption required
    • ENCRYPT_NOT_SUP (0x02): Not supported
    • ENCRYPT_REQ (0x03): Encryption required
    • ENCRYPT_CLIENT_CERT (0x80): Certificate-based auth
  • INSTANCE (B_INSTVALIDITY): SQL Server instance name
  • THREADID (UL_THREADID): Client thread ID
  • MARS (B_MARS): Multiple Active Result Sets support
  • TRACEID (GUID_CONNID, GUID_ActivityID): Connection tracking (TDS 7.4+)
  • FEDAUTHREQUIRED (B_FEDAUTHREQUIRED): Federated auth

TLS Integration

  • PRELOGIN wraps SSL/TLS handshake payload
  • For TDS 7.2-7.4: SSL handshake in PRELOGIN packets
  • After successful TLS negotiation: LOGIN7 over encrypted channel
  • TDS 8.0: Mandatory encryption by default

LOGIN7 Authentication

LOGIN7 Structure:
- Length (4 bytes)
- TDS version (4 bytes)
- Packet size (4 bytes)
- Client version (4 bytes)
- Client PID (4 bytes)
- Connection ID (4 bytes)
- Option flags 1 (1 byte)
- Option flags 2 (1 byte)
- Type flags (1 byte)
- Option flags 3 (1 byte)
- Client timezone (4 bytes)
- Client LCID (4 bytes)
- Offset/length pairs for:
* Hostname
* Username
* Password (obfuscated)
* App name
* Server name
* Library name
* Language
* Database

Password Obfuscation: XOR with 0xA5, then swap nibbles

Implementation Requirements

  • PRELOGIN negotiation (encryption, version, MARS)
  • LOGIN7 authentication
  • TLS support (PRELOGIN-wrapped handshake)
  • Basic SQL BATCH execution
  • Error token parsing (SQLSTATE codes)
  • Parameter binding (RPC procedures, sp_executesql)

4. DRDA Protocol (DB2) - BRONZE Target

Wire Protocol Overview

  • Standard: Open Group DRDA v2
  • Architecture: Distributed Data Management (DDM)
  • Default Port: 50000
  • Character Encoding: EBCDIC (unless negotiated)

Packet Structure

DDM Packet Format:
- Length (2 bytes): Total length in bytes
- Magic ID (1 byte): 0xD0 (fixed)
- Format (1 byte): Formatting flags
- Code Point (2 bytes): Command/reply identifier
- Data (variable): Code point specific data

Connection Flow (EXCSAT → ACCSEC → ACCRDB)

1. EXCSAT (Exchange Server Attributes)

  • Purpose: Initiate connection, exchange server info
  • Code Point: 0x1041
  • Key Objects:
    • EXTNAM (0x115E): External name (28 bytes: 20-byte app ID + 8-byte process/thread ID)
    • SRVCLSNM: Server class name
    • SRVNAM: Server name
    • SRVRLSLV: Server release level

Server Response: EXCSATRD

  • Server attributes acknowledgment
  • SRVCLSNM, SRVNAM, SRVRLSLV

2. ACCSEC (Access Security)

  • Purpose: Negotiate authentication mechanism
  • Code Point: 0x106D
  • Key Objects:
    • SECMEC (Security Mechanism):
      • 0x0003: USRIDPWD (User ID + Password)
      • 0x0004: USRIDONL (User ID only)
      • 0x0007: USRENCPWD (Encrypted password)
      • 0x0009: EUSRIDPWD (External user + password)
      • 0x000B: DCE (Kerberos)

Server Response: ACCSECRD

  • Selected security mechanism
  • Security token (if applicable)

3. ACCRDB (Access Relational Database)

  • Purpose: Connect to specific database
  • Code Point: 0x2001
  • Key Objects:
    • RDBNAM: Database name (padded to 18 bytes)
    • RDBACCCL: RDB access class
    • PRDID: Product ID
    • TYPDEFNAM: Type definition name (QTDSQLASC for ASCII, QTDSQLX for mixed)

Server Response: ACCRDBRM

  • Access confirmation
  • Product-specific information

Implementation Requirements

  • DDM packet framing (length + 0xD0 magic + code point)
  • EXCSAT/EXCSATRD exchange
  • ACCSEC/ACCSECRD authentication (USRIDPWD minimum)
  • ACCRDB/ACCRDBRM database access
  • EBCDIC ↔ UTF-8 conversion (unless ASCII negotiated)
  • Basic SQL execution (PRPSQLSTT, OPNQRY, CNTQRY, CLSQRY)
  • Error handling (SQLCARD response)

5. Oracle Net/TTC Protocol - BRONZE Target

Wire Protocol Overview

  • Protocol: TNS (Transparent Network Substrate) + TTC (Two-Task Common)
  • Default Port: 1521
  • Architecture: Layered (TNS → TTC → SQL execution)
  • Byte Order: Big-endian

TNS Packet Structure

TNS Header (8 bytes):
- Packet Length (2 bytes): Total length including header
- Packet Checksum (2 bytes): Typically 0x0000 (disabled)
- Packet Type (1 byte):
* 0x01: CONNECT
* 0x02: ACCEPT
* 0x03: ACK
* 0x04: REFUSE
* 0x05: REDIRECT
* 0x06: DATA
* 0x07: NULL
* 0x09: ABORT
* 0x0B: RESEND
* 0x0C: MARKER
* 0x0D: ATTENTION
* 0x0E: CONTROL
- Reserved (1 byte): 0x00
- Header Checksum (2 bytes): Typically 0x0000

CONNECT Packet

TNS CONNECT Structure:
- TNS Header (type 0x01)
- Version (2 bytes): Protocol version (0x0138 = 312)
- Version Compatible (2 bytes): Min compatible version
- Service Options (2 bytes): Service flags
- Session Data Unit (2 bytes): Max SDU size
- Max Transmission Unit (2 bytes): Max TDU size
- NT Protocol Characteristics (2 bytes)
- Line Turnaround (2 bytes): 0x0000
- Value of 1 in Hardware (2 bytes): Byte order test (0x0001)
- Data Length (2 bytes): Connect data length
- Data Offset (2 bytes): Connect data offset
- Max Receivable Connect Data (4 bytes)
- Connect Flags 0 (1 byte)
- Connect Flags 1 (1 byte)
- Trace Cross Facility Items (variable)
- Connect Data (variable): (DESCRIPTION=...) string

Connect Data Format (ASCII):

(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=DEMO)(CID=(PROGRAM=python)(HOST=client)(USER=demo)))(ADDRESS=(PROTOCOL=TCP)(HOST=server)(PORT=1521)))

TTC (Two-Task Common) Protocol

DATA Packet (type 0x06) Structure:

- TNS Header
- Data Flags (2 bytes)
- Data Type (1 byte):
* 0x01: Protocol negotiation
* 0x02: Data type negotiation
* 0x03: TTI function call
- TTC Payload (variable)

Protocol Negotiation (Data Type 0x01):

  • Client sends supported versions: [6, 5, 4, 3, 2, 1, 0]
  • Server responds with common version
  • Character set information exchange

Data Type Negotiation (Data Type 0x02):

  • Internal data representation exchange
  • Endianness, date format, number format

TTI Function Calls (Data Type 0x03):

  • TTI opcode (1 byte): Function identifier
    • 0x03: OSQL (SQL execution)
    • 0x05: OALL7/OALL8 (bundled execution)
    • 0x11: OAUTH (authentication)
    • 0x68: O3LOGON (3-phase logon)
  • Function-specific parameters

Authentication Flow

  1. Client → CONNECT packet with connect data
  2. Server → ACCEPT or REFUSE
  3. Client → DATA (protocol negotiation)
  4. Server → DATA (protocol ack)
  5. Client → DATA (TTI O3LOGON phase 1)
  6. Server → DATA (auth challenge)
  7. Client → DATA (TTI O3LOGON phase 2 with credentials)
  8. Server → DATA (auth result)

Implementation Requirements

  • TNS packet framing (length + checksum + type)
  • CONNECT/ACCEPT handshake
  • Connect data parsing (DESCRIPTION format)
  • Protocol and data type negotiation
  • TTC layer implementation
  • TTI function calls (minimum: OSQL, O3LOGON)
  • Basic authentication (password-based)
  • “SELECT 1 FROM DUAL” support
  • Character set handling (AL32UTF8)

6. Snowflake REST API - SILVER Target

API Overview

  • Protocol: HTTPS/REST
  • Format: JSON
  • Default Port: 443
  • Specification: OpenAPI compliant
  • Repository: snowflake-rest-api-specs (GitHub)

Authentication Methods

1. Key Pair Authentication (JWT)

JWT Token Generation:
- Header:
{
"alg": "RS256",
"typ": "JWT"
}
- Payload:
{
"iss": "account.user.fingerprint",
"sub": "account.user",
"iat": <issued_at_timestamp>,
"exp": <expiration_timestamp>
}
- Signature: RS256(header.payload, private_key)

Authorization Header:

Authorization: Bearer <jwt_token>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT

2. OAuth Token

Authorization: Bearer <oauth_access_token>

3. Password-based (for compatibility)

{
"username": "demo",
"password": "demo",
"account": "local"
}

SQL API Endpoints

Submit Statement (POST /api/v2/statements)

Request:
{
"statement": "SELECT * FROM table WHERE id = ?",
"timeout": 60,
"database": "demo",
"schema": "public",
"warehouse": "compute_wh",
"role": "accountadmin",
"bindings": {
"1": {"type": "FIXED", "value": "123"}
},
"parameters": {
"MULTI_STATEMENT_COUNT": "0"
}
}
Response (synchronous):
{
"statementHandle": "uuid",
"statementStatusUrl": "/api/v2/statements/uuid",
"resultSetMetaData": {
"numRows": 10,
"rowType": [
{"name": "ID", "type": "FIXED", "nullable": false},
{"name": "NAME", "type": "TEXT", "nullable": true}
]
},
"data": [
["1", "Alice"],
["2", "Bob"]
],
"code": "090001",
"message": "Statement executed successfully."
}
Response (asynchronous):
{
"statementHandle": "uuid",
"statementStatusUrl": "/api/v2/statements/uuid",
"message": "Statement execution in progress."
}

Check Statement Status (GET /api/v2/statements/{statementHandle})

Response:
{
"statementHandle": "uuid",
"statementStatusUrl": "/api/v2/statements/uuid",
"status": "success|running|failed",
"resultSetMetaData": {...},
"data": [...],
"stats": {
"numRowsInserted": 0,
"numRowsUpdated": 0,
"numRowsDeleted": 0,
"numDuplicateRowsUpdated": 0
}
}

Cancel Statement (POST /api/v2/statements/{statementHandle}/cancel)

Required Headers

  • Authorization: Bearer token
  • Content-Type: application/json
  • Accept: application/json
  • User-Agent: Application identifier
  • X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT | OAUTH

Implementation Requirements

  • HTTPS/TLS 1.2+ only
  • JWT generation with RS256
  • Asynchronous statement execution handling
  • Statement handle tracking
  • Result set pagination
  • Session parameter management
  • Error code mapping (Snowflake → generic)

7. Databricks SQL API - SILVER Target

API Overview

  • Protocol: HTTP/Thrift over TLS
  • Format: JSON (HTTP API), Thrift binary (driver protocol)
  • Default Port: 443
  • Driver: Thrift-based (no ODBC/JDBC dependency)

Authentication

Personal Access Token (PAT)

Authorization: Bearer <databricks_pat>

OAuth Access Token (Azure)

Authorization: Bearer <oauth_token>

Microsoft Entra ID Token (Azure)

Authorization: Bearer <entra_id_token>

Statement Execution API

Submit Statement (POST /api/2.0/sql/statements)

Request:
{
"warehouse_id": "abc123",
"statement": "SELECT * FROM table WHERE id = :id",
"parameters": [
{"name": "id", "value": "123", "type": "STRING"}
],
"catalog": "main",
"schema": "default",
"disposition": "INLINE|EXTERNAL_LINKS",
"format": "JSON_ARRAY|ARROW_STREAM",
"wait_timeout": "30s",
"on_wait_timeout": "CONTINUE|CANCEL"
}
Response (quick execution):
{
"statement_id": "uuid",
"status": {
"state": "SUCCEEDED|RUNNING|FAILED"
},
"manifest": {
"schema": {
"columns": [
{"name": "id", "type_name": "INT", "position": 0},
{"name": "name", "type_name": "STRING", "position": 1}
]
},
"total_row_count": 100,
"total_chunk_count": 1,
"chunks": [
{
"chunk_index": 0,
"row_count": 100,
"byte_count": 5000
}
]
},
"result": {
"data_array": [
[123, "Alice"],
[124, "Bob"]
]
}
}
Response (async execution):
{
"statement_id": "uuid",
"status": {
"state": "RUNNING"
}
}

Get Statement Status (GET /api/2.0/sql/statements/{statement_id})

Cancel Statement (POST /api/2.0/sql/statements/{statement_id}/cancel)

Thrift Protocol (Driver Internal)

  • TCLIService.Client: Thrift service client
  • Methods:
    • OpenSession
    • ExecuteStatement
    • FetchResults
    • CloseSession
    • GetResultSetMetadata
  • Format: Thrift binary protocol over TLS

Implementation Requirements

  • HTTPS/TLS 1.2+ mandatory
  • PAT or OAuth authentication
  • Warehouse ID resolution
  • Async statement execution
  • Result chunking and pagination
  • Arrow format support (optional)
  • Thrift protocol for driver compatibility
  • Unity Catalog awareness (catalog.schema.table)

8. Pinecone Vector API - SILVER Target

API Overview

  • Protocols: HTTP/REST, gRPC
  • Format: JSON (HTTP), Protocol Buffers (gRPC)
  • Default Port: 443 (HTTPS), various (gRPC)
  • Specification: 2025-01 API version

Authentication

Api-Key: <pinecone_api_key>

Data Plane API (2025-01)

Upsert Vectors (POST /vectors/upsert)

Request:
{
"vectors": [
{
"id": "vec1",
"values": [0.1, 0.2, 0.3, ..., 0.768],
"metadata": {
"category": "books",
"title": "Example"
},
"sparse_values": {
"indices": [10, 45, 123],
"values": [0.5, 0.3, 0.8]
}
}
],
"namespace": "default"
}
Response:
{
"upsertedCount": 1
}

Query Vectors (POST /query)

Request:
{
"vector": [0.1, 0.2, 0.3, ..., 0.768],
"topK": 10,
"namespace": "default",
"filter": {
"category": {"$eq": "books"},
"year": {"$gte": 2020}
},
"includeValues": true,
"includeMetadata": true
}
Response:
{
"matches": [
{
"id": "vec1",
"score": 0.95,
"values": [...],
"metadata": {...}
}
],
"namespace": "default"
}

Delete Vectors (POST /vectors/delete)

Request:
{
"ids": ["vec1", "vec2"],
"deleteAll": false,
"namespace": "default",
"filter": {"category": {"$eq": "books"}}
}

Fetch Vectors (GET /vectors/fetch)

Query params: ?ids=vec1&ids=vec2&namespace=default

Update Vectors (POST /vectors/update)

{
"id": "vec1",
"values": [...],
"setMetadata": {"updated": true},
"namespace": "default"
}

Metadata Filtering

Supported Operators:

  • $eq: Equals
  • $ne: Not equals
  • $gt, $gte: Greater than (or equal)
  • $lt, $lte: Less than (or equal)
  • $in: In array
  • $nin: Not in array

Complex Filters:

{
"$and": [
{"category": {"$eq": "books"}},
{"year": {"$gte": 2020}},
{"$or": [
{"author": {"$eq": "Smith"}},
{"author": {"$eq": "Jones"}}
]}
]
}

gRPC Protocol

  • Package: pinecone.grpc
  • Install: pip install "pinecone[grpc]"
  • Performance: Modest improvement for upsert/query operations
  • Methods:
    • Upsert
    • Query
    • Delete
    • Fetch
    • Update
    • DescribeIndexStats

Index Operations (Control Plane)

  • Create Index (POST /indexes)
  • List Indexes (GET /indexes)
  • Describe Index (GET /indexes/{name})
  • Delete Index (DELETE /indexes/{name})

Implementation Requirements

  • HTTPS/TLS for HTTP API
  • gRPC with TLS for gRPC API
  • API key authentication (header-based)
  • Namespace isolation
  • Metadata filtering (complex boolean expressions)
  • Sparse vector support
  • Index management (create, list, describe, delete)
  • Vector dimension validation
  • Up-to-the-second consistency (upsert/query reflect immediately)

Protocol Autodetection Summary

Detection Strategy Matrix

ProtocolDetection MethodFirst Packet SignaturePortTLS/ALPN
PostgreSQLStartup length + versionLength(4) + 0x000300005432ALPN: “postgresql”
MySQLServer sends firstServer sends HandshakeV10 (0x0a)3306No ALPN
TDSPRELOGIN packetType 0x12 + PRELOGIN structure1433No specific ALPN
DRDAEXCSAT code pointLength + 0xD0 + 0x1041 (EXCSAT)50000No ALPN
Oracle TNSCONNECT packetType 0x01 + version 0x01381521No ALPN
HTTP/HTTPSHTTP request line”GET”, “POST”, “PUT” or TLS ClientHello443ALPN: “h2”, “http/1.1”

Implementation Pseudocode

def detect_protocol(socket):
# Step 1: Check for TLS ClientHello
peek = socket.peek(16)
if is_tls_client_hello(peek):
# Wrap in TLS and check ALPN
tls_socket = wrap_tls(socket)
alpn = tls_socket.selected_alpn_protocol()
if alpn == "postgresql":
return PostgreSQLHandler(tls_socket)
elif alpn in ["h2", "http/1.1"]:
return HTTPHandler(tls_socket)
else:
# Continue detection after TLS handshake
peek = tls_socket.peek(16)
# Step 2: Protocol-specific detection
# PostgreSQL: 4-byte length + 4-byte protocol version
if len(peek) >= 8:
length = struct.unpack("!I", peek[0:4])[0]
protocol = struct.unpack("!I", peek[4:8])[0]
if protocol == 0x00030000 or protocol == 0x00020000:
return PostgreSQLHandler(socket)
# TDS: Type byte 0x12 (PRELOGIN)
if peek[0] == 0x12:
return TDSHandler(socket)
# DRDA: Magic byte 0xD0
if len(peek) >= 3 and peek[2] == 0xD0:
code_point = struct.unpack("!H", peek[4:6])[0]
if code_point == 0x1041: # EXCSAT
return DRDAHandler(socket)
# Oracle TNS: Type 0x01 (CONNECT)
if len(peek) >= 8:
tns_type = peek[4]
if tns_type == 0x01:
return OracleTNSHandler(socket)
# HTTP: Plain text "GET", "POST", etc.
if peek.startswith(b"GET ") or peek.startswith(b"POST "):
return HTTPHandler(socket)
# MySQL: Server sends HandshakeV10 first
# Try sending server handshake and see response
return MySQLHandler(socket) # Default to MySQL (server-initiated)

Magic Bytes Reference

PostgreSQL Startup:
00 00 00 28 [Length: 40 bytes]
00 03 00 00 [Protocol: 3.0.0]
75 73 65 72 [user\0...]
MySQL HandshakeV10 (Server → Client):
1a 00 00 00 [Payload length: 26]
00 [Sequence: 0]
0a [Protocol version: 10]
38 2e 30 [8.0...]
TDS PRELOGIN (Client → Server):
12 [Type: PRELOGIN]
01 [Status: Normal]
00 34 [Length: 52]
00 00 [SPID: 0]
01 [Packet: 1]
00 [Window: 0]
DRDA EXCSAT:
00 2a [Length: 42]
d0 [Magic: 0xD0]
01 [Format]
10 41 [Code point: EXCSAT 0x1041]
Oracle TNS CONNECT:
00 f4 [Length: 244]
00 00 [Checksum: disabled]
01 [Type: CONNECT]
00 [Reserved]
00 00 [Header checksum]
01 38 [Version: 312]
HTTP/1.1:
47 45 54 20 [GET ]
50 4f 53 54 [POST]
TLS ClientHello:
16 [Content type: Handshake]
03 01 [Version: TLS 1.0]
...
00 10 [ALPN extension]

Python Client Driver Compatibility Matrix

PostgreSQL Drivers

psycopg2 (sync)

  • Type: libpq wrapper (C extension)
  • Protocol: PostgreSQL wire protocol
  • Connection Pool: External (psycopg2.pool)
  • Prepared Statements: Server-side via cursor.execute() with named cursors
  • Parameter Style: %s (format), %(name)s (pyformat)
  • Async Support: No (use psycopg3 or asyncpg)
  • Key Behaviors:
    • Automatic type conversion (PostgreSQL → Python)
    • Server-side cursors with cursor(name='...')
    • COPY protocol support
    • Two-phase commit support

asyncpg (async)

  • Type: Pure Python + Cython (protocol implementation)
  • Protocol: PostgreSQL wire protocol (native)
  • Connection Pool: Built-in (asyncpg.create_pool())
  • Prepared Statements: Automatic LRU cache for fetch*() methods
  • Parameter Style: $1, $2, $3 (PostgreSQL native)
  • Async Support: asyncio native
  • Key Behaviors:
    • High performance (no libpq overhead)
    • Automatic prepared statement caching
    • Type codec system (custom types)
    • Does NOT work with PgBouncer in transaction/statement mode
    • Binary protocol for efficiency
    • Cursors and prepared statements invalid after connection release

Critical Differences:

# psycopg2
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# asyncpg
await connection.fetch("SELECT * FROM users WHERE id = $1", user_id)

MySQL Drivers

PyMySQL (sync)

  • Type: Pure Python
  • Protocol: MySQL wire protocol (Python implementation)
  • Connection Pool: External (DBUtils, SQLAlchemy)
  • Prepared Statements: Client-side emulation (string escaping)
  • Parameter Style: %s (format)
  • Async Support: No
  • Key Behaviors:
    • Pure Python (no C dependencies)
    • Client-side parameter escaping (not true prepared statements)
    • Character set handling (utf8mb4)
    • Cursor classes (Cursor, SSCursor, DictCursor)

mysql-connector-python (sync)

  • Type: Pure Python OR C extension
  • Protocol: MySQL wire protocol
  • Connection Pool: Built-in (pooling.MySQLConnectionPool)
  • Prepared Statements: Server-side via cursor(prepared=True)
  • Parameter Style:
    • Regular: %s or %(name)s
    • Prepared: %s or ? only
  • Async Support: No (sync only)
  • Key Behaviors:
    • Official Oracle driver
    • True server-side prepared statements (MySQLCursorPrepared)
    • Automatic statement caching
    • Prepared statement limits: no named parameters
    • Binary protocol for prepared statements
    • Character set negotiation

Critical Differences:

# PyMySQL (client-side escaping)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# mysql-connector-python (server-side prepared)
cursor = conn.cursor(prepared=True)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Statement prepared on first execute, cached for subsequent calls

SQL Server Drivers

pyodbc

  • Type: ODBC wrapper (C extension)
  • Protocol: TDS via ODBC driver
  • Connection Pool: External (SQLAlchemy)
  • Prepared Statements: Via ODBC (driver-dependent)
  • Parameter Style: ? (qmark)
  • Key Behaviors:
    • Requires ODBC driver installation (ODBC Driver 18 for SQL Server)
    • Supports multiple database backends (not just SQL Server)
    • Fast execution for parameterized queries
    • Encryption settings in connection string

pymssql

  • Type: FreeTDS wrapper (C extension)
  • Protocol: TDS via FreeTDS
  • Connection Pool: External
  • Prepared Statements: Via FreeTDS
  • Parameter Style: %s (format) or %(name)s (pyformat)
  • Key Behaviors:
    • Uses FreeTDS library
    • Simpler API than pyodbc
    • Named parameters supported

DB2 Drivers

ibm_db

  • Type: IBM Data Server Driver wrapper (C extension)
  • Protocol: DRDA via IBM driver
  • Connection Pool: Manual
  • Prepared Statements: ibm_db.prepare() + ibm_db.execute()
  • Parameter Style: ? (qmark)
  • Key Behaviors:
    • Requires IBM Data Server Driver installation
    • Two APIs: ibm_db (native), ibm_db_dbi (DB-API 2.0)
    • SSL/TLS support
    • “SELECT 1 FROM sysibm.sysdummy1” for connection test

Oracle Drivers

oracledb (Thin mode)

  • Type: Pure Python (Thin) OR C extension (Thick)
  • Protocol: Oracle Net/TTC (Thin mode: Python implementation)
  • Connection Pool: Built-in (oracledb.create_pool())
  • Prepared Statements: Automatic via cursor.execute()
  • Parameter Style: :name (named) or :1, :2 (numeric)
  • Key Behaviors:
    • Thin mode: no Oracle Client installation required
    • Thick mode: uses Oracle Client libraries
    • “SELECT 1 FROM DUAL” for connection test
    • Bind variables with :variable syntax
    • Advanced features: LOBs, object types, collections

HTTP API Clients

snowflake-connector-python

  • Type: Pure Python
  • Protocol: HTTPS/REST + Arrow (optional)
  • Connection Pool: Built-in
  • Prepared Statements: Via execute() with bindings
  • Parameter Style: ? (qmark) or :name (named)
  • Key Behaviors:
    • JWT or password authentication
    • Result caching
    • Arrow format for performance
    • Session parameter management
    • Multi-statement execution

databricks-sql-connector

  • Type: Thrift-based (no ODBC/JDBC)
  • Protocol: HTTP/Thrift over TLS
  • Connection Pool: Built-in
  • Prepared Statements: Via execute() with parameters
  • Parameter Style: :name (named)
  • Key Behaviors:
    • PAT or OAuth authentication
    • Async execution support
    • Result chunking
    • Unity Catalog support (catalog.schema.table)
    • Thrift TCLIService client

pinecone-client

  • Type: Pure Python
  • Protocol: HTTP/REST or gRPC
  • Connection Pool: requests library (HTTP) or grpcio (gRPC)
  • API Style: Vector operations (upsert, query, delete, fetch, update)
  • Key Behaviors:
    • API key authentication
    • Namespace isolation
    • Metadata filtering
    • Sparse vector support
    • gRPC for performance (optional)

Implementation Priority & Recommendations

Phase 1: PostgreSQL + MySQL (GOLD)

Timeline: 4-6 weeks

PostgreSQL:

  • Well-documented wire protocol (v3.2)
  • SCRAM-SHA-256 specification (RFC 7677)
  • TLS ALPN support (“postgresql”)
  • Extended query protocol for prepared statements
  • Rich client ecosystem (psycopg2, asyncpg)
  • ⚠ Parameter style differences (psycopg2: %s, asyncpg: $1)

MySQL:

  • Clear HandshakeV10 specification
  • Server-first handshake (easy autodetection)
  • caching_sha2_password documented
  • Prepared statement protocol (binary)
  • ⚠ Client-side vs server-side prepared statements (PyMySQL vs mysql-connector-python)
  • ⚠ Capability flag negotiation critical

Recommended Implementation:

  1. Implement PostgreSQL SCRAM-SHA-256 first (more complex auth)
  2. Implement MySQL caching_sha2_password
  3. Build unified parameter binding layer ($1/? → internal format)
  4. Create dialect adapters (LIMIT/OFFSET, identifier quoting)
  5. Test with both sync and async clients

Phase 1.5: HTTP APIs (SILVER)

Timeline: 3-4 weeks

Snowflake REST:

  • OpenAPI specification available
  • JWT authentication well-documented
  • Async execution model
  • ⚠ Result set pagination handling

Databricks SQL:

  • Thrift protocol (established)
  • Statement execution API clear
  • OAuth/PAT authentication
  • ⚠ Unity Catalog namespace handling

Pinecone:

  • Simple REST API (2025-01)
  • gRPC specification available
  • Metadata filtering well-defined
  • ⚠ Vector dimension validation
  • ⚠ SQL translation for vector operations

Recommended Implementation:

  1. HTTP gateway with path-based routing (/snowflake/, /dbsql/, /pinecone/*)
  2. JWT/OAuth token validation
  3. Async statement execution handling
  4. Vector operation translation (SQL → Pinecone API)

Phase 2: SQL Server + DB2 (BRONZE)

Timeline: 4-5 weeks

TDS 7.4:

  • Microsoft open specification [MS-TDS]
  • PRELOGIN negotiation documented
  • LOGIN7 structure clear
  • ⚠ TLS wrapping in PRELOGIN packets (complex)
  • ⚠ Password obfuscation (XOR + nibble swap)

DRDA:

  • Open Group standard (freely available)
  • EXCSAT/ACCSEC/ACCRDB flow documented
  • ⚠ EBCDIC encoding handling
  • ⚠ Limited client testing (ibm_db dependency)

Recommended Implementation:

  1. Focus on basic connectivity + simple queries
  2. PRELOGIN/LOGIN7 for TDS
  3. EXCSAT/ACCSEC/ACCRDB for DRDA
  4. Minimal feature set (no cursors, advanced features)

Phase 3: Oracle/Tibero (BRONZE → SILVER)

Timeline: 5-6 weeks

Oracle Net/TTC:

  • ⚠ Reverse-engineered specifications (community docs)
  • ⚠ Complex layered protocol (TNS → TTC → TTI)
  • ⚠ Connect data parsing ((DESCRIPTION=…))
  • ⚠ Limited official documentation
  • oracledb Thin mode provides Python reference

Recommended Implementation:

  1. Start with TNS CONNECT/ACCEPT
  2. Basic TTC protocol negotiation
  3. Simple TTI function calls (OSQL)
  4. “SELECT 1 FROM DUAL” support
  5. Expand based on demand

Research Findings Summary

Key Insights

  1. PostgreSQL has the most robust specification with IANA-registered ALPN, RFC-based authentication, and comprehensive protocol documentation. GOLD compatibility is highly achievable.

  2. MySQL’s server-first handshake makes autodetection straightforward. The distinction between client-side (PyMySQL) and server-side (mysql-connector-python) prepared statements requires careful handling.

  3. TDS and DRDA have open specifications (Microsoft Open Specs, Open Group), making Bronze-level compatibility feasible without reverse engineering.

  4. Oracle Net/TTC relies on community documentation, making it the highest-risk protocol. Thin mode in oracledb provides a Python reference implementation.

  5. HTTP APIs are well-specified and suitable for Silver compatibility. JWT/OAuth authentication is standard across Snowflake, Databricks, and Pinecone.

  6. Protocol autodetection is feasible using magic bytes and TLS ALPN, with PostgreSQL and MySQL being the easiest to detect.

Risks & Mitigations

RiskImpactMitigation
SCRAM-SHA-256 implementation complexityMediumUse existing crypto libraries; RFC 5802/7677 compliance
asyncpg parameter style ($1) vs psycopg2 (%s)MediumUnified parameter binding layer with dialect translation
TLS-wrapped protocols (TDS PRELOGIN)MediumTLS peek + unwrap; handle PRELOGIN as TLS payload
EBCDIC encoding (DRDA)LowUse codecs library; negotiate ASCII mode early
Oracle TNS complexity (layered protocol)HighLimit to Bronze; use oracledb Thin as reference
Client driver behavioral differencesMediumComprehensive integration testing with each driver

Testing Requirements

Must-Have Tests (per protocol):

  1. Connection establishment
  2. Authentication (all supported methods)
  3. Simple query execution (SELECT 1)
  4. Prepared statement with parameters
  5. Transaction (BEGIN/COMMIT/ROLLBACK)
  6. Cursor operations (where applicable)
  7. Error handling (invalid query, auth failure)
  8. Data type round-trip (INT, TEXT, TIMESTAMP, DECIMAL, BYTEA/BLOB)
  9. Vector operations (Pinecone, PostgreSQL pgvector extension)

Python Client Matrix Testing:

  • PostgreSQL: psycopg2, asyncpg, SQLAlchemy
  • MySQL: PyMySQL, mysql-connector-python, SQLAlchemy
  • SQL Server: pyodbc, pymssql
  • DB2: ibm_db
  • Oracle: oracledb (Thin mode)
  • Snowflake: snowflake-connector-python
  • Databricks: databricks-sql-connector
  • Pinecone: pinecone-client (HTTP + gRPC)

Next Steps

  1. Architecture Design: Coordinate with architect agent on protocol router, handler interfaces, and dialect adapter architecture
  2. Implementation Plan: Coordinate with coder agent on module structure, dependencies, and code organization
  3. Testing Strategy: Define integration test suite with CI/CD pipeline
  4. Documentation: Create DSN examples, compatibility matrix, and “expected differences” documentation

References

Official Specifications

RFCs

  • RFC 7677: SCRAM-SHA-256
  • RFC 5802: SASL SCRAM
  • RFC 7301: TLS ALPN

Community Resources

  • FreeTDS (TDS protocol): https://www.freetds.org/tds.html
  • Wireshark protocol dissectors: Database protocol identification
  • GitHub: redwood-wire-protocol/oracle-database-wire-protocol-unofficial-specification

Research Status: Complete Confidence Level: HIGH (PostgreSQL, MySQL, HTTP APIs), MEDIUM (TDS, DRDA), LOW (Oracle TNS) Recommended Next Action: Begin architecture design with findings coordination