HeliosDB Multi-Protocol Research Index
HeliosDB Multi-Protocol Research Index
Research Agent: Complete Protocol Compatibility Analysis Date: 2025-10-10 Status: All Research Complete
Quick Navigation
π Start Here
RESEARCH_SUMMARY.md - Executive summary, key findings, implementation roadmap, and coordination notes for architect/coder agents
π Detailed Research Documents
-
protocol_specifications_summary.md (38 KB)
- Wire protocol overviews for all 8 protocols
- PostgreSQL (libpq), MySQL v10, TDS 7.4, DRDA, Oracle Net/TTC
- Snowflake REST, Databricks SQL HTTP/Thrift, Pinecone HTTP/gRPC
- Protocol autodetection strategy with magic bytes reference
- Implementation checklists and requirements
-
authentication_flows.md (39 KB)
- Step-by-step authentication flows with diagrams
- PostgreSQL SCRAM-SHA-256 (RFC 5802/7677 compliant)
- MySQL caching_sha2_password (fast auth + full auth)
- TDS 7.4 PRELOGIN β LOGIN7 (with TLS wrapping)
- DRDA EXCSAT β ACCSEC β SECCHK
- Oracle O3LOGON (3-phase logon)
- HTTP API authentication (JWT, OAuth, API keys)
-
packet_formats.md (47 KB)
- Complete wire-level packet specifications
- Byte-by-byte field descriptions for all protocols
- Message type tables and data type encodings
- Protocol detection magic bytes reference
- Example hex dumps for key packets
-
python_client_compatibility.md (32 KB)
- Analysis of 15 Python client libraries
- psycopg2, asyncpg, PyMySQL, mysql-connector-python
- pyodbc, pymssql, ibm_db, oracledb
- snowflake-connector-python, databricks-sql-connector, pinecone-client
- Parameter binding styles, prepared statements, connection pooling
- Implementation recommendations and testing requirements
Research Coverage
Protocols Analyzed (8 total)
| Protocol | Target Level | Feasibility | Document References |
|---|---|---|---|
| PostgreSQL (libpq) | GOLD | HIGH | specs Β§1, auth Β§1, packets Β§1, clients Β§1 |
| MySQL Protocol v10 | GOLD | HIGH | specs Β§2, auth Β§2, packets Β§2, clients Β§2 |
| TDS 7.4 (SQL Server) | BRONZE | MEDIUM | specs Β§3, auth Β§3, packets Β§3, clients Β§3 |
| DRDA (DB2) | BRONZE | MEDIUM | specs Β§4, auth Β§4, packets Β§4, clients Β§4 |
| Oracle Net/TTC | BRONZE | LOW | specs Β§5, auth Β§5, packets Β§5, clients Β§5 |
| Snowflake REST API | SILVER | HIGH | specs Β§6, auth Β§6, packets Β§6, clients Β§6 |
| Databricks SQL | SILVER | HIGH | specs Β§7, auth Β§7, packets Β§6, clients Β§6 |
| Pinecone HTTP/gRPC | SILVER | HIGH | specs Β§8, auth Β§8, packets Β§6, clients Β§6 |
Python Clients Evaluated (15 total)
| Ecosystem | Clients | Parameter Styles | Prepared Statements |
|---|---|---|---|
| PostgreSQL | psycopg2, asyncpg, SQLAlchemy | %s, $1, %(name)s | Server-side (asyncpg), libpq (psycopg2) |
| MySQL | PyMySQL, mysql-connector-python, SQLAlchemy | %s, ? | Client-side (PyMySQL), Server-side (mysql-connector) |
| SQL Server | pyodbc, pymssql | ?, %s | ODBC driver, FreeTDS |
| DB2 | ibm_db, ibm_db_dbi | ? | IBM Data Server Driver |
| Oracle | oracledb (Thin/Thick) | :name, :1 | Auto-caching |
| HTTP APIs | snowflake, databricks, pinecone | Various | N/A (REST/Thrift) |
Key Research Findings
1. Protocol Autodetection
- TLS ALPN: PostgreSQL (βpostgresqlβ), HTTP (βh2β, βhttp/1.1β)
- Magic Bytes: PostgreSQL (0x00030000), TDS (0x12), DRDA (0xD0), Oracle TNS (0x01)
- Server-First: MySQL HandshakeV10 (0x0a after sequence byte)
2. Authentication Complexity
From simplest to most complex:
- Pinecone (API key header)
- Snowflake (JWT RS256)
- Databricks (PAT/OAuth)
- MySQL (caching_sha2_password)
- PostgreSQL (SCRAM-SHA-256)
- TDS 7.4 (LOGIN7 + TLS wrapping)
- DRDA (EXCSAT/ACCSEC/SECCHK)
- Oracle (O3LOGON 3-phase)
3. Parameter Binding Styles
5 different styles identified:
%s(format): psycopg2, PyMySQL, pymssql%(name)s(pyformat): psycopg2, pymssql, snowflake$1, $2(postgres): asyncpg?(qmark): pyodbc, ibm_db, mysql-connector (prepared):name(named): oracledb, databricks-sql
Solution: Unified parameter adapter required
4. Prepared Statement Paradigms
- Client-Side Escaping: PyMySQL (no binary protocol)
- Server-Side Binary: mysql-connector-python, asyncpg, oracledb
- Both supported: HeliosDB must handle both paradigms
Implementation Roadmap Summary
Phase 1: PostgreSQL + MySQL (GOLD) - 4-6 weeks
- PostgreSQL handler (SCRAM-SHA-256, extended query protocol)
- MySQL handler (HandshakeV10, caching_sha2_password, binary protocol)
- Protocol router (TLS ALPN + magic bytes detection)
- Unified parameter adapter
- Integration tests: psycopg2, asyncpg, PyMySQL, mysql-connector-python
Phase 1.5: HTTP APIs (SILVER) - 3-4 weeks
- HTTP gateway (path routing: /snowflake/, /dbsql/, /pinecone/*)
- JWT/OAuth authentication
- Async statement execution
- Vector operation translator (SQL β Pinecone API)
- Integration tests: snowflake-connector-python, databricks-sql-connector, pinecone-client
Phase 2: SQL Server + DB2 (BRONZE) - 4-5 weeks
- TDS handler (PRELOGIN, LOGIN7, basic SQL_BATCH)
- DRDA handler (EXCSAT/ACCSEC/SECCHK/ACCRDB)
- EBCDIC β UTF-8 conversion (or ASCII negotiation)
- Integration tests: pyodbc, ibm_db
Phase 3: Oracle/Tibero (BRONZE β SILVER) - 5-6 weeks
- TNS packet handling (CONNECT, ACCEPT, DATA)
- Basic TTC protocol negotiation
- Simple TTI function calls (OSQL for βSELECT 1 FROM DUALβ)
- Integration test: oracledb Thin mode
For Architect Agent
Key Architecture Decisions Needed:
- Protocol router design (TLS ALPN + magic bytes)
- Unified abstractions (AST/plan API, parameter model, type system, error model)
- Authentication framework (pluggable adapters)
- Connection management (server-side pooling, asyncio)
- Dialect adapter layer (SQL rewriters, identifier quoting)
Questions to Address:
- How to handle protocol-specific features? (e.g., PostgreSQL LISTEN/NOTIFY)
- Unified SQL dialect or protocol-specific dialects?
- Distributed transaction handling across protocols?
See: RESEARCH_SUMMARY.md Β§ βCoordination with Architect & Coder Agentsβ
For Coder Agent
Implementation Modules Needed:
heliosdb/βββ protocols/β βββ router.py # Protocol detection & routingβ βββ postgresql/ # PostgreSQL handlerβ βββ mysql/ # MySQL handlerβ βββ tds/ # SQL Server handlerβ βββ drda/ # DB2 handlerβ βββ oracle/ # Oracle handlerβ βββ http/ # Snowflake, Databricks, Pineconeβββ dialects/β βββ postgresql.py # PostgreSQL SQL rewriterβ βββ mysql.py # MySQL SQL rewriterβ βββ ...βββ auth/β βββ scram.py # SCRAM-SHA-256β βββ caching_sha2.py # MySQL authβ βββ jwt.py # JWT authβ βββ ...βββ parameters/β βββ adapter.py # Unified parameter adapterβββ types/ βββ base.py # Internal type system βββ mappers.py # Protocol type mappersDependencies:
cryptography(SCRAM, JWT, TLS)asyncio(async protocol handlers)struct(binary packet encoding)
See: RESEARCH_SUMMARY.md Β§ βCoordination with Architect & Coder Agentsβ
Testing Strategy
Integration Test Matrix
Run against each Python client library:
- Connection β Auth β Query β Prepared Statement β Transaction β Cursor
- Validate parameter binding, type conversion, error mapping
CI/CD Pipeline
test-postgresql: [psycopg2, asyncpg]test-mysql: [pymysql, mysql-connector-python]test-http-apis: [snowflake-connector-python, databricks-sql-connector, pinecone-client]test-sqlserver: [pyodbc]test-db2: [ibm_db]test-oracle: [oracledb]See: python_client_compatibility.md Β§ βTesting Strategyβ
Risk Assessment
High-Risk
- Oracle protocol complexity (reverse-engineered specs)
- asyncpg prepared statement cache compatibility
- TDS TLS wrapping (PRELOGIN packets)
Medium-Risk
- EBCDIC encoding (DRDA)
- Multi-protocol type mapping
- Async/sync driver coexistence
Low-Risk
- HTTP API integration
- Connection pooling
See: RESEARCH_SUMMARY.md Β§ βRisk Assessment & Mitigationβ
Success Metrics
Phase 1 (PostgreSQL + MySQL)
- psycopg2, asyncpg, PyMySQL, mysql-connector-python pass integration tests
- TLS ALPN detection works
- Protocol autodetection works
- Parameter translation works (%s β $1 β ? β :name)
- Performance: <10ms latency overhead
Phase 1.5 (HTTP APIs)
- All HTTP API clients can execute queries
- JWT/OAuth/API key authentication works
- Vector SQL β Pinecone API translation works
Phase 2 (SQL Server + DB2)
- pyodbc and ibm_db can connect and run basic queries
Phase 3 (Oracle)
- oracledb Thin mode can connect and run βSELECT 1 FROM DUALβ
Document Statistics
| Document | Size | Sections | Focus |
|---|---|---|---|
| protocol_specifications_summary.md | 38 KB | 9 | Protocol specs, autodetection, implementation checklists |
| authentication_flows.md | 39 KB | 5 | Auth flows, step-by-step diagrams, security details |
| packet_formats.md | 47 KB | 7 | Wire-level packet formats, byte layouts, magic bytes |
| python_client_compatibility.md | 32 KB | 7 | Client analysis, parameter styles, testing strategy |
| RESEARCH_SUMMARY.md | 26 KB | 10 | Executive summary, roadmap, coordination notes |
| Total | 182 KB | 38 | Comprehensive protocol research |
Quick Reference Tables
Protocol Detection Magic Bytes
PostgreSQL: 00 00 00 XX [Length] + 00 03 00 00 [Protocol 3.0]MySQL: Server sends 0x0a (HandshakeV10) after sequence byteTDS: 12 [Type: PRELOGIN]DRDA: XX XX [Length] + D0 [Magic] + 10 41 [EXCSAT]Oracle TNS: XX XX [Length] + 00 00 [Checksum] + 01 [Type: CONNECT]HTTP: 47 45 54 20 [GET ] or 50 4f 53 54 [POST]TLS: 16 [Handshake] β check ALPN extensionParameter Style Quick Reference
psycopg2: %s, %(name)sasyncpg: $1, $2, ...PyMySQL: %smysql-connector (prep): %s, ?pyodbc: ?ibm_db: ?oracledb: :name, :1snowflake: ?, %(name)sdatabricks: :nameDefault Ports
PostgreSQL: 5432MySQL: 3306SQL Server: 1433DB2: 50000Oracle: 1521HTTPS: 443 (Snowflake, Databricks, Pinecone)Next Actions
- Architect Agent: Review research, design protocol router and unified abstractions
- Coder Agent: Implement PostgreSQL handler (proof-of-concept for Phase 1)
- Testing: Set up CI/CD pipeline with integration test matrix
- Documentation: Create user-facing DSN examples and migration guides
Research Complete All Documents Available: See file list above Ready for Implementation: Phase 1 (PostgreSQL + MySQL)
Research conducted by HeliosDB Researcher Agent Date: 2025-10-10 Status: Complete