Skip to content

HeliosDB Multi-Protocol Research Index

HeliosDB Multi-Protocol Research Index

Research Agent: Complete Protocol Compatibility Analysis Date: 2025-10-10 Status: All Research Complete


Quick Navigation

πŸ“‹ Start Here

RESEARCH_SUMMARY.md - Executive summary, key findings, implementation roadmap, and coordination notes for architect/coder agents

πŸ“š Detailed Research Documents

  1. protocol_specifications_summary.md (38 KB)

    • Wire protocol overviews for all 8 protocols
    • PostgreSQL (libpq), MySQL v10, TDS 7.4, DRDA, Oracle Net/TTC
    • Snowflake REST, Databricks SQL HTTP/Thrift, Pinecone HTTP/gRPC
    • Protocol autodetection strategy with magic bytes reference
    • Implementation checklists and requirements
  2. authentication_flows.md (39 KB)

    • Step-by-step authentication flows with diagrams
    • PostgreSQL SCRAM-SHA-256 (RFC 5802/7677 compliant)
    • MySQL caching_sha2_password (fast auth + full auth)
    • TDS 7.4 PRELOGIN β†’ LOGIN7 (with TLS wrapping)
    • DRDA EXCSAT β†’ ACCSEC β†’ SECCHK
    • Oracle O3LOGON (3-phase logon)
    • HTTP API authentication (JWT, OAuth, API keys)
  3. packet_formats.md (47 KB)

    • Complete wire-level packet specifications
    • Byte-by-byte field descriptions for all protocols
    • Message type tables and data type encodings
    • Protocol detection magic bytes reference
    • Example hex dumps for key packets
  4. python_client_compatibility.md (32 KB)

    • Analysis of 15 Python client libraries
    • psycopg2, asyncpg, PyMySQL, mysql-connector-python
    • pyodbc, pymssql, ibm_db, oracledb
    • snowflake-connector-python, databricks-sql-connector, pinecone-client
    • Parameter binding styles, prepared statements, connection pooling
    • Implementation recommendations and testing requirements

Research Coverage

Protocols Analyzed (8 total)

ProtocolTarget LevelFeasibilityDocument References
PostgreSQL (libpq)GOLDHIGHspecs Β§1, auth Β§1, packets Β§1, clients Β§1
MySQL Protocol v10GOLDHIGHspecs Β§2, auth Β§2, packets Β§2, clients Β§2
TDS 7.4 (SQL Server)BRONZEMEDIUMspecs Β§3, auth Β§3, packets Β§3, clients Β§3
DRDA (DB2)BRONZEMEDIUMspecs Β§4, auth Β§4, packets Β§4, clients Β§4
Oracle Net/TTCBRONZELOWspecs Β§5, auth Β§5, packets Β§5, clients Β§5
Snowflake REST APISILVERHIGHspecs Β§6, auth Β§6, packets Β§6, clients Β§6
Databricks SQLSILVERHIGHspecs Β§7, auth Β§7, packets Β§6, clients Β§6
Pinecone HTTP/gRPCSILVERHIGHspecs Β§8, auth Β§8, packets Β§6, clients Β§6

Python Clients Evaluated (15 total)

EcosystemClientsParameter StylesPrepared Statements
PostgreSQLpsycopg2, asyncpg, SQLAlchemy%s, $1, %(name)sServer-side (asyncpg), libpq (psycopg2)
MySQLPyMySQL, mysql-connector-python, SQLAlchemy%s, ?Client-side (PyMySQL), Server-side (mysql-connector)
SQL Serverpyodbc, pymssql?, %sODBC driver, FreeTDS
DB2ibm_db, ibm_db_dbi?IBM Data Server Driver
Oracleoracledb (Thin/Thick):name, :1Auto-caching
HTTP APIssnowflake, databricks, pineconeVariousN/A (REST/Thrift)

Key Research Findings

1. Protocol Autodetection

  • TLS ALPN: PostgreSQL (β€œpostgresql”), HTTP (β€œh2”, β€œhttp/1.1”)
  • Magic Bytes: PostgreSQL (0x00030000), TDS (0x12), DRDA (0xD0), Oracle TNS (0x01)
  • Server-First: MySQL HandshakeV10 (0x0a after sequence byte)

2. Authentication Complexity

From simplest to most complex:

  1. Pinecone (API key header)
  2. Snowflake (JWT RS256)
  3. Databricks (PAT/OAuth)
  4. MySQL (caching_sha2_password)
  5. PostgreSQL (SCRAM-SHA-256)
  6. TDS 7.4 (LOGIN7 + TLS wrapping)
  7. DRDA (EXCSAT/ACCSEC/SECCHK)
  8. Oracle (O3LOGON 3-phase)

3. Parameter Binding Styles

5 different styles identified:

  • %s (format): psycopg2, PyMySQL, pymssql
  • %(name)s (pyformat): psycopg2, pymssql, snowflake
  • $1, $2 (postgres): asyncpg
  • ? (qmark): pyodbc, ibm_db, mysql-connector (prepared)
  • :name (named): oracledb, databricks-sql

Solution: Unified parameter adapter required

4. Prepared Statement Paradigms

  • Client-Side Escaping: PyMySQL (no binary protocol)
  • Server-Side Binary: mysql-connector-python, asyncpg, oracledb
  • Both supported: HeliosDB must handle both paradigms

Implementation Roadmap Summary

Phase 1: PostgreSQL + MySQL (GOLD) - 4-6 weeks

  • PostgreSQL handler (SCRAM-SHA-256, extended query protocol)
  • MySQL handler (HandshakeV10, caching_sha2_password, binary protocol)
  • Protocol router (TLS ALPN + magic bytes detection)
  • Unified parameter adapter
  • Integration tests: psycopg2, asyncpg, PyMySQL, mysql-connector-python

Phase 1.5: HTTP APIs (SILVER) - 3-4 weeks

  • HTTP gateway (path routing: /snowflake/, /dbsql/, /pinecone/*)
  • JWT/OAuth authentication
  • Async statement execution
  • Vector operation translator (SQL β†’ Pinecone API)
  • Integration tests: snowflake-connector-python, databricks-sql-connector, pinecone-client

Phase 2: SQL Server + DB2 (BRONZE) - 4-5 weeks

  • TDS handler (PRELOGIN, LOGIN7, basic SQL_BATCH)
  • DRDA handler (EXCSAT/ACCSEC/SECCHK/ACCRDB)
  • EBCDIC ↔ UTF-8 conversion (or ASCII negotiation)
  • Integration tests: pyodbc, ibm_db

Phase 3: Oracle/Tibero (BRONZE β†’ SILVER) - 5-6 weeks

  • TNS packet handling (CONNECT, ACCEPT, DATA)
  • Basic TTC protocol negotiation
  • Simple TTI function calls (OSQL for β€œSELECT 1 FROM DUAL”)
  • Integration test: oracledb Thin mode

For Architect Agent

Key Architecture Decisions Needed:

  1. Protocol router design (TLS ALPN + magic bytes)
  2. Unified abstractions (AST/plan API, parameter model, type system, error model)
  3. Authentication framework (pluggable adapters)
  4. Connection management (server-side pooling, asyncio)
  5. Dialect adapter layer (SQL rewriters, identifier quoting)

Questions to Address:

  • How to handle protocol-specific features? (e.g., PostgreSQL LISTEN/NOTIFY)
  • Unified SQL dialect or protocol-specific dialects?
  • Distributed transaction handling across protocols?

See: RESEARCH_SUMMARY.md Β§ β€œCoordination with Architect & Coder Agents”


For Coder Agent

Implementation Modules Needed:

heliosdb/
β”œβ”€β”€ protocols/
β”‚ β”œβ”€β”€ router.py # Protocol detection & routing
β”‚ β”œβ”€β”€ postgresql/ # PostgreSQL handler
β”‚ β”œβ”€β”€ mysql/ # MySQL handler
β”‚ β”œβ”€β”€ tds/ # SQL Server handler
β”‚ β”œβ”€β”€ drda/ # DB2 handler
β”‚ β”œβ”€β”€ oracle/ # Oracle handler
β”‚ └── http/ # Snowflake, Databricks, Pinecone
β”œβ”€β”€ dialects/
β”‚ β”œβ”€β”€ postgresql.py # PostgreSQL SQL rewriter
β”‚ β”œβ”€β”€ mysql.py # MySQL SQL rewriter
β”‚ └── ...
β”œβ”€β”€ auth/
β”‚ β”œβ”€β”€ scram.py # SCRAM-SHA-256
β”‚ β”œβ”€β”€ caching_sha2.py # MySQL auth
β”‚ β”œβ”€β”€ jwt.py # JWT auth
β”‚ └── ...
β”œβ”€β”€ parameters/
β”‚ └── adapter.py # Unified parameter adapter
└── types/
β”œβ”€β”€ base.py # Internal type system
└── mappers.py # Protocol type mappers

Dependencies:

  • cryptography (SCRAM, JWT, TLS)
  • asyncio (async protocol handlers)
  • struct (binary packet encoding)

See: RESEARCH_SUMMARY.md Β§ β€œCoordination with Architect & Coder Agents”


Testing Strategy

Integration Test Matrix

Run against each Python client library:

  • Connection β†’ Auth β†’ Query β†’ Prepared Statement β†’ Transaction β†’ Cursor
  • Validate parameter binding, type conversion, error mapping

CI/CD Pipeline

test-postgresql: [psycopg2, asyncpg]
test-mysql: [pymysql, mysql-connector-python]
test-http-apis: [snowflake-connector-python, databricks-sql-connector, pinecone-client]
test-sqlserver: [pyodbc]
test-db2: [ibm_db]
test-oracle: [oracledb]

See: python_client_compatibility.md Β§ β€œTesting Strategy”


Risk Assessment

High-Risk

  • Oracle protocol complexity (reverse-engineered specs)
  • asyncpg prepared statement cache compatibility
  • TDS TLS wrapping (PRELOGIN packets)

Medium-Risk

  • EBCDIC encoding (DRDA)
  • Multi-protocol type mapping
  • Async/sync driver coexistence

Low-Risk

  • HTTP API integration
  • Connection pooling

See: RESEARCH_SUMMARY.md Β§ β€œRisk Assessment & Mitigation”


Success Metrics

Phase 1 (PostgreSQL + MySQL)

  • psycopg2, asyncpg, PyMySQL, mysql-connector-python pass integration tests
  • TLS ALPN detection works
  • Protocol autodetection works
  • Parameter translation works (%s ↔ $1 ↔ ? ↔ :name)
  • Performance: <10ms latency overhead

Phase 1.5 (HTTP APIs)

  • All HTTP API clients can execute queries
  • JWT/OAuth/API key authentication works
  • Vector SQL β†’ Pinecone API translation works

Phase 2 (SQL Server + DB2)

  • pyodbc and ibm_db can connect and run basic queries

Phase 3 (Oracle)

  • oracledb Thin mode can connect and run β€œSELECT 1 FROM DUAL”

Document Statistics

DocumentSizeSectionsFocus
protocol_specifications_summary.md38 KB9Protocol specs, autodetection, implementation checklists
authentication_flows.md39 KB5Auth flows, step-by-step diagrams, security details
packet_formats.md47 KB7Wire-level packet formats, byte layouts, magic bytes
python_client_compatibility.md32 KB7Client analysis, parameter styles, testing strategy
RESEARCH_SUMMARY.md26 KB10Executive summary, roadmap, coordination notes
Total182 KB38Comprehensive protocol research

Quick Reference Tables

Protocol Detection Magic Bytes

PostgreSQL: 00 00 00 XX [Length] + 00 03 00 00 [Protocol 3.0]
MySQL: Server sends 0x0a (HandshakeV10) after sequence byte
TDS: 12 [Type: PRELOGIN]
DRDA: XX XX [Length] + D0 [Magic] + 10 41 [EXCSAT]
Oracle TNS: XX XX [Length] + 00 00 [Checksum] + 01 [Type: CONNECT]
HTTP: 47 45 54 20 [GET ] or 50 4f 53 54 [POST]
TLS: 16 [Handshake] β†’ check ALPN extension

Parameter Style Quick Reference

psycopg2: %s, %(name)s
asyncpg: $1, $2, ...
PyMySQL: %s
mysql-connector (prep): %s, ?
pyodbc: ?
ibm_db: ?
oracledb: :name, :1
snowflake: ?, %(name)s
databricks: :name

Default Ports

PostgreSQL: 5432
MySQL: 3306
SQL Server: 1433
DB2: 50000
Oracle: 1521
HTTPS: 443 (Snowflake, Databricks, Pinecone)

Next Actions

  1. Architect Agent: Review research, design protocol router and unified abstractions
  2. Coder Agent: Implement PostgreSQL handler (proof-of-concept for Phase 1)
  3. Testing: Set up CI/CD pipeline with integration test matrix
  4. Documentation: Create user-facing DSN examples and migration guides

Research Complete All Documents Available: See file list above Ready for Implementation: Phase 1 (PostgreSQL + MySQL)


Research conducted by HeliosDB Researcher Agent Date: 2025-10-10 Status: Complete