Python Client Driver Compatibility Matrix
Python Client Driver Compatibility Matrix
HeliosDB Multi-Protocol Support Research
Researcher Agent Analysis Date: 2025-10-10
Table of Contents
- PostgreSQL Clients
- MySQL Clients
- SQL Server Clients
- DB2 Clients
- Oracle Clients
- HTTP API Clients
- Implementation Recommendations
1. PostgreSQL Clients
psycopg2 (Synchronous)
Type: C extension (libpq wrapper)
Protocol: PostgreSQL wire protocol
Install: pip install psycopg2-binary
Connection Parameters
import psycopg2
conn = psycopg2.connect( host="db.example.com", port=5432, dbname="demo", user="demo", password="demo", sslmode="require", # disable, allow, prefer, require, verify-ca, verify-full connect_timeout=10, application_name="my_app")Parameter Binding
# Format style (%s)cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Named parameters (pyformat style)cursor.execute( "SELECT * FROM users WHERE id = %(id)s AND name = %(name)s", {"id": user_id, "name": user_name})Prepared Statements
# Server-side cursor (for large result sets)cursor = conn.cursor(name='fetch_cursor')cursor.execute("SELECT * FROM large_table")
# Iterate without loading all rowsfor record in cursor: process(record)
cursor.close()Connection Pooling
from psycopg2 import pool
# Simple poolconnection_pool = pool.SimpleConnectionPool( minconn=1, maxconn=10, host="db.example.com", user="demo", password="demo", database="demo")
conn = connection_pool.getconn()# Use connectionconnection_pool.putconn(conn)Key Behaviors
- Type Conversion: Automatic Python ↔ PostgreSQL type mapping
- Transactions: Auto-BEGIN on first query, manual COMMIT/ROLLBACK
- COPY Protocol: Full support for COPY FROM/TO
- Two-Phase Commit: Supported via
.tpc_*()methods - LISTEN/NOTIFY: Async notifications supported
- Connection Status:
.closed,.statusattributes
Important Notes
- Uses libpq C library (requires PostgreSQL client libs)
- Blocking I/O (not suitable for async/await)
- Thread-safety: Connections and cursors not thread-safe
- Parameter style: MUST use
%s(even for non-string types)
asyncpg (Asynchronous)
Type: Pure Python + Cython (no libpq)
Protocol: PostgreSQL wire protocol (native implementation)
Install: pip install asyncpg
Connection Parameters
import asyncpg
conn = await asyncpg.connect( host="db.example.com", port=5432, database="demo", user="demo", password="demo", ssl=True, # or ssl.SSLContext object timeout=10, server_settings={ 'application_name': 'my_app' })Parameter Binding
# PostgreSQL native style ($1, $2, ...)result = await conn.fetch( "SELECT * FROM users WHERE id = $1 AND name = $2", user_id, user_name)
# Named parameters NOT supported (must use positional)Prepared Statements
# Automatic caching for fetch(), fetchrow(), fetchval()result = await conn.fetch("SELECT * FROM users WHERE id = $1", user_id)# Statement is automatically prepared and cached
# Explicit prepared statementstmt = await conn.prepare("SELECT * FROM users WHERE id = $1")result = await stmt.fetch(user_id)# Can be reusedresult2 = await stmt.fetch(another_id)Connection Pooling
# Built-in poolpool = await asyncpg.create_pool( host="db.example.com", port=5432, user="demo", password="demo", database="demo", min_size=10, max_size=20, command_timeout=60)
async with pool.acquire() as conn: result = await conn.fetch("SELECT 1")Key Behaviors
- Type Conversion: Custom codec system (fast binary protocol)
- Prepared Statements: LRU cache (max 100 by default)
- Transactions: Explicit via
async with conn.transaction() - Cursors:
async foriteration with.cursor() - COPY Protocol: Full support (async)
- LISTEN/NOTIFY: Supported via
.add_listener()
Important Notes
- HIGH PERFORMANCE: 3-5x faster than psycopg2 in async workloads
- Parameter style: MUST use
$1, $2, ...(PostgreSQL native) - PgBouncer incompatibility: Prepared statements fail in transaction/statement pooling mode
- Cursors/prepared statements invalid after connection release
- Binary protocol by default (not text)
- No thread support (asyncio only)
SQLAlchemy (ORM/Core)
Type: Database abstraction layer
Protocol: Uses psycopg2 or asyncpg under the hood
Install: pip install sqlalchemy psycopg2-binary or pip install sqlalchemy asyncpg
Connection Strings
# psycopg2 (sync)engine = create_engine( "postgresql+psycopg2://demo:demo@db.example.com:5432/demo")
# asyncpg (async)engine = create_async_engine( "postgresql+asyncpg://demo:demo@db.example.com:5432/demo")Parameter Binding (Core)
from sqlalchemy import text
# Named parameters (SQLAlchemy style)result = conn.execute( text("SELECT * FROM users WHERE id = :id"), {"id": user_id})
# Bound parameters (converted to driver's style)stmt = text("SELECT * FROM users WHERE id = :id AND name = :name")result = conn.execute(stmt, {"id": 1, "name": "Alice"})Key Behaviors
- Dialect System: Abstracts driver differences
- Parameter Conversion:
:name→%s(psycopg2) or$1(asyncpg) - Connection Pooling: Built-in QueuePool
- Transaction Management: Automatic or explicit
- Type System: Rich type mapping
Important Notes
- Adds abstraction layer over psycopg2/asyncpg
- HeliosDB must support BOTH psycopg2 and asyncpg behaviors
- ORM layer expects PostgreSQL-specific features (RETURNING, etc.)
2. MySQL Clients
PyMySQL (Synchronous)
Type: Pure Python
Protocol: MySQL wire protocol (Python implementation)
Install: pip install pymysql
Connection Parameters
import pymysql
conn = pymysql.connect( host="db.example.com", port=3306, user="demo", password="demo", database="demo", charset="utf8mb4", ssl={'ssl': {}}, # Enable SSL/TLS connect_timeout=10, autocommit=False)Parameter Binding
# Format style (%s only, no named parameters)cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Multiple parameterscursor.execute( "SELECT * FROM users WHERE id = %s AND name = %s", (user_id, user_name))
# CLIENT-SIDE escaping (not true prepared statements)Cursor Types
# Default cursor (tuple rows)cursor = conn.cursor()
# Dictionary cursor (row as dict)cursor = conn.cursor(pymysql.cursors.DictCursor)
# Server-side cursor (streaming)cursor = conn.cursor(pymysql.cursors.SSCursor)Key Behaviors
- Type Conversion: Automatic (int, str, datetime, etc.)
- Prepared Statements: CLIENT-SIDE only (escaping, not protocol)
- Transactions: Manual BEGIN, COMMIT, ROLLBACK
- Autocommit: Can be enabled (affects transaction behavior)
- Character Sets: utf8mb4 recommended for full Unicode
Important Notes
- Pure Python (no C dependencies)
- NOT true prepared statements (client-side escaping only)
- Parameter style:
%sonly (no named, no?) - Thread-safe at connection level (not cursor level)
- MySQL-specific:
cursor.lastrowidfor INSERT
mysql-connector-python (Synchronous)
Type: Pure Python OR C extension
Protocol: MySQL wire protocol
Install: pip install mysql-connector-python
Connection Parameters
import mysql.connector
conn = mysql.connector.connect( host="db.example.com", port=3306, user="demo", password="demo", database="demo", ssl_ca="/path/to/ca.pem", # TLS cert ssl_verify_cert=True, use_pure=False, # False = C extension, True = pure Python autocommit=False)Parameter Binding (Regular Cursor)
# Format style (%s)cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Named parameters (pyformat style)cursor.execute( "SELECT * FROM users WHERE id = %(id)s AND name = %(name)s", {"id": user_id, "name": user_name})Prepared Statements (MySQLCursorPrepared)
# Server-side prepared statementcursor = conn.cursor(prepared=True)
# ONLY %s or ? allowed (no named parameters for prepared)cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Statement is prepared on first execute, cached for subsequent callscursor.execute("SELECT * FROM users WHERE id = %s", (another_id,))# Uses cached prepared statement
# Qmark style also supportedcursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))Connection Pooling
from mysql.connector import pooling
# Create poolpool = pooling.MySQLConnectionPool( pool_name="my_pool", pool_size=5, host="db.example.com", user="demo", password="demo", database="demo")
# Get connection from poolconn = pool.get_connection()# Use connectionconn.close() # Returns to poolKey Behaviors
- Type Conversion: Automatic with type hints
- Prepared Statements: TRUE server-side (binary protocol)
- Statement Caching: Automatic for prepared cursors
- Transactions: Explicit or autocommit
- Character Sets: Full support for utf8mb4
- Multi-Statements: Supported with
multi=True
Important Notes
- Official Oracle driver
- Server-side prepared statements via
cursor(prepared=True) - Prepared cursors:
%sor?ONLY (no named parameters) - Regular cursors:
%sor%(name)s(named supported) - use_pure=False (C extension) is faster
- Binary protocol for prepared statements
SQLAlchemy (MySQL Dialect)
Type: Database abstraction layer
Protocol: Uses PyMySQL or mysql-connector-python
Install: pip install sqlalchemy pymysql
Connection Strings
# PyMySQLengine = create_engine( "mysql+pymysql://demo:demo@db.example.com:3306/demo")
# mysql-connector-pythonengine = create_engine( "mysql+mysqlconnector://demo:demo@db.example.com:3306/demo")Key Behaviors
- Dialect Translation: Handles MySQL-specific SQL
- LIMIT/OFFSET: Native MySQL syntax
- AUTO_INCREMENT: Supports LAST_INSERT_ID()
- Type Mapping: MySQL-specific types (TINYINT, MEDIUMINT, etc.)
3. SQL Server Clients
pyodbc (ODBC Wrapper)
Type: C extension (ODBC wrapper)
Protocol: TDS via ODBC driver
Install: pip install pyodbc (requires ODBC Driver 18 for SQL Server)
Connection Parameters
import pyodbc
conn = pyodbc.connect( "DRIVER={ODBC Driver 18 for SQL Server};" "SERVER=db.example.com,1433;" "DATABASE=demo;" "UID=demo;" "PWD=demo;" "Encrypt=yes;" "TrustServerCertificate=no;" "Connection Timeout=30;")Parameter Binding
# Qmark style (? only)cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
# Multiple parameterscursor.execute( "SELECT * FROM users WHERE id = ? AND name = ?", (user_id, user_name))
# NO named parameters supportedPrepared Statements
# Prepared automatically by ODBC drivercursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
# Reusing with different parameterscursor.execute("SELECT * FROM users WHERE id = ?", (another_id,))# Driver may cache prepared statementKey Behaviors
- Type Conversion: ODBC driver handles conversions
- Transactions: Auto-BEGIN, manual COMMIT/ROLLBACK
- Encoding: UTF-16LE for SQL Server (ODBC handles conversion)
- MARS: Multiple Active Result Sets (if enabled)
- OUTPUT Parameters: Supported for stored procedures
Important Notes
- Requires ODBC driver installation
- Works with multiple databases (not just SQL Server)
- Parameter style:
?ONLY (qmark) - Fast execution for parameterized queries
- Connection string can be complex (many options)
- Supports async execution (with pyodbc 4.0+)
pymssql (FreeTDS Wrapper)
Type: C extension (FreeTDS wrapper)
Protocol: TDS via FreeTDS
Install: pip install pymssql
Connection Parameters
import pymssql
conn = pymssql.connect( server="db.example.com", port=1433, user="demo", password="demo", database="demo", tds_version="7.4", timeout=30)Parameter Binding
# Format style (%s)cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Named parameters (pyformat style)cursor.execute( "SELECT * FROM users WHERE id = %(id)s AND name = %(name)s", {"id": user_id, "name": user_name})Key Behaviors
- Type Conversion: Automatic
- Transactions: Manual control
- Prepared Statements: Via FreeTDS
- Stored Procedures: Full support
Important Notes
- Uses FreeTDS library (not official Microsoft driver)
- Simpler API than pyodbc
- Parameter style:
%sor%(name)s(like psycopg2) - TDS version configurable (7.0, 7.1, 7.2, 7.3, 7.4)
4. DB2 Clients
ibm_db (IBM Data Server Driver)
Type: C extension (IBM driver wrapper)
Protocol: DRDA via IBM Data Server Driver
Install: pip install ibm_db (requires IBM Data Server Driver)
Connection Parameters
import ibm_db
# Connection stringconn = ibm_db.connect( "DATABASE=SAMPLE;" "HOSTNAME=db.example.com;" "PORT=50000;" "PROTOCOL=TCPIP;" "UID=demo;" "PWD=demo;" "SECURITY=SSL;", "", "")SQL Execution
# Immediate execution (non-prepared)stmt = ibm_db.exec_immediate(conn, "SELECT 1 FROM sysibm.sysdummy1")row = ibm_db.fetch_tuple(stmt)print(row)
# Prepared statementstmt = ibm_db.prepare(conn, "SELECT * FROM users WHERE id = ?")ibm_db.bind_param(stmt, 1, user_id)ibm_db.execute(stmt)row = ibm_db.fetch_tuple(stmt)Parameter Binding
# Qmark style (?)stmt = ibm_db.prepare(conn, "SELECT * FROM users WHERE id = ?")ibm_db.bind_param(stmt, 1, user_id) # 1-indexedibm_db.execute(stmt)
# Multiple parametersstmt = ibm_db.prepare(conn, "SELECT * FROM users WHERE id = ? AND name = ?")ibm_db.bind_param(stmt, 1, user_id)ibm_db.bind_param(stmt, 2, user_name)ibm_db.execute(stmt)Key Behaviors
- Type Conversion: Automatic (int, str, datetime, etc.)
- Transactions: Manual COMMIT/ROLLBACK
- Prepared Statements: Via
prepare()+bind_param()+execute() - Connection Test: “SELECT 1 FROM sysibm.sysdummy1”
- SSL/TLS: Supported via SECURITY=SSL
Important Notes
- Requires IBM Data Server Driver installation (large download)
- Two APIs: ibm_db (native) and ibm_db_dbi (DB-API 2.0 compliant)
- Parameter style:
?(qmark) - Connection string format specific to IBM
- DRDA protocol under the hood
- EBCDIC handling (driver converts to/from)
ibm_db_dbi (DB-API 2.0)
import ibm_db_dbi as dbi
# More Pythonic interfaceconn = dbi.connect( "DATABASE=SAMPLE;HOSTNAME=db.example.com;PORT=50000;" "PROTOCOL=TCPIP;UID=demo;PWD=demo;SECURITY=SSL;")
cursor = conn.cursor()cursor.execute("SELECT 1 FROM sysibm.sysdummy1")row = cursor.fetchone()5. Oracle Clients
oracledb (Thin Mode)
Type: Pure Python (Thin mode) OR C extension (Thick mode)
Protocol: Oracle Net/TTC (Thin: Python implementation)
Install: pip install oracledb
Connection Parameters (Thin Mode)
import oracledb
# Thin mode (no Oracle Client required)conn = oracledb.connect( user="demo", password="demo", dsn="db.example.com:1521/DEMO", # host:port/service_name mode=oracledb.THIN_MODE, # Or omit (Thin is default) config_dir=None # No config needed for Thin)Connection Parameters (Thick Mode)
import oracledb
# Initialize Thick mode (requires Oracle Instant Client)oracledb.init_oracle_client(lib_dir="/path/to/instantclient")
conn = oracledb.connect( user="demo", password="demo", dsn="db.example.com:1521/DEMO" # Thick mode auto-detected if init_oracle_client() called)Parameter Binding
# Named parameters (:name)cursor.execute( "SELECT * FROM users WHERE id = :id AND name = :name", {"id": user_id, "name": user_name})
# Numeric parameters (:1, :2)cursor.execute( "SELECT * FROM users WHERE id = :1 AND name = :2", [user_id, user_name])
# Bind variables (explicitly)cursor.execute( "SELECT * FROM users WHERE id = :id", id=user_id)Prepared Statements
# Automatic preparation and cachingcursor.execute("SELECT * FROM users WHERE id = :1", [user_id])
# Statement is prepared on first execute, cached for reusecursor.execute("SELECT * FROM users WHERE id = :1", [another_id])Connection Pooling
# Built-in connection poolpool = oracledb.create_pool( user="demo", password="demo", dsn="db.example.com:1521/DEMO", min=2, max=10, increment=1)
conn = pool.acquire()# Use connectionpool.release(conn)Key Behaviors
- Type Conversion: Rich type system (NUMBER, VARCHAR2, CLOB, BLOB, etc.)
- Prepared Statements: Automatic caching
- Transactions: Auto-BEGIN, manual COMMIT/ROLLBACK
- Connection Test: “SELECT 1 FROM DUAL”
- LOBs: Full support for CLOB, BLOB, NCLOB
- Object Types: User-defined types supported (Thick mode)
- Collections: VARRAY, nested tables (Thick mode)
Important Notes
- Thin mode: No Oracle Client installation required (pure Python)
- Thick mode: Requires Oracle Instant Client (full features)
- Parameter style:
:name(named) or:1, :2(numeric) - “SELECT 1 FROM DUAL” for connection testing
- Bind variables use
:prefix (Oracle convention) - Thin mode limitations: No advanced features (external auth, LDAP, etc.)
6. HTTP API Clients
snowflake-connector-python
Type: Pure Python
Protocol: HTTPS/REST + Arrow (optional)
Install: pip install snowflake-connector-python
Connection Parameters
import snowflake.connector
conn = snowflake.connector.connect( account="local", user="demo", password="demo", database="demo", schema="public", warehouse="compute_wh", role="accountadmin", # Alternative: JWT auth # authenticator="SNOWFLAKE_JWT", # private_key_file="/path/to/key.pem")Query Execution
cursor = conn.cursor()
# Simple querycursor.execute("SELECT 1")print(cursor.fetchone())
# Parameterized query (qmark or named)cursor.execute( "SELECT * FROM users WHERE id = ? AND name = ?", (user_id, user_name))
# Named parameterscursor.execute( "SELECT * FROM users WHERE id = %(id)s AND name = %(name)s", {"id": user_id, "name": user_name})Key Behaviors
- Authentication: Password, JWT, OAuth, SSO
- Result Formats: JSON, Arrow (high performance)
- Session Management: Multi-statement, session parameters
- Async Execution: Via
cursor.execute_async() - Result Caching: Query result caching
- Compression: Gzip for data transfer
Important Notes
- HTTPS/REST based (not wire protocol)
- JWT authentication for automation
- Arrow format for large result sets
- Session parameters (warehouse, schema, etc.) per connection
- Multi-statement execution supported
- Async query execution (long-running queries)
databricks-sql-connector
Type: Thrift-based (no ODBC/JDBC)
Protocol: HTTP/Thrift over TLS
Install: pip install databricks-sql-connector
Connection Parameters
from databricks import sql
conn = sql.connect( server_hostname="db.example.com", http_path="/sql/1.0/warehouses/abc123", access_token="dapi...", # Personal Access Token # Or OAuth: # auth_type="oauth", # client_id="...", # client_secret="...")Query Execution
cursor = conn.cursor()
# Simple querycursor.execute("SELECT 1")print(cursor.fetchone())
# Parameterized query (named parameters)cursor.execute( "SELECT * FROM users WHERE id = :id AND name = :name", {"id": user_id, "name": user_name})
# Parameters as dictparams = {"id": 123, "name": "Alice"}cursor.execute("SELECT * FROM users WHERE id = :id", params)Key Behaviors
- Authentication: PAT, OAuth, Microsoft Entra ID
- Result Formats: JSON, Arrow
- Async Execution: Built-in (via Thrift)
- Result Chunking: Large results paginated
- Unity Catalog: Full support (catalog.schema.table)
- Warehouse Management: SQL warehouse routing
Important Notes
- Thrift-based protocol (not ODBC/JDBC)
- PAT or OAuth authentication
- Named parameters only (
:namestyle) - Unity Catalog namespace support
- Arrow format for performance
- HTTP/2 support
pinecone-client
Type: Pure Python
Protocol: HTTP/REST or gRPC
Install: pip install pinecone-client or pip install pinecone-client[grpc]
Initialization
from pinecone import Pinecone
# HTTP/REST (default)pc = Pinecone(api_key="your_api_key")
# gRPC (for performance)pc = Pinecone(api_key="your_api_key", use_grpc=True)
# Index operationsindex = pc.Index("my-index")Vector Operations
# Upsert vectorsindex.upsert(vectors=[ { "id": "vec1", "values": [0.1, 0.2, 0.3, ..., 0.768], "metadata": {"category": "books", "title": "Example"} }])
# Query vectorsresults = index.query( vector=[0.1, 0.2, 0.3, ..., 0.768], top_k=10, filter={"category": {"$eq": "books"}}, include_values=True, include_metadata=True)
# Delete vectorsindex.delete(ids=["vec1", "vec2"])
# Fetch vectorsvectors = index.fetch(ids=["vec1", "vec2"])
# Update metadataindex.update(id="vec1", set_metadata={"updated": True})Key Behaviors
- API Authentication: API key header
- Namespaces: Logical isolation within index
- Metadata Filtering: Complex boolean expressions
- Sparse Vectors: Hybrid dense + sparse support
- gRPC: Optional for performance
- Up-to-the-second: Immediate consistency
Important Notes
- Not SQL-based (vector operations)
- HTTP/REST or gRPC protocols
- API key authentication
- Metadata filtering with MongoDB-like syntax
- gRPC provides modest performance improvement
- No connection pooling (stateless HTTP)
7. Implementation Recommendations
Unified Parameter Handling
Challenge: Different parameter styles across drivers
| Driver | Parameter Style | Example |
|---|---|---|
| psycopg2 | %s, %(name)s | WHERE id = %s |
| asyncpg | $1, $2, ... | WHERE id = $1 |
| PyMySQL | %s only | WHERE id = %s |
| mysql-connector (prep) | %s or ? | WHERE id = %s |
| pyodbc | ? only | WHERE id = ? |
| ibm_db | ? only | WHERE id = ? |
| oracledb | :name or :1 | WHERE id = :id |
| snowflake-connector | ? or %(name)s | WHERE id = ? |
| databricks-sql | :name | WHERE id = :name |
Recommendation:
class ParameterAdapter: def adapt(self, sql, params, target_style): if target_style == "postgres_positional": # Convert to $1, $2, ... return self._to_postgres_positional(sql, params) elif target_style == "qmark": # Convert to ? return self._to_qmark(sql, params) elif target_style == "named": # Convert to :name return self._to_named(sql, params) # etc.Prepared Statement Handling
Challenge: Client-side vs server-side prepared statements
PyMySQL: Client-side escaping (not true prepared statements)
# PyMySQL: String escaping, not binary protocolcursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))# HeliosDB: Can optimize with server-side cachemysql-connector-python: Server-side prepared (binary protocol)
cursor = conn.cursor(prepared=True)cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))# HeliosDB: MUST support COM_STMT_PREPARE, COM_STMT_EXECUTEasyncpg: Automatic caching
await conn.fetch("SELECT * FROM users WHERE id = $1", user_id)# HeliosDB: LRU cache for Parse messagesRecommendation:
- Implement prepared statement cache per protocol
- For PyMySQL: Accept text protocol, optionally cache internally
- For mysql-connector-python: Support binary protocol (COM_STMT_*)
- For asyncpg: Cache Parse/Bind messages (max 100 by default)
Connection Pooling
Challenge: Some clients have built-in pools, others don’t
Built-in Pools:
- asyncpg:
create_pool() - mysql-connector-python:
pooling.MySQLConnectionPool() - oracledb:
create_pool()
External Pools:
- psycopg2: Use
psycopg2.poolor SQLAlchemy - PyMySQL: Use SQLAlchemy or DBUtils
- pyodbc: Use SQLAlchemy
Recommendation:
- HeliosDB should implement server-side connection pooling
- Track active connections per protocol
- Support concurrent connections (asyncio-safe)
Transaction Semantics
Challenge: Different transaction modes
Auto-BEGIN:
- psycopg2: First query starts transaction
- oracledb: First query starts transaction
Manual BEGIN:
- MySQL: Requires explicit
BEGINorSTART TRANSACTION - SQL Server: Requires explicit
BEGIN TRANSACTION
Autocommit:
- MySQL: Can be enabled (no explicit transactions)
- SQL Server: Implicit transactions by default
Recommendation:
class TransactionManager: def __init__(self, protocol): self.protocol = protocol self.in_transaction = False
def handle_query(self, sql): if self.protocol == "postgresql": if not self.in_transaction and not sql.upper().startswith("BEGIN"): # Auto-BEGIN self.execute_internal("BEGIN") self.in_transaction = True elif self.protocol == "mysql": # Explicit BEGIN required pass # Execute queryType Conversion
Challenge: Different type systems
PostgreSQL:
- Numeric: int2, int4, int8, numeric, float4, float8
- Text: text, varchar, char
- Binary: bytea
- Temporal: timestamp, timestamptz, date, time
- JSON: json, jsonb
MySQL:
- Numeric: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
- Text: VARCHAR, CHAR, TEXT
- Binary: BLOB, BINARY, VARBINARY
- Temporal: DATETIME, TIMESTAMP, DATE, TIME
SQL Server:
- Numeric: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, REAL
- Text: NVARCHAR, VARCHAR, NCHAR, CHAR
- Binary: VARBINARY, BINARY
- Temporal: DATETIME, DATETIME2, DATE, TIME
Recommendation:
- Unified internal type system
- Dialect-specific mappers (PostgreSQL types ↔ MySQL types ↔ internal)
- Vector type exposure (BYTEA for PostgreSQL, VARBINARY for MySQL)
Error Handling
Challenge: Different error codes and formats
PostgreSQL: SQLSTATE (5-char code) + message
SQLSTATE: 42P01Message: relation "users" does not existMySQL: Error number + SQLSTATE + message
Error: 1146SQLSTATE: 42S02Message: Table 'demo.users' doesn't existSQL Server: Error number + severity + state + message
Error: 208Severity: 16State: 1Message: Invalid object name 'users'.Recommendation:
class ErrorMapper: def map_error(self, internal_error, target_protocol): if target_protocol == "postgresql": return PostgreSQLError( sqlstate=self._to_sqlstate(internal_error), message=internal_error.message ) elif target_protocol == "mysql": return MySQLError( errno=self._to_mysql_errno(internal_error), sqlstate=self._to_sqlstate(internal_error), message=internal_error.message ) # etc.Async Support
Challenge: Async vs sync drivers
Async Drivers:
- asyncpg (asyncio native)
- databricks-sql-connector (Thrift async)
Sync Drivers:
- psycopg2, PyMySQL, mysql-connector-python, pyodbc, ibm_db
Recommendation:
- Implement async protocol handlers with asyncio
- Use thread pool for blocking operations if needed
- Support concurrent connections per protocol
Testing Matrix
Required Tests per Driver:
-
Connection Establishment
- Valid credentials
- Invalid credentials
- TLS/SSL negotiation
- Timeout handling
-
Authentication
- All supported methods (password, JWT, OAuth, etc.)
- Auth failure scenarios
-
Query Execution
- Simple query (SELECT 1)
- Parameterized query
- Multiple result sets
- Empty result set
-
Prepared Statements
- Prepare + execute
- Reuse with different parameters
- Close prepared statement
-
Transactions
- BEGIN, COMMIT, ROLLBACK
- Auto-commit mode
- Transaction isolation
-
Data Types
- Integer, float, decimal
- Text, varchar
- Binary, bytea/blob
- Timestamp, date, time
- NULL handling
-
Error Handling
- Invalid SQL
- Constraint violation
- Connection loss
- Timeout
-
Cursors (where applicable)
- Server-side cursor
- Fetch rows
- Cursor close
-
Connection Pooling
- Acquire/release
- Max connections
- Connection reuse
-
Vector Operations (Pinecone)
- Upsert, query, delete
- Metadata filtering
- Namespaces
CI/CD Integration:
test-matrix: - driver: psycopg2 version: 2.9.9 protocol: postgresql - driver: asyncpg version: 0.29.0 protocol: postgresql - driver: PyMySQL version: 1.1.0 protocol: mysql - driver: mysql-connector-python version: 8.0.33 protocol: mysql # ... etc.Summary & Next Steps
Key Findings
-
Parameter Style Diversity: 5 different styles (
%s,$1,?,:name,%(name)s)- Requires unified parameter adapter
-
Prepared Statement Complexity: Client-side vs server-side
- PyMySQL: Client-side escaping
- mysql-connector-python, asyncpg, oracledb: Server-side caching
- HeliosDB must support both paradigms
-
Connection Pooling: Mixed built-in and external
- Server-side pooling recommended for consistency
-
Transaction Semantics: Auto-BEGIN vs manual BEGIN
- Protocol-specific transaction managers needed
-
Type Systems: Rich but incompatible
- Unified internal type system with dialect mappers
-
Async/Sync Split: asyncpg vs psycopg2
- Async handlers with thread pool fallback
Implementation Priorities
Phase 1 (PostgreSQL + MySQL):
- Implement psycopg2 compatibility (
%sparameters) - Implement asyncpg compatibility (
$1parameters) - Implement PyMySQL compatibility (client-side escaping)
- Implement mysql-connector-python compatibility (server-side prepared)
- Unified parameter adapter
- Server-side connection pooling
Phase 1.5 (HTTP APIs):
- Implement snowflake-connector-python compatibility (JWT auth)
- Implement databricks-sql-connector compatibility (Thrift)
- Implement pinecone-client compatibility (vector ops → SQL translation)
Phase 2 (SQL Server + DB2):
- Implement pyodbc compatibility (
?parameters) - Implement ibm_db compatibility (DRDA + EBCDIC handling)
Phase 3 (Oracle):
- Implement oracledb compatibility (
:nameparameters) - Thin mode focus (pure Python reference)
Testing Strategy
Per-Driver Integration Tests:
- Connection → Auth → Query → Prepared Statement → Transaction → Cursor
- Run against each Python client library
- Validate parameter binding, type conversion, error mapping
Compatibility Matrix:
- Track feature support per driver
- Mark unsupported features clearly in docs
- Provide migration guides (expected differences)
Document Status: Complete Coverage: All target Python clients with behavioral analysis Next Steps: Coordinate with architect on unified parameter/type/error handling design