
Python Client Driver Compatibility Matrix


HeliosDB Multi-Protocol Support Research

Researcher Agent Analysis Date: 2025-10-10


Table of Contents

  1. PostgreSQL Clients
  2. MySQL Clients
  3. SQL Server Clients
  4. DB2 Clients
  5. Oracle Clients
  6. HTTP API Clients
  7. Implementation Recommendations

1. PostgreSQL Clients

psycopg2 (Synchronous)

Type: C extension (libpq wrapper)
Protocol: PostgreSQL wire protocol
Install: pip install psycopg2-binary

Connection Parameters

import psycopg2

conn = psycopg2.connect(
    host="db.example.com",
    port=5432,
    dbname="demo",
    user="demo",
    password="demo",
    sslmode="require",  # disable, allow, prefer, require, verify-ca, verify-full
    connect_timeout=10,
    application_name="my_app"
)

Parameter Binding

# Format style (%s)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# Named parameters (pyformat style)
cursor.execute(
    "SELECT * FROM users WHERE id = %(id)s AND name = %(name)s",
    {"id": user_id, "name": user_name}
)

Prepared Statements

# Server-side cursor (for large result sets)
cursor = conn.cursor(name='fetch_cursor')
cursor.execute("SELECT * FROM large_table")

# Iterate without loading all rows
for record in cursor:
    process(record)
cursor.close()

Connection Pooling

from psycopg2 import pool

# Simple pool
connection_pool = pool.SimpleConnectionPool(
    minconn=1,
    maxconn=10,
    host="db.example.com",
    user="demo",
    password="demo",
    database="demo"
)
conn = connection_pool.getconn()
# Use connection
connection_pool.putconn(conn)

Key Behaviors

  • Type Conversion: Automatic Python ↔ PostgreSQL type mapping
  • Transactions: Auto-BEGIN on first query, manual COMMIT/ROLLBACK
  • COPY Protocol: Full support for COPY FROM/TO
  • Two-Phase Commit: Supported via .tpc_*() methods
  • LISTEN/NOTIFY: Async notifications supported
  • Connection Status: .closed, .status attributes

Important Notes

  • Uses libpq C library (requires PostgreSQL client libs)
  • Blocking I/O (not suitable for async/await)
  • Thread-safety: Connections and cursors not thread-safe
  • Parameter style: MUST use %s (even for non-string types)

asyncpg (Asynchronous)

Type: Pure Python + Cython (no libpq)
Protocol: PostgreSQL wire protocol (native implementation)
Install: pip install asyncpg

Connection Parameters

import asyncpg

conn = await asyncpg.connect(
    host="db.example.com",
    port=5432,
    database="demo",
    user="demo",
    password="demo",
    ssl=True,  # or an ssl.SSLContext object
    timeout=10,
    server_settings={
        'application_name': 'my_app'
    }
)

Parameter Binding

# PostgreSQL native style ($1, $2, ...)
result = await conn.fetch(
    "SELECT * FROM users WHERE id = $1 AND name = $2",
    user_id, user_name
)

# Named parameters NOT supported (must use positional)
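
Since asyncpg accepts only positional arguments, a gateway (or application code migrating from psycopg2) often needs to rewrite named placeholders. A minimal sketch, assuming a simple :name convention with no casts or string literals in the SQL; the function name is illustrative:

```python
import re

def named_to_positional(sql: str, params: dict):
    """Rewrite :name placeholders as asyncpg's $1, $2, ... style.

    Returns the rewritten SQL plus the argument list in placeholder order.
    Sketch only: it does not skip Postgres ::casts or quoted literals.
    """
    order = []

    def repl(match):
        name = match.group(1)
        if name not in order:
            order.append(name)
        return f"${order.index(name) + 1}"

    new_sql = re.sub(r":(\w+)", repl, sql)
    return new_sql, [params[name] for name in order]

sql, args = named_to_positional(
    "SELECT * FROM users WHERE id = :id AND name = :name",
    {"id": 7, "name": "Alice"},
)
# sql  -> "SELECT * FROM users WHERE id = $1 AND name = $2"
# args -> [7, "Alice"]
```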

Prepared Statements

# Automatic caching for fetch(), fetchrow(), fetchval()
result = await conn.fetch("SELECT * FROM users WHERE id = $1", user_id)
# Statement is automatically prepared and cached
# Explicit prepared statement
stmt = await conn.prepare("SELECT * FROM users WHERE id = $1")
result = await stmt.fetch(user_id)
# Can be reused
result2 = await stmt.fetch(another_id)

Connection Pooling

# Built-in pool
pool = await asyncpg.create_pool(
    host="db.example.com",
    port=5432,
    user="demo",
    password="demo",
    database="demo",
    min_size=10,
    max_size=20,
    command_timeout=60
)

async with pool.acquire() as conn:
    result = await conn.fetch("SELECT 1")

Key Behaviors

  • Type Conversion: Custom codec system (fast binary protocol)
  • Prepared Statements: LRU cache (max 100 by default)
  • Transactions: Explicit via async with conn.transaction()
  • Cursors: async for iteration with .cursor()
  • COPY Protocol: Full support (async)
  • LISTEN/NOTIFY: Supported via .add_listener()

Important Notes

  • HIGH PERFORMANCE: 3-5x faster than psycopg2 in async workloads
  • Parameter style: MUST use $1, $2, ... (PostgreSQL native)
  • PgBouncer incompatibility: Prepared statements fail in transaction/statement pooling mode
  • Cursors/prepared statements invalid after connection release
  • Binary protocol by default (not text)
  • No thread support (asyncio only)

SQLAlchemy (ORM/Core)

Type: Database abstraction layer
Protocol: Uses psycopg2 or asyncpg under the hood
Install: pip install sqlalchemy psycopg2-binary or pip install sqlalchemy asyncpg

Connection Strings

# psycopg2 (sync)
engine = create_engine(
    "postgresql+psycopg2://demo:demo@db.example.com:5432/demo"
)

# asyncpg (async)
engine = create_async_engine(
    "postgresql+asyncpg://demo:demo@db.example.com:5432/demo"
)

Parameter Binding (Core)

from sqlalchemy import text

# Named parameters (SQLAlchemy style)
result = conn.execute(
    text("SELECT * FROM users WHERE id = :id"),
    {"id": user_id}
)

# Bound parameters (converted to the driver's style)
stmt = text("SELECT * FROM users WHERE id = :id AND name = :name")
result = conn.execute(stmt, {"id": 1, "name": "Alice"})

Key Behaviors

  • Dialect System: Abstracts driver differences
  • Parameter Conversion: :name → %s (psycopg2) or $1 (asyncpg)
  • Connection Pooling: Built-in QueuePool
  • Transaction Management: Automatic or explicit
  • Type System: Rich type mapping

Important Notes

  • Adds abstraction layer over psycopg2/asyncpg
  • HeliosDB must support BOTH psycopg2 and asyncpg behaviors
  • ORM layer expects PostgreSQL-specific features (RETURNING, etc.)

2. MySQL Clients

PyMySQL (Synchronous)

Type: Pure Python
Protocol: MySQL wire protocol (Python implementation)
Install: pip install pymysql

Connection Parameters

import pymysql

conn = pymysql.connect(
    host="db.example.com",
    port=3306,
    user="demo",
    password="demo",
    database="demo",
    charset="utf8mb4",
    ssl={'ssl': {}},  # Enable SSL/TLS
    connect_timeout=10,
    autocommit=False
)

Parameter Binding

# Format style (%s only, no named parameters)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# Multiple parameters
cursor.execute(
    "SELECT * FROM users WHERE id = %s AND name = %s",
    (user_id, user_name)
)
# CLIENT-SIDE escaping (not true prepared statements)

Cursor Types

# Default cursor (tuple rows)
cursor = conn.cursor()
# Dictionary cursor (row as dict)
cursor = conn.cursor(pymysql.cursors.DictCursor)
# Server-side cursor (streaming)
cursor = conn.cursor(pymysql.cursors.SSCursor)

Key Behaviors

  • Type Conversion: Automatic (int, str, datetime, etc.)
  • Prepared Statements: CLIENT-SIDE only (escaping, not protocol)
  • Transactions: Manual BEGIN, COMMIT, ROLLBACK
  • Autocommit: Can be enabled (affects transaction behavior)
  • Character Sets: utf8mb4 recommended for full Unicode

Important Notes

  • Pure Python (no C dependencies)
  • NOT true prepared statements (client-side escaping only)
  • Parameter style: %s only (no named, no ?)
  • Thread-safe at connection level (not cursor level)
  • MySQL-specific: cursor.lastrowid for INSERT

mysql-connector-python (Synchronous)

Type: Pure Python OR C extension
Protocol: MySQL wire protocol
Install: pip install mysql-connector-python

Connection Parameters

import mysql.connector

conn = mysql.connector.connect(
    host="db.example.com",
    port=3306,
    user="demo",
    password="demo",
    database="demo",
    ssl_ca="/path/to/ca.pem",  # TLS cert
    ssl_verify_cert=True,
    use_pure=False,  # False = C extension, True = pure Python
    autocommit=False
)

Parameter Binding (Regular Cursor)

# Format style (%s)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# Named parameters (pyformat style)
cursor.execute(
    "SELECT * FROM users WHERE id = %(id)s AND name = %(name)s",
    {"id": user_id, "name": user_name}
)

Prepared Statements (MySQLCursorPrepared)

# Server-side prepared statement
cursor = conn.cursor(prepared=True)
# ONLY %s or ? allowed (no named parameters for prepared)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Statement is prepared on first execute, cached for subsequent calls
cursor.execute("SELECT * FROM users WHERE id = %s", (another_id,))
# Uses cached prepared statement
# Qmark style also supported
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))

Connection Pooling

from mysql.connector import pooling

# Create pool
pool = pooling.MySQLConnectionPool(
    pool_name="my_pool",
    pool_size=5,
    host="db.example.com",
    user="demo",
    password="demo",
    database="demo"
)

# Get connection from pool
conn = pool.get_connection()
# Use connection
conn.close()  # Returns to pool

Key Behaviors

  • Type Conversion: Automatic with type hints
  • Prepared Statements: TRUE server-side (binary protocol)
  • Statement Caching: Automatic for prepared cursors
  • Transactions: Explicit or autocommit
  • Character Sets: Full support for utf8mb4
  • Multi-Statements: Supported with multi=True

Important Notes

  • Official Oracle driver
  • Server-side prepared statements via cursor(prepared=True)
  • Prepared cursors: %s or ? ONLY (no named parameters)
  • Regular cursors: %s or %(name)s (named supported)
  • use_pure=False (C extension) is faster
  • Binary protocol for prepared statements

SQLAlchemy (MySQL Dialect)

Type: Database abstraction layer
Protocol: Uses PyMySQL or mysql-connector-python
Install: pip install sqlalchemy pymysql

Connection Strings

# PyMySQL
engine = create_engine(
    "mysql+pymysql://demo:demo@db.example.com:3306/demo"
)

# mysql-connector-python
engine = create_engine(
    "mysql+mysqlconnector://demo:demo@db.example.com:3306/demo"
)

Key Behaviors

  • Dialect Translation: Handles MySQL-specific SQL
  • LIMIT/OFFSET: Native MySQL syntax
  • AUTO_INCREMENT: Supports LAST_INSERT_ID()
  • Type Mapping: MySQL-specific types (TINYINT, MEDIUMINT, etc.)

3. SQL Server Clients

pyodbc (ODBC Wrapper)

Type: C extension (ODBC wrapper)
Protocol: TDS via ODBC driver
Install: pip install pyodbc (requires ODBC Driver 18 for SQL Server)

Connection Parameters

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=db.example.com,1433;"
    "DATABASE=demo;"
    "UID=demo;"
    "PWD=demo;"
    "Encrypt=yes;"
    "TrustServerCertificate=no;"
    "Connection Timeout=30;"
)

Parameter Binding

# Qmark style (? only)
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))

# Multiple parameters
cursor.execute(
    "SELECT * FROM users WHERE id = ? AND name = ?",
    (user_id, user_name)
)
# NO named parameters supported

Prepared Statements

# Prepared automatically by ODBC driver
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
# Reusing with different parameters
cursor.execute("SELECT * FROM users WHERE id = ?", (another_id,))
# Driver may cache prepared statement

Key Behaviors

  • Type Conversion: ODBC driver handles conversions
  • Transactions: Auto-BEGIN, manual COMMIT/ROLLBACK
  • Encoding: UTF-16LE for SQL Server (ODBC handles conversion)
  • MARS: Multiple Active Result Sets (if enabled)
  • OUTPUT Parameters: Supported for stored procedures

Important Notes

  • Requires ODBC driver installation
  • Works with multiple databases (not just SQL Server)
  • Parameter style: ? ONLY (qmark)
  • Fast execution for parameterized queries
  • Connection string can be complex (many options)
  • No native asyncio support; pair with a wrapper such as aioodbc for async use

pymssql (FreeTDS Wrapper)

Type: C extension (FreeTDS wrapper)
Protocol: TDS via FreeTDS
Install: pip install pymssql

Connection Parameters

import pymssql

conn = pymssql.connect(
    server="db.example.com",
    port=1433,
    user="demo",
    password="demo",
    database="demo",
    tds_version="7.4",
    timeout=30
)

Parameter Binding

# Format style (%s)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# Named parameters (pyformat style)
cursor.execute(
    "SELECT * FROM users WHERE id = %(id)s AND name = %(name)s",
    {"id": user_id, "name": user_name}
)

Key Behaviors

  • Type Conversion: Automatic
  • Transactions: Manual control
  • Prepared Statements: Via FreeTDS
  • Stored Procedures: Full support

Important Notes

  • Uses FreeTDS library (not official Microsoft driver)
  • Simpler API than pyodbc
  • Parameter style: %s or %(name)s (like psycopg2)
  • TDS version configurable (7.0, 7.1, 7.2, 7.3, 7.4)

4. DB2 Clients

ibm_db (IBM Data Server Driver)

Type: C extension (IBM driver wrapper)
Protocol: DRDA via IBM Data Server Driver
Install: pip install ibm_db (requires IBM Data Server Driver)

Connection Parameters

import ibm_db

# Connection string
conn = ibm_db.connect(
    "DATABASE=SAMPLE;"
    "HOSTNAME=db.example.com;"
    "PORT=50000;"
    "PROTOCOL=TCPIP;"
    "UID=demo;"
    "PWD=demo;"
    "SECURITY=SSL;",
    "", ""
)

SQL Execution

# Immediate execution (non-prepared)
stmt = ibm_db.exec_immediate(conn, "SELECT 1 FROM sysibm.sysdummy1")
row = ibm_db.fetch_tuple(stmt)
print(row)
# Prepared statement
stmt = ibm_db.prepare(conn, "SELECT * FROM users WHERE id = ?")
ibm_db.bind_param(stmt, 1, user_id)
ibm_db.execute(stmt)
row = ibm_db.fetch_tuple(stmt)

Parameter Binding

# Qmark style (?)
stmt = ibm_db.prepare(conn, "SELECT * FROM users WHERE id = ?")
ibm_db.bind_param(stmt, 1, user_id) # 1-indexed
ibm_db.execute(stmt)
# Multiple parameters
stmt = ibm_db.prepare(conn, "SELECT * FROM users WHERE id = ? AND name = ?")
ibm_db.bind_param(stmt, 1, user_id)
ibm_db.bind_param(stmt, 2, user_name)
ibm_db.execute(stmt)

Key Behaviors

  • Type Conversion: Automatic (int, str, datetime, etc.)
  • Transactions: Manual COMMIT/ROLLBACK
  • Prepared Statements: Via prepare() + bind_param() + execute()
  • Connection Test: “SELECT 1 FROM sysibm.sysdummy1”
  • SSL/TLS: Supported via SECURITY=SSL

Important Notes

  • Requires IBM Data Server Driver installation (large download)
  • Two APIs: ibm_db (native) and ibm_db_dbi (DB-API 2.0 compliant)
  • Parameter style: ? (qmark)
  • Connection string format specific to IBM
  • DRDA protocol under the hood
  • EBCDIC handling (driver converts to/from)

ibm_db_dbi (DB-API 2.0)

import ibm_db_dbi as dbi

# More Pythonic interface
conn = dbi.connect(
    "DATABASE=SAMPLE;HOSTNAME=db.example.com;PORT=50000;"
    "PROTOCOL=TCPIP;UID=demo;PWD=demo;SECURITY=SSL;"
)
cursor = conn.cursor()
cursor.execute("SELECT 1 FROM sysibm.sysdummy1")
row = cursor.fetchone()

5. Oracle Clients

oracledb (Thin Mode)

Type: Pure Python (Thin mode) OR C extension (Thick mode)
Protocol: Oracle Net/TTC (Thin: Python implementation)
Install: pip install oracledb

Connection Parameters (Thin Mode)

import oracledb

# Thin mode is the default: no Oracle Client installation, no mode flag,
# and no config_dir required
conn = oracledb.connect(
    user="demo",
    password="demo",
    dsn="db.example.com:1521/DEMO"  # host:port/service_name
)

Connection Parameters (Thick Mode)

import oracledb

# Initialize Thick mode (requires Oracle Instant Client)
oracledb.init_oracle_client(lib_dir="/path/to/instantclient")

conn = oracledb.connect(
    user="demo",
    password="demo",
    dsn="db.example.com:1521/DEMO"
    # Thick mode auto-detected once init_oracle_client() has been called
)

Parameter Binding

# Named parameters (:name)
cursor.execute(
    "SELECT * FROM users WHERE id = :id AND name = :name",
    {"id": user_id, "name": user_name}
)

# Numeric parameters (:1, :2)
cursor.execute(
    "SELECT * FROM users WHERE id = :1 AND name = :2",
    [user_id, user_name]
)

# Bind variables passed explicitly as keyword arguments
cursor.execute(
    "SELECT * FROM users WHERE id = :id",
    id=user_id
)

Prepared Statements

# Automatic preparation and caching
cursor.execute("SELECT * FROM users WHERE id = :1", [user_id])
# Statement is prepared on first execute, cached for reuse
cursor.execute("SELECT * FROM users WHERE id = :1", [another_id])

Connection Pooling

# Built-in connection pool
pool = oracledb.create_pool(
    user="demo",
    password="demo",
    dsn="db.example.com:1521/DEMO",
    min=2,
    max=10,
    increment=1
)
conn = pool.acquire()
# Use connection
pool.release(conn)

Key Behaviors

  • Type Conversion: Rich type system (NUMBER, VARCHAR2, CLOB, BLOB, etc.)
  • Prepared Statements: Automatic caching
  • Transactions: Auto-BEGIN, manual COMMIT/ROLLBACK
  • Connection Test: “SELECT 1 FROM DUAL”
  • LOBs: Full support for CLOB, BLOB, NCLOB
  • Object Types: User-defined types supported (Thick mode)
  • Collections: VARRAY, nested tables (Thick mode)

Important Notes

  • Thin mode: No Oracle Client installation required (pure Python)
  • Thick mode: Requires Oracle Instant Client (full features)
  • Parameter style: :name (named) or :1, :2 (numeric)
  • “SELECT 1 FROM DUAL” for connection testing
  • Bind variables use : prefix (Oracle convention)
  • Thin mode limitations: No advanced features (external auth, LDAP, etc.)

6. HTTP API Clients

snowflake-connector-python

Type: Pure Python
Protocol: HTTPS/REST + Arrow (optional)
Install: pip install snowflake-connector-python

Connection Parameters

import snowflake.connector

conn = snowflake.connector.connect(
    account="local",
    user="demo",
    password="demo",
    database="demo",
    schema="public",
    warehouse="compute_wh",
    role="accountadmin",
    # Alternative: JWT auth
    # authenticator="SNOWFLAKE_JWT",
    # private_key_file="/path/to/key.pem"
)

Query Execution

cursor = conn.cursor()

# Simple query
cursor.execute("SELECT 1")
print(cursor.fetchone())

# Parameterized query (qmark style; requires
# snowflake.connector.paramstyle = "qmark")
cursor.execute(
    "SELECT * FROM users WHERE id = ? AND name = ?",
    (user_id, user_name)
)

# Named parameters (pyformat style, the default)
cursor.execute(
    "SELECT * FROM users WHERE id = %(id)s AND name = %(name)s",
    {"id": user_id, "name": user_name}
)

Key Behaviors

  • Authentication: Password, JWT, OAuth, SSO
  • Result Formats: JSON, Arrow (high performance)
  • Session Management: Multi-statement, session parameters
  • Async Execution: Via cursor.execute_async()
  • Result Caching: Query result caching
  • Compression: Gzip for data transfer

Important Notes

  • HTTPS/REST based (not wire protocol)
  • JWT authentication for automation
  • Arrow format for large result sets
  • Session parameters (warehouse, schema, etc.) per connection
  • Multi-statement execution supported
  • Async query execution (long-running queries)

databricks-sql-connector

Type: Thrift-based (no ODBC/JDBC)
Protocol: HTTP/Thrift over TLS
Install: pip install databricks-sql-connector

Connection Parameters

from databricks import sql

conn = sql.connect(
    server_hostname="db.example.com",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi...",  # Personal Access Token
    # Or OAuth:
    # auth_type="oauth",
    # client_id="...",
    # client_secret="..."
)

Query Execution

cursor = conn.cursor()

# Simple query
cursor.execute("SELECT 1")
print(cursor.fetchone())

# Parameterized query (named parameters)
cursor.execute(
    "SELECT * FROM users WHERE id = :id AND name = :name",
    {"id": user_id, "name": user_name}
)

# Parameters as dict
params = {"id": 123, "name": "Alice"}
cursor.execute("SELECT * FROM users WHERE id = :id", params)

Key Behaviors

  • Authentication: PAT, OAuth, Microsoft Entra ID
  • Result Formats: JSON, Arrow
  • Async Execution: Built-in (via Thrift)
  • Result Chunking: Large results paginated
  • Unity Catalog: Full support (catalog.schema.table)
  • Warehouse Management: SQL warehouse routing

Important Notes

  • Thrift-based protocol (not ODBC/JDBC)
  • PAT or OAuth authentication
  • Named parameters only (:name style)
  • Unity Catalog namespace support
  • Arrow format for performance
  • HTTP/2 support

pinecone-client

Type: Pure Python
Protocol: HTTP/REST or gRPC
Install: pip install pinecone-client or pip install pinecone-client[grpc]

Initialization

from pinecone import Pinecone

# HTTP/REST (default)
pc = Pinecone(api_key="your_api_key")

# gRPC (for performance; requires the [grpc] extra)
from pinecone.grpc import PineconeGRPC
pc = PineconeGRPC(api_key="your_api_key")

# Index operations
index = pc.Index("my-index")

Vector Operations

# Upsert vectors
index.upsert(vectors=[
    {
        "id": "vec1",
        "values": [0.1, 0.2, 0.3, ..., 0.768],
        "metadata": {"category": "books", "title": "Example"}
    }
])

# Query vectors
results = index.query(
    vector=[0.1, 0.2, 0.3, ..., 0.768],
    top_k=10,
    filter={"category": {"$eq": "books"}},
    include_values=True,
    include_metadata=True
)

# Delete vectors
index.delete(ids=["vec1", "vec2"])

# Fetch vectors
vectors = index.fetch(ids=["vec1", "vec2"])

# Update metadata
index.update(id="vec1", set_metadata={"updated": True})

Key Behaviors

  • API Authentication: API key header
  • Namespaces: Logical isolation within index
  • Metadata Filtering: Complex boolean expressions
  • Sparse Vectors: Hybrid dense + sparse support
  • gRPC: Optional for performance
  • Freshness: Upserted vectors become queryable within seconds (not strictly immediate)

Important Notes

  • Not SQL-based (vector operations)
  • HTTP/REST or gRPC protocols
  • API key authentication
  • Metadata filtering with MongoDB-like syntax
  • gRPC provides modest performance improvement
  • No connection pooling (stateless HTTP)

7. Implementation Recommendations

Unified Parameter Handling

Challenge: Different parameter styles across drivers

| Driver | Parameter Style | Example |
| --- | --- | --- |
| psycopg2 | %s, %(name)s | WHERE id = %s |
| asyncpg | $1, $2, ... | WHERE id = $1 |
| PyMySQL | %s only | WHERE id = %s |
| mysql-connector (prepared) | %s or ? | WHERE id = %s |
| pyodbc | ? only | WHERE id = ? |
| ibm_db | ? only | WHERE id = ? |
| oracledb | :name or :1 | WHERE id = :id |
| snowflake-connector | ? or %(name)s | WHERE id = ? |
| databricks-sql | :name | WHERE id = :name |

Recommendation:

class ParameterAdapter:
    def adapt(self, sql, params, target_style):
        if target_style == "postgres_positional":
            # Convert to $1, $2, ...
            return self._to_postgres_positional(sql, params)
        elif target_style == "qmark":
            # Convert to ?
            return self._to_qmark(sql, params)
        elif target_style == "named":
            # Convert to :name
            return self._to_named(sql, params)
        # etc.
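
As a concrete instance of one branch of the adapter above, here is a sketch of the pyformat-to-qmark path (useful when bridging psycopg2-style clients to a ?-only backend such as pyodbc). The function name and regex are illustrative; a production version would also skip placeholders inside string literals and comments:

```python
import re

def pyformat_to_qmark(sql: str, params: dict):
    """Rewrite %(name)s placeholders as ? and order the arguments to match.

    Returns the rewritten SQL plus a positional argument list.
    """
    args = []

    def repl(match):
        args.append(params[match.group(1)])
        return "?"

    new_sql = re.sub(r"%\((\w+)\)s", repl, sql)
    return new_sql, args

sql, args = pyformat_to_qmark(
    "SELECT * FROM users WHERE id = %(id)s AND name = %(name)s",
    {"id": 1, "name": "Alice"},
)
# sql  -> "SELECT * FROM users WHERE id = ? AND name = ?"
# args -> [1, "Alice"]
```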

Prepared Statement Handling

Challenge: Client-side vs server-side prepared statements

PyMySQL: Client-side escaping (not true prepared statements)

# PyMySQL: String escaping, not binary protocol
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# HeliosDB: Can optimize with server-side cache

mysql-connector-python: Server-side prepared (binary protocol)

cursor = conn.cursor(prepared=True)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# HeliosDB: MUST support COM_STMT_PREPARE, COM_STMT_EXECUTE

asyncpg: Automatic caching

await conn.fetch("SELECT * FROM users WHERE id = $1", user_id)
# HeliosDB: LRU cache for Parse messages

Recommendation:

  • Implement prepared statement cache per protocol
  • For PyMySQL: Accept text protocol, optionally cache internally
  • For mysql-connector-python: Support binary protocol (COM_STMT_*)
  • For asyncpg: Cache Parse/Bind messages (max 100 by default)
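
The per-protocol cache in the first bullet can be sketched as a small LRU keyed by SQL text; the 100-entry default mirrors asyncpg's. The class and the string "statement handles" are illustrative stand-ins:

```python
from collections import OrderedDict

class StatementCache:
    """LRU cache of prepared statements keyed by SQL text (sketch)."""

    def __init__(self, max_size: int = 100):
        self.max_size = max_size
        self._cache = OrderedDict()  # sql -> prepared-statement handle

    def get(self, sql):
        if sql in self._cache:
            self._cache.move_to_end(sql)  # mark as most recently used
            return self._cache[sql]
        return None

    def put(self, sql, statement):
        self._cache[sql] = statement
        self._cache.move_to_end(sql)
        if len(self._cache) > self.max_size:
            evicted_sql, evicted_stmt = self._cache.popitem(last=False)
            # A real implementation would also close evicted_stmt server-side

cache = StatementCache(max_size=2)
cache.put("SELECT 1", "stmt-1")
cache.put("SELECT 2", "stmt-2")
cache.get("SELECT 1")            # refresh "SELECT 1"
cache.put("SELECT 3", "stmt-3")  # evicts the least recently used, "SELECT 2"
```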

Connection Pooling

Challenge: Some clients have built-in pools, others don’t

Built-in Pools:

  • asyncpg: create_pool()
  • mysql-connector-python: pooling.MySQLConnectionPool()
  • oracledb: create_pool()

External Pools:

  • psycopg2: Use psycopg2.pool or SQLAlchemy
  • PyMySQL: Use SQLAlchemy or DBUtils
  • pyodbc: Use SQLAlchemy

Recommendation:

  • HeliosDB should implement server-side connection pooling
  • Track active connections per protocol
  • Support concurrent connections (asyncio-safe)
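
An asyncio-safe server-side pool along these lines can be sketched with an asyncio.Queue; `factory` is a stand-in for whatever creates a real backend connection, so the sketch runs without a database:

```python
import asyncio

class ConnectionPool:
    """Minimal asyncio-safe pool sketch for server-side connection reuse."""

    def __init__(self, factory, max_size: int = 10):
        self._factory = factory
        self._queue = asyncio.Queue(maxsize=max_size)
        self._created = 0
        self._max_size = max_size

    async def acquire(self):
        # Lazily create connections up to max_size, then wait for a free one
        if self._queue.empty() and self._created < self._max_size:
            self._created += 1
            return await self._factory()
        return await self._queue.get()

    async def release(self, conn):
        await self._queue.put(conn)

async def demo():
    async def factory():
        return object()  # stand-in for a real backend connection

    pool = ConnectionPool(factory, max_size=2)
    conn = await pool.acquire()
    await pool.release(conn)
    # The released connection is reused rather than recreated
    return conn is await pool.acquire()

asyncio.run(demo())
```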

Transaction Semantics

Challenge: Different transaction modes

Auto-BEGIN:

  • psycopg2: First query starts transaction
  • oracledb: First query starts transaction

Manual BEGIN:

  • MySQL: Requires explicit BEGIN or START TRANSACTION
  • SQL Server: Requires explicit BEGIN TRANSACTION

Autocommit:

  • MySQL: Can be enabled (no explicit transactions)
  • SQL Server: Implicit transactions by default

Recommendation:

class TransactionManager:
    def __init__(self, protocol):
        self.protocol = protocol
        self.in_transaction = False

    def handle_query(self, sql):
        if self.protocol == "postgresql":
            if not self.in_transaction and not sql.upper().startswith("BEGIN"):
                # Auto-BEGIN
                self.execute_internal("BEGIN")
                self.in_transaction = True
        elif self.protocol == "mysql":
            # Explicit BEGIN required
            pass
        # Execute query

Type Conversion

Challenge: Different type systems

PostgreSQL:

  • Numeric: int2, int4, int8, numeric, float4, float8
  • Text: text, varchar, char
  • Binary: bytea
  • Temporal: timestamp, timestamptz, date, time
  • JSON: json, jsonb

MySQL:

  • Numeric: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
  • Text: VARCHAR, CHAR, TEXT
  • Binary: BLOB, BINARY, VARBINARY
  • Temporal: DATETIME, TIMESTAMP, DATE, TIME

SQL Server:

  • Numeric: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, REAL
  • Text: NVARCHAR, VARCHAR, NCHAR, CHAR
  • Binary: VARBINARY, BINARY
  • Temporal: DATETIME, DATETIME2, DATE, TIME

Recommendation:

  • Unified internal type system
  • Dialect-specific mappers (PostgreSQL types ↔ MySQL types ↔ internal)
  • Vector type exposure (BYTEA for PostgreSQL, VARBINARY for MySQL)
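
The dialect mappers can start as a plain lookup table seeded from the type lists above; the internal names ("int64", "text", ...) are illustrative, not from any spec:

```python
# Internal canonical types mapped to each dialect's type names
TYPE_MAP = {
    "int64":     {"postgresql": "int8",        "mysql": "BIGINT",   "sqlserver": "BIGINT"},
    "float64":   {"postgresql": "float8",      "mysql": "DOUBLE",   "sqlserver": "FLOAT"},
    "text":      {"postgresql": "text",        "mysql": "TEXT",     "sqlserver": "NVARCHAR"},
    "binary":    {"postgresql": "bytea",       "mysql": "BLOB",     "sqlserver": "VARBINARY"},
    "timestamp": {"postgresql": "timestamptz", "mysql": "DATETIME", "sqlserver": "DATETIME2"},
}

def dialect_type(internal: str, dialect: str) -> str:
    """Resolve an internal type name to the target dialect's type name."""
    return TYPE_MAP[internal][dialect]
```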

Error Handling

Challenge: Different error codes and formats

PostgreSQL: SQLSTATE (5-char code) + message

SQLSTATE: 42P01
Message: relation "users" does not exist

MySQL: Error number + SQLSTATE + message

Error: 1146
SQLSTATE: 42S02
Message: Table 'demo.users' doesn't exist

SQL Server: Error number + severity + state + message

Error: 208
Severity: 16
State: 1
Message: Invalid object name 'users'.

Recommendation:

class ErrorMapper:
    def map_error(self, internal_error, target_protocol):
        if target_protocol == "postgresql":
            return PostgreSQLError(
                sqlstate=self._to_sqlstate(internal_error),
                message=internal_error.message
            )
        elif target_protocol == "mysql":
            return MySQLError(
                errno=self._to_mysql_errno(internal_error),
                sqlstate=self._to_sqlstate(internal_error),
                message=internal_error.message
            )
        # etc.
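
The mapper's lookup data can be seeded directly from the "table does not exist" examples above; the internal error kind "table_not_found" is an illustrative name:

```python
# Protocol-specific codes for one internal error kind, taken from the
# examples above (PostgreSQL 42P01, MySQL 1146/42S02, SQL Server 208)
ERROR_TABLE = {
    "table_not_found": {
        "postgresql": {"sqlstate": "42P01"},
        "mysql":      {"errno": 1146, "sqlstate": "42S02"},
        "sqlserver":  {"number": 208, "severity": 16, "state": 1},
    },
}

def map_error(kind: str, protocol: str) -> dict:
    """Look up the protocol-specific representation of an internal error kind."""
    return ERROR_TABLE[kind][protocol]
```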

Async Support

Challenge: Async vs sync drivers

Async Drivers:

  • asyncpg (asyncio native)
  • databricks-sql-connector (Thrift async)

Sync Drivers:

  • psycopg2, PyMySQL, mysql-connector-python, pyodbc, ibm_db

Recommendation:

  • Implement async protocol handlers with asyncio
  • Use thread pool for blocking operations if needed
  • Support concurrent connections per protocol
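
The thread-pool fallback in the second bullet can be as small as asyncio.to_thread (Python 3.9+); here `blocking_query` is a stand-in for a synchronous driver call such as a psycopg2 cursor.execute:

```python
import asyncio

def blocking_query(sql: str):
    """Stand-in for a blocking sync-driver call."""
    return [("row", sql)]

async def run_query(sql: str):
    # Off-load the blocking call to the default thread pool so the
    # event loop keeps serving other protocol connections meanwhile
    return await asyncio.to_thread(blocking_query, sql)

rows = asyncio.run(run_query("SELECT 1"))
```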

Testing Matrix

Required Tests per Driver:

  1. Connection Establishment

    • Valid credentials
    • Invalid credentials
    • TLS/SSL negotiation
    • Timeout handling
  2. Authentication

    • All supported methods (password, JWT, OAuth, etc.)
    • Auth failure scenarios
  3. Query Execution

    • Simple query (SELECT 1)
    • Parameterized query
    • Multiple result sets
    • Empty result set
  4. Prepared Statements

    • Prepare + execute
    • Reuse with different parameters
    • Close prepared statement
  5. Transactions

    • BEGIN, COMMIT, ROLLBACK
    • Auto-commit mode
    • Transaction isolation
  6. Data Types

    • Integer, float, decimal
    • Text, varchar
    • Binary, bytea/blob
    • Timestamp, date, time
    • NULL handling
  7. Error Handling

    • Invalid SQL
    • Constraint violation
    • Connection loss
    • Timeout
  8. Cursors (where applicable)

    • Server-side cursor
    • Fetch rows
    • Cursor close
  9. Connection Pooling

    • Acquire/release
    • Max connections
    • Connection reuse
  10. Vector Operations (Pinecone)

    • Upsert, query, delete
    • Metadata filtering
    • Namespaces

CI/CD Integration:

test-matrix:
  - driver: psycopg2
    version: 2.9.9
    protocol: postgresql
  - driver: asyncpg
    version: 0.29.0
    protocol: postgresql
  - driver: PyMySQL
    version: 1.1.0
    protocol: mysql
  - driver: mysql-connector-python
    version: 8.0.33
    protocol: mysql
  # ... etc.
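
The same matrix can live as plain Python data so a test runner (pytest parametrization, a custom harness, etc.) can iterate it; a sketch with the four entries shown above:

```python
# CI matrix expressed as test parameters; a runner would execute the
# per-driver suite against HeliosDB for each entry
TEST_MATRIX = [
    {"driver": "psycopg2", "version": "2.9.9", "protocol": "postgresql"},
    {"driver": "asyncpg", "version": "0.29.0", "protocol": "postgresql"},
    {"driver": "PyMySQL", "version": "1.1.0", "protocol": "mysql"},
    {"driver": "mysql-connector-python", "version": "8.0.33", "protocol": "mysql"},
]

def drivers_for(protocol: str):
    """All matrix entries targeting one wire protocol."""
    return [case["driver"] for case in TEST_MATRIX if case["protocol"] == protocol]
```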

Summary & Next Steps

Key Findings

  1. Parameter Style Diversity: 5 different styles (%s, $1, ?, :name, %(name)s)

    • Requires unified parameter adapter
  2. Prepared Statement Complexity: Client-side vs server-side

    • PyMySQL: Client-side escaping
    • mysql-connector-python, asyncpg, oracledb: Server-side caching
    • HeliosDB must support both paradigms
  3. Connection Pooling: Mixed built-in and external

    • Server-side pooling recommended for consistency
  4. Transaction Semantics: Auto-BEGIN vs manual BEGIN

    • Protocol-specific transaction managers needed
  5. Type Systems: Rich but incompatible

    • Unified internal type system with dialect mappers
  6. Async/Sync Split: asyncpg vs psycopg2

    • Async handlers with thread pool fallback

Implementation Priorities

Phase 1 (PostgreSQL + MySQL):

  1. Implement psycopg2 compatibility (%s parameters)
  2. Implement asyncpg compatibility ($1 parameters)
  3. Implement PyMySQL compatibility (client-side escaping)
  4. Implement mysql-connector-python compatibility (server-side prepared)
  5. Unified parameter adapter
  6. Server-side connection pooling

Phase 1.5 (HTTP APIs):

  1. Implement snowflake-connector-python compatibility (JWT auth)
  2. Implement databricks-sql-connector compatibility (Thrift)
  3. Implement pinecone-client compatibility (vector ops → SQL translation)

Phase 2 (SQL Server + DB2):

  1. Implement pyodbc compatibility (? parameters)
  2. Implement ibm_db compatibility (DRDA + EBCDIC handling)

Phase 3 (Oracle):

  1. Implement oracledb compatibility (:name parameters)
  2. Thin mode focus (pure Python reference)

Testing Strategy

Per-Driver Integration Tests:

  • Connection → Auth → Query → Prepared Statement → Transaction → Cursor
  • Run against each Python client library
  • Validate parameter binding, type conversion, error mapping

Compatibility Matrix:

  • Track feature support per driver
  • Mark unsupported features clearly in docs
  • Provide migration guides (expected differences)

Document Status: Complete Coverage: All target Python clients with behavioral analysis Next Steps: Coordinate with architect on unified parameter/type/error handling design