fix(backend/kernel): comparator parity — async retention + intervals_as_string + precision/scale + named params + utils.exc fix by vikrantpuppala · Pull Request #803 · databricks/databricks-sql-python

vikrantpuppala · 2026-05-23T12:16:05Z

Summary

Five comparator-parity fixes bundled because they share the same audit baseline and the same kernel-backend client surface.

1. Async Statement retention

KernelDatabricksClient.execute_command closed the parent Statement in finally regardless of async_op. The kernel's Statement.close() invalidates child handles (see databricks-sql-kernel/src/statement/validity.rs), so the async handle was being killed before the user could poll it — breaking the entire async surface (execute_async → is_query_pending → get_async_execution_result).

Fix: when async_op=True, retain the parent Statement in a new _async_statements dict alongside _async_handles, and close it from close_command, close_session, and get_execution_result after the executed handle is done.

Comparator outcome: STATEMENT_ASYNC suite 3/3 match (was 0/3).

2. `intervals_as_string` wire-through (companion to kernel PR #64)

pyarrow's Python bindings cannot decode Arrow's month_interval type at all (id 21 — raises KeyError from .as_py, to_pylist, cast(string), to_pandas). Every kernel-backend SELECT * over any table with an INTERVAL YEAR TO MONTH column threw ArrowNotImplementedError — 32 / 88 audit diffs.

Fix: pass intervals_as_string=True to the kernel Session(...) constructor unconditionally. The kernel post-processor then stringifies Interval / Duration columns server-side to Utf8 (see kernel PR #64), so pyarrow never sees the unreadable type.

Comparator outcome: bucket A (ArrowNotImplementedError) drops from 32 → 0 diffs.

3. Decimal precision/scale extraction

description_from_arrow_schema hard-coded None for slots 4 and 5 of the PEP 249 description tuple. For DECIMAL columns the Arrow schema carries precision / scale on Decimal128Type, but we were silently dropping them. The kernel backend returned:

('decimal_column', 'decimal', None, None, None, None, None)

while Thrift returned:

('decimal_column', 'decimal', None, None, 10, 2, None)

Any consumer that reads PEP 249 slots 4/5 (SQLAlchemy, pandas-read-sql, etc.) can't distinguish DECIMAL(10,2) from DECIMAL(38,18) on the kernel backend. 88 comparator diffs (44 precision + 44 scale).

Fix: factor out _precision_scale_for_arrow_type(arrow_type) and call it from the description builder. Today it only handles decimals; future extensions (e.g. fractional-second precision from Time64Type) slot in here without touching the description builder.

Comparator outcome: 88 precision+scale diffs → 0.

4. Named-parameter binding (companion to kernel PR #65)

bind_tspark_params previously raised NotSupportedError for any TSparkParameter with name set. The canonical SEA proto marks StatementParameter.name as openapi_required=true — named binding is the spec-required public form, and ordinal is PUBLIC_UNDOCUMENTED. Kernel PR #65 added a Statement.bind_named_param PyO3 API mirroring bind_param.

Fix: route named bindings via the new kernel API. Positional ordinals self-increment so a named entry in the middle of the list doesn't claim a positional slot.

Comparator outcome: PREPARED_STATEMENT_NAMED 1/1 match (was 0/1). Bucket B1 in comparator-diff-tasklist.md drops to 0 diffs.

5. `utils.ParamEscaper` `exc.ProgrammingError` import fix (was PR #802)

ParamEscaper.escape_args and ParamEscaper.escape_item both raised exc.ProgrammingError(...) (lines 551 and 609 of utils.py), but exc was never imported. On any unsupported parameter shape the user saw NameError: name 'exc' is not defined instead of a clean PEP 249 ProgrammingError.

Surfaced via the python-comparator audit harness running the INLINE_PARAMS connection config: both backends raised NameError, which the comparator's class+message match treated as parity — a false-positive that hid both the real driver bug and the underlying caller-side type mismatch.

Fix: import ProgrammingError directly from databricks.sql.exc (matching the pattern used in session.py, client.py, result_set.py, etc.) and replace the two exc.ProgrammingError(...) sites with bare ProgrammingError(...).

Folded in from #802 (closed in favour of this bundle).

Dependencies

Items #2, #3, and #4 require the matching databricks-sql-kernel changes:

kernel PR #64 — intervals_as_string post-processor + empty-result schema fix + composite type parser + TIMESTAMP tz + metadata reshape fixes
kernel PR #65 — StatementParameter.name + bind_named_param API (to be opened)

For local testing the comparator harness uses KERNEL_FREEZE=1 against a kernel checkout of the feature branches.

Headline comparator results (full thrift-vs-kernel run)

	Before	After
match / diff / skipped	60 / 88 / 0	97 / 34 / 1
Bucket A (ArrowNotImplementedError)	32	0
`decimal_column` precision/scale	88	0
Bucket B1 (named params)	1	0
Type-code mismatches (slot 1)	21	14
Suites fully clean	12 / 30	17 / 30

Remaining 34 diffs cluster into documented causes (complex types in fetchall_arrow, timestamp tz on Arrow path, VOID surface, METADATA pattern semantics, empty-result nullability) — tracked in ~/docs/python-kernel/comparator-diff-tasklist.md.

Test plan

Manual repro of async path: cur.execute_async("SELECT 1"); while cur.is_query_pending(): time.sleep(0.2); cur.get_async_execution_result(); cur.fetchall() succeeds on use_kernel=True
Manual repro of interval path: cur.execute("SELECT ym_interval_column FROM ...") returns string-shaped rows matching the Thrift surface
Manual repro of decimal path: cur.description for a DECIMAL(10,2) column now reports precision=10, scale=2 on both backends
Live e2e for named params: test_parameterized_query_named_params / test_parameterized_query_named_param_with_null (added) pass against pecotesting warehouse
Unit tests: bind_tspark_params named/positional/mixed routing (tests/unit/test_kernel_type_mapping.py)
Manual repro of utils fix: ParamEscaper().escape_args(object()) raises ProgrammingError, not NameError
Full thrift-vs-kernel comparator: 97 / 34 / 1 of 132
Existing connector unit suite: 752 passed

This pull request and its description were written by Isaac.

gopalldb · 2026-05-25T06:02:41Z

Minor

Missing unit test for the new decimal precision/scale extraction (tests/unit/test_kernel_type_mapping.py)
_precision_scale_for_arrow_type is the load-bearing piece of fix #3, but no test exercises description_from_arrow_schema with a decimal column. The pre-existing assert d[2:] == (None, None, None, None, None)
only passes because no test schema has decimal fields. Add a parametrised (pa.decimal128(10, 2), 10, 2) + decimal256 case.

_async_statements[guid] = stmt overwrite leaks on duplicate statement_id (client.py:305)
Mirrors the existing _async_handles overwrite pattern, so pre-existing in shape — but the new dict holds a server-side Statement with a stronger leak consequence (lasts until close_session). Theoretical
only; server IDs are unique in practice.

cancel_command does not release the parent Statement (client.py:333-346)
Only _async_handles is touched; _async_statements[guid] lingers until close_command or close_session. The connector's normal cursor lifecycle fires close_command on exit, so unlikely in practice. Either
document the requirement or eagerly close in cancel_command.

intervals_as_string=True is hard-coded with no opt-out (client.py:188)
Fine today (pyarrow can't decode month_interval at all). Forward-looking: a future user wanting native intervals will need a new connector kwarg. No fix required.

Stale-wheel errors for intervals_as_string / bind_named_param surface as generic OperationalError("Unexpected error from databricks_sql_kernel: ...") (client.py:172-192, type_mapping.py:235)
Same shape as PR #795's stale-wheel concern. Out of scope for this bundle; consider a version check in a follow-up.

vikrantpuppala · 2026-05-25T07:07:55Z

Thanks for the review. Summary of what's addressed in the new push (ecd0d37b):

Item 1 — fixed

Missing decimal precision/scale unit test ✅
Added two new tests in tests/unit/test_kernel_type_mapping.py:

test_description_extracts_decimal_precision_scale — parametrized over pa.decimal128(10,2) / decimal128(38,0) / decimal128(38,18) / decimal256(76,38). Asserts slots 4/5 are populated correctly.
test_description_non_decimal_columns_have_none_precision_scale — companion test that catches a regression where the helper accidentally extracts precision from non-decimal types (e.g. Time64 fractional-second precision).

The existing test_description_from_schema_preserves_field_names_and_order stays — its d[2:] == (None,)*5 assertion is still true for the non-decimal schema it uses; the new tests cover the decimal case the existing one didn't reach.

pytest tests/unit/test_kernel_type_mapping.py: 48 / 48 pass (was 43; +5 new).

Items 2, 3, 4, 5 — accepted as documented

Per your own punt signals, leaving these as-is for follow-up:

(2) _async_statements[guid] overwrite — theoretical; server IDs unique in practice; same shape as the existing _async_handles pattern.
(3) cancel_command doesn't release the parent Statement — connector normal cursor lifecycle fires close_command on __exit__. Will fix if it ever bites.
(4) intervals_as_string=True hard-coded — no fix required today (pyarrow can't decode month_interval); future-looking knob for a follow-up PR.
(5) Stale-wheel OperationalError shape — out of scope for this bundle. Will consider a __version__ check + a structured KernelVersionMismatch exception in a follow-up.

Signed off the amended commit.

…as_string + precision/scale + named params + utils.exc fix Five small comparator-parity fixes surfaced by the same audit run. All are independently small but ship together because they share the kernel-backend client surface and the audit baseline. ### 1. Async Statement retention `KernelDatabricksClient.execute_command` closed the parent `Statement` in `finally` regardless of `async_op`. The kernel's `Statement.close()` invalidates child handles (see databricks-sql-kernel `src/statement/validity.rs`), so the async handle died before the user could poll it — breaking `execute_async` → `is_query_pending` → `get_async_execution_result`. Fix: when `async_op=True`, retain the parent Statement in a new `_async_statements` dict alongside `_async_handles`, and close it from `close_command`, `close_session`, and `get_execution_result` after the executed handle is done. Comparator outcome: STATEMENT_ASYNC suite 3/3 match (was 0/3). ### 2. `intervals_as_string` wire-through pyarrow's Python bindings cannot decode Arrow's `month_interval` type (id 21 — `KeyError` from `.as_py`, `to_pylist`, `cast(string)`, `to_pandas`). Every kernel-backend `SELECT *` over a table with an `INTERVAL YEAR TO MONTH` column raised `ArrowNotImplementedError` — 32 / 88 audit diffs. Fix: pass `intervals_as_string=True` to the kernel `Session(...)` constructor unconditionally. The kernel post-processor stringifies `Interval` / `Duration` columns server-side to `Utf8` (kernel PR #64). Comparator outcome: bucket A (ArrowNotImplementedError) 32 → 0. ### 3. Decimal precision/scale extraction `description_from_arrow_schema` hard-coded `None` for PEP 249 description-tuple slots 4 and 5. For DECIMAL columns the Arrow schema carries `precision` / `scale` on `Decimal128Type`, but they were silently dropped — 88 cell diffs (44 precision + 44 scale). Fix: factor out `_precision_scale_for_arrow_type(arrow_type)` and call it from the description builder. Today it only handles decimals; future extensions slot in here. Comparator outcome: 88 precision+scale diffs → 0. ### 4. Named-parameter binding `bind_tspark_params` raised `NotSupportedError` for any `TSparkParameter` with `name` set. The canonical SEA proto marks `StatementParameter.name` as `openapi_required=true` (named is the spec-required public form; `ordinal` is `PUBLIC_UNDOCUMENTED`). Kernel PR #65 added a `Statement.bind_named_param` PyO3 API. Fix: route named bindings via the new API. Positional ordinals self-increment so a named entry in the middle of the list doesn't claim a positional slot. Comparator outcome: PREPARED_STATEMENT_NAMED 1/1 match (was 0/1). Full thrift-vs-kernel run: 97/132 match (was 96/132). ### 5. `utils.ParamEscaper` `exc.ProgrammingError` import fix `ParamEscaper.escape_args` and `escape_item` both raised `exc.ProgrammingError(...)`, but `exc` was never imported. On any unsupported parameter shape the user saw `NameError: name 'exc' is not defined` instead of a clean PEP 249 `ProgrammingError`. Surfaced via the same audit harness running INLINE_PARAMS: both backends raised `NameError`, which the comparator's class+message match treated as parity — a false-positive that hid both the driver bug and the underlying caller-side type mismatch. Fix: import `ProgrammingError` directly from `databricks.sql.exc` (matching the pattern used in `session.py`, `client.py`, `result_set.py`, etc.) and replace the two `exc.ProgrammingError(...)` sites with bare `ProgrammingError(...)`. ## Dependencies Items #2, #3, and #4 require the matching databricks-sql-kernel changes: PR #64 (`intervals_as_string` + empty-result schema fix) and PR #65 (named-param binding). For local testing the comparator harness uses `KERNEL_FREEZE=1` against a kernel checkout of the feature branches. ## Headline comparator results (full thrift-vs-kernel run) | | Before | After | |------------------------------------|--------|-------| | match / diff / skipped | 60 / 88 / 0 | **97 / 34 / 1** | | Bucket A (ArrowNotImplementedError)| 32 | 0 | | `decimal_column` precision/scale | 88 | 0 | | Bucket B1 (named params) | 1 | 0 | | Suites fully clean | 12 / 30 | **17 / 30** | Remaining 34 diffs cluster into documented causes (complex types in `fetchall_arrow`, timestamp tz on Arrow path, VOID surface, METADATA pattern semantics) — tracked in `~/docs/python-kernel/comparator-diff-tasklist.md`. ## Test plan - [x] Manual repro of async path: `cur.execute_async("SELECT 1")` → `is_query_pending` → `get_async_execution_result` → `fetchall` succeeds on `use_kernel=True` - [x] Manual repro of interval path: `cur.execute("SELECT ym_interval_column FROM ...")` returns string-shaped rows matching the Thrift surface - [x] Manual repro of decimal path: `cur.description` for a `DECIMAL(10,2)` column now reports `precision=10, scale=2` - [x] Live e2e: `test_parameterized_query_named_params` / `test_parameterized_query_named_param_with_null` (added) - [x] Unit tests: `bind_tspark_params` named/positional/mixed routing (`tests/unit/test_kernel_type_mapping.py`) - [x] Manual repro of utils fix: `ParamEscaper().escape_args(object())` raises `ProgrammingError`, not `NameError` - [x] Full thrift-vs-kernel comparator: 97 / 34 / 1 of 132 - [x] Existing connector unit suite: 752 passed Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

vikrantpuppala had a problem deploying to azure-prod May 23, 2026 12:16 — with GitHub Actions Failure

vikrantpuppala force-pushed the fix/kernel-async-statement-retention branch from 291e18e to 312164a Compare May 23, 2026 16:16

vikrantpuppala temporarily deployed to azure-prod May 23, 2026 16:16 — with GitHub Actions Inactive

vikrantpuppala changed the title ~~fix(backend/kernel): retain parent Statement for async-submitted commands~~ fix(backend/kernel): comparator parity — async statement retention + intervals_as_string May 23, 2026

vikrantpuppala force-pushed the fix/kernel-async-statement-retention branch from 312164a to 81f065f Compare May 23, 2026 17:18

vikrantpuppala temporarily deployed to azure-prod May 23, 2026 17:18 — with GitHub Actions Inactive

vikrantpuppala changed the title ~~fix(backend/kernel): comparator parity — async statement retention + intervals_as_string~~ fix(backend/kernel): comparator parity — async statement retention + intervals_as_string + precision/scale May 23, 2026

vikrantpuppala force-pushed the fix/kernel-async-statement-retention branch from 81f065f to 932ec74 Compare May 25, 2026 04:42

vikrantpuppala temporarily deployed to azure-prod May 25, 2026 04:42 — with GitHub Actions Inactive

vikrantpuppala changed the title ~~fix(backend/kernel): comparator parity — async statement retention + intervals_as_string + precision/scale~~ fix(backend/kernel): comparator parity — async retention + intervals_as_string + precision/scale + named params + utils.exc fix May 25, 2026

vikrantpuppala mentioned this pull request May 25, 2026

fix(utils): import ProgrammingError instead of bare exc. reference #802

Closed

2 tasks

vikrantpuppala force-pushed the fix/kernel-async-statement-retention branch from 932ec74 to 73760c7 Compare May 25, 2026 04:46

vikrantpuppala temporarily deployed to azure-prod May 25, 2026 04:46 — with GitHub Actions Inactive

vikrantpuppala force-pushed the fix/kernel-async-statement-retention branch from 73760c7 to ecd0d37 Compare May 25, 2026 07:06

vikrantpuppala temporarily deployed to azure-prod May 25, 2026 07:06 — with GitHub Actions Inactive

gopalldb approved these changes May 25, 2026

View reviewed changes

vikrantpuppala force-pushed the fix/kernel-async-statement-retention branch from ecd0d37 to 7adb049 Compare May 25, 2026 07:31

vikrantpuppala had a problem deploying to azure-prod May 25, 2026 07:31 — with GitHub Actions Failure

vikrantpuppala merged commit 85b99d4 into main May 25, 2026
38 of 39 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(backend/kernel): comparator parity — async retention + intervals_as_string + precision/scale + named params + utils.exc fix#803

fix(backend/kernel): comparator parity — async retention + intervals_as_string + precision/scale + named params + utils.exc fix#803
vikrantpuppala merged 1 commit into
mainfrom
fix/kernel-async-statement-retention

vikrantpuppala commented May 23, 2026 •

edited

Loading

Uh oh!

gopalldb commented May 25, 2026

Uh oh!

vikrantpuppala commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

vikrantpuppala commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Async Statement retention

2. intervals_as_string wire-through (companion to kernel PR #64)

3. Decimal precision/scale extraction

4. Named-parameter binding (companion to kernel PR #65)

5. utils.ParamEscaper exc.ProgrammingError import fix (was PR #802)

Dependencies

Headline comparator results (full thrift-vs-kernel run)

Test plan

Uh oh!

gopalldb commented May 25, 2026

Uh oh!

vikrantpuppala commented May 25, 2026

Item 1 — fixed

Items 2, 3, 4, 5 — accepted as documented

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vikrantpuppala commented May 23, 2026 •

edited

Loading

2. `intervals_as_string` wire-through (companion to kernel PR #64)

5. `utils.ParamEscaper` `exc.ProgrammingError` import fix (was PR #802)