feat: SPOG (Single Point of Gateway) host support #1479

Open
sd-db wants to merge 15 commits into main from sd-db/spog-impl

sd-db commented May 25, 2026

Summary

Adds SPOG (Single Point of Gateway) support — account-level vanity hosts like peco.azuredatabricks.net where workspaces are disambiguated by ?o=<workspace-id> on http_path. Convention matches databricks-sql-python (#767), databricks-sql-go, databricks-jdbc, and the ADBC Rust driver.

Builds on the dep-ceiling bumps already in main (#1474). Feature is opt-in via those bumps: activates only when databricks-sql-connector ≥ 4.2.6 and databricks-sdk ≥ 0.104.0 are installed. Pre-SPOG dep versions continue to work unchanged on legacy hosts — non-SPOG users see no behavior change.

What changes

  • New package dbt/adapters/databricks/spog/:
    • extract — parse ?o= (or fall back to /o/<id>/ in cluster paths) from http_path.
    • capabilities — runtime detect: connector_supports_spog (PEP-440 version compare), sdk_supports_workspace_id (feature-detect via inspect.signature(Config)).
    • probe — one-shot GET /.well-known/databricks-config per host with 3-attempt backoff; probe failure is non-fatal.
    • decision — applies the §8 decision matrix at connection.open(); raises DbtConfigError with a pointed upgrade/fix message on every misconfig row.
  • credentials.py:
    • Cluster-ID regex tightened: (.*) → ([^?&]+), so the capture stops at any query string (independently useful even on legacy hosts).
    • DatabricksCredentialManager gains a workspace_id field populated by extract_workspace_id(credentials.http_path).
    • All five authenticate_with_* methods plumb workspace_id into Config(...) via a single _config_kwargs helper, gated on sdk_supports_workspace_id() so old SDKs are unaffected.
  • connections.py: DatabricksConnectionManager.open() collects every http_path in play (default + per-compute) and invokes check_spog_preconditions(...) before constructing conn_args. No-op on legacy hosts; pointed DbtConfigError on misconfig.
  • impl.py: DatabricksAdapter.debug_query override emits a SPOG status block (host_type, workspace_id, dep-version suitability) before the standard select 1 as id — makes "is SPOG working here?" a one-command answer via dbt debug.
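
To illustrate the cluster-ID regex change listed above, here is a minimal sketch. Only the capture-group change, `(.*)` to `([^?&]+)`, comes from this PR; the surrounding `sql/protocolv1/o/<id>/` prefix and the sample path are assumptions made for the example.

```python
import re

# Assumed prefix for illustration; only the capture group differs between the two.
OLD_CLUSTER_ID = re.compile(r"/?sql/protocolv1/o/\d+/(.*)")
NEW_CLUSTER_ID = re.compile(r"/?sql/protocolv1/o/\d+/([^?&]+)")

# Hypothetical cluster http_path carrying a SPOG ?o= suffix.
path = "sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh?o=1234567890123456"

# The greedy (.*) swallows the query string into the cluster id.
old_id = OLD_CLUSTER_ID.match(path).group(1)

# The tightened group stops at the first "?" or "&".
new_id = NEW_CLUSTER_ID.match(path).group(1)
```

On a path without a query string both patterns capture the same cluster id, which is why the change is safe on legacy hosts.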

Misconfiguration handling

Each row in §8 of the design fails fast at connection.open() with a DbtConfigError naming the file/field to fix:

| Host type | ?o= present? | Connector / SDK suitable? | Behavior |
| --- | --- | --- | --- |
| SPOG | yes | yes | proceed |
| SPOG | no | | error: http_path is missing ?o=<workspace-id> |
| SPOG | yes | no | error: upgrade databricks-sql-connector / databricks-sdk |
| non-SPOG | yes | | error: remove ?o= from http_path (or fix host) |
| non-SPOG | no | | proceed (probe failure is non-fatal) |
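
A rough sketch of how the rows above could translate into code. This is not the contents of `decision.py`: the function name `decide`, the `"spog"` / `"legacy"` host-type strings, and the `ConfigError` stand-in for `DbtConfigError` are all hypothetical, and treating a failed probe (`host_type=None`) as "proceed" is an assumption drawn from the "probe failure is non-fatal" note.

```python
from typing import Optional


class ConfigError(Exception):
    """Stand-in for DbtConfigError so the sketch has no dbt dependency."""


def decide(host_type: Optional[str], workspace_id: Optional[str], deps_ok: bool) -> Optional[str]:
    """Return the workspace id to use, or raise on a misconfiguration row."""
    if host_type is None:
        # Probe failed: assumed to be non-fatal, so skip the matrix entirely.
        return workspace_id
    if host_type == "spog":
        if workspace_id is None:
            raise ConfigError("http_path is missing ?o=<workspace-id>")
        if not deps_ok:
            raise ConfigError("upgrade databricks-sql-connector / databricks-sdk")
        return workspace_id
    # Any other host type is treated as a legacy (non-SPOG) host.
    if workspace_id is not None:
        raise ConfigError("remove ?o= from http_path (or fix host)")
    return None
```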

Design doc

docs/superpowers/specs/2026-05-19-dbt-databricks-spog-design.md (committed in this PR) holds the full spec — background, the §8 decision matrix, all the upstream PRs referenced, and rationale for opt-in via ceiling bumps.

Test plan

  • Unit tests pass locally (hatch run unit tests/unit -q) — 1174 passed, 6 skipped
  • pre-commit run --all-files passes
  • Functional tests against legacy host (existing CI: databricks_uc_sql_endpoint, databricks_uc_cluster, databricks_cluster)
  • Functional tests against SPOG host (new .github/workflows/spog-integration.yml, manual / scheduled — points at peco.azuredatabricks.net)
  • dbt debug exercises the new SPOG status block on both SPOG and legacy hosts

sd-db added 2 commits May 25, 2026 11:08
Implements support for Databricks SPOG hosts — account-level vanity URLs
(e.g. peco.azuredatabricks.net) where workspaces are disambiguated by a
`?o=<workspace-id>` query parameter on http_path. Approach matches the
convention adopted by databricks-sql-python, databricks-sql-go,
databricks-jdbc, and the ADBC Rust driver: parse ?o= from http_path and
use it to set the X-Databricks-Org-Id header on non-OAuth endpoints.

Opt-in via the dependency ceiling bumps already landed: requires
`databricks-sql-connector >= 4.2.6` and `databricks-sdk >= 0.104.0` for
the SPOG code path to activate. Pre-SPOG dep versions continue to work
unchanged on legacy hosts.

- `extract.py` — pure parser; pulls ?o=<workspace-id> from http_path.
- `capabilities.py` — runtime detect SPOG support: `connector_supports_spog`
  (version-detect with packaging.version.Version), `sdk_supports_workspace_id`
  (feature-detect via inspect.signature(Config) so forks/wrappers report
  correctly).
- `probe.py` — one-shot per-host probe of /.well-known/databricks-config.
  3-attempt exponential backoff; on exhaust returns HostMetadata(host_type=None)
  + WARN. Probe failure is never fatal.
- `decision.py` — applies the spec §8 decision matrix at connection.open():
  raises DbtConfigError on every misconfiguration row with a pointed
  upgrade/fix message; returns the extracted workspace_id on the happy path.
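
The probe behaviour described above (three attempts, exponential backoff, a warning and `HostMetadata(host_type=None)` on exhaustion) might look roughly like the sketch below. The `fetch` callable, the `base_delay` value, and the `"host_type"` response key are assumptions; the per-host caching mentioned in the test list is left out, and the real `probe.py` may differ on all of these.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class HostMetadata:
    host_type: Optional[str]


def probe_host(
    host: str,
    fetch: Callable[[str], dict],
    attempts: int = 3,
    base_delay: float = 0.5,  # assumed value, not taken from the PR
    sleep: Callable[[float], None] = time.sleep,
) -> HostMetadata:
    """Probe the well-known endpoint; never raise on failure."""
    url = f"https://{host}/.well-known/databricks-config"
    for attempt in range(attempts):
        try:
            payload = fetch(url)  # injected so the sketch stays offline
            return HostMetadata(host_type=payload.get("host_type"))
        except Exception as exc:
            logger.debug("probe attempt %d failed: %s", attempt + 1, exc)
            if attempt < attempts - 1:
                sleep(base_delay * 2**attempt)  # 0.5s, then 1.0s
    logger.warning("SPOG probe failed for %s; continuing without host metadata", host)
    return HostMetadata(host_type=None)
```

Passing `fetch` and `sleep` in as parameters is a choice made for this sketch so the retry math can be tested without network access or real delays.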

- `credentials.py`:
  - Cluster-ID regex tightened: `(.*)` -> `([^?&]+)` so the capture stops
    at any query string (independently useful even on legacy hosts).
  - DatabricksCredentialManager gains a `workspace_id` field populated by
    `extract_workspace_id(credentials.http_path)` in create_from.
  - All five `authenticate_with_*` methods now plumb workspace_id into
    `Config(...)` via a single `_config_kwargs` helper — gated on
    `sdk_supports_workspace_id()` so old SDKs are unaffected.
- `connections.py`: `DatabricksConnectionManager.open()` collects every
  http_path in play (default + per-compute) and invokes
  `check_spog_preconditions(host=..., http_paths=...)` before constructing
  conn_args. On legacy hosts the call is a no-op; on misconfiguration it
  raises a pointed DbtConfigError.
- `impl.py`: `DatabricksAdapter.debug_query` override emits a SPOG status
  block (host_type, workspace_id, dep-version suitability) before the
  standard `select 1 as id`. Makes 'is SPOG working here?' a one-command
  answer for support escalations.
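
As a sketch of the `_config_kwargs` gating described above: the helper only adds `workspace_id` when the installed `Config` accepts it. `OldConfig` and `NewConfig` are hypothetical stand-ins for two SDK versions, and the function signatures here are assumptions rather than the PR's actual code.

```python
import inspect
from typing import Any, Dict, Optional


class OldConfig:
    """Stand-in for an SDK Config that predates workspace_id."""

    def __init__(self, host: Optional[str] = None, token: Optional[str] = None):
        self.host, self.token = host, token


class NewConfig:
    """Stand-in for an SDK Config that accepts workspace_id."""

    def __init__(self, host=None, token=None, workspace_id=None):
        self.host, self.token, self.workspace_id = host, token, workspace_id


def sdk_supports_workspace_id(config_cls: type) -> bool:
    # Feature-detect on the constructor signature instead of the version string.
    return "workspace_id" in inspect.signature(config_cls).parameters


def config_kwargs(config_cls: type, workspace_id: Optional[str], **base: Any) -> Dict[str, Any]:
    kwargs = dict(base)
    if workspace_id is not None and sdk_supports_workspace_id(config_cls):
        kwargs["workspace_id"] = workspace_id
    return kwargs
```

With this shape, `OldConfig(**config_kwargs(OldConfig, "123", host="h"))` never receives an unexpected keyword, which matches the "old SDKs are unaffected" goal.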

- 35 unit tests under `tests/unit/spog/` covering every §8 matrix row,
  retry/backoff math, capability detection branches, both-deps-old
  ordering, HTTP-error retry fallback, and probe caching.
- 17 cross-module unit tests (workspace_id plumbing, connection.open
  wiring, dbt debug block, cluster-id regex).
- 3 functional tests under `tests/functional/adapter/spog/`:
  - test_spog_debug — assert dbt debug emits the SPOG block.
  - test_spog_missing_o_raises — strip ?o=, expect the §8 row-4 error.
  - test_spog_probe_failure_fallback — simulate probe failure; expect
    WARN + run still succeeds.
  All three skip when DBT_DATABRICKS_SPOG_* env vars are absent.

`.github/workflows/spog-integration.yml` runs the SPOG functional tests
against `peco.azuredatabricks.net?o=6436897454825492` using the existing
Azure secrets (same workspace; only host + ?o= suffix differ). Forces
SPOG-capable connector and SDK pins. Triggered weekly + workflow_dispatch.

Design spec at `docs/superpowers/specs/2026-05-19-dbt-databricks-spog-design.md`;
implementation plan at `docs/superpowers/plans/2026-05-19-dbt-databricks-spog.md`;
follow-up items tracked in `.claude/ideas/spog-future-tasks.md` (gitignored,
local-only). CHANGELOG entry added under `dbt-databricks next`.

- test_python_helpers: stub Mock() credentials.http_path with a real
  string so extract_workspace_id() (now called in
  DatabricksCredentialManager.create_from) doesn't trip on
  "argument of type 'Mock' is not iterable".
- test_auth (TestEnsureConfigTriggersTheRightAuth): autouse-patch
  sdk_supports_workspace_id() to False so the auth-routing
  assertions stay focused on auth_type. SPOG workspace_id plumbing
  has its own coverage in tests/unit/spog/.
sd-db requested a review from jprakash-db as a code owner May 25, 2026 05:44

github-actions Bot commented May 25, 2026

Coverage report

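| File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing |
| --- | --- | --- | --- | --- | --- |
| dbt/adapters/databricks/connections.py | | | | | |
| dbt/adapters/databricks/credentials.py | | | | | |
| dbt/adapters/databricks/impl.py | | | | | 1115-1116, 1120-1125, 1140-1141 |
| dbt/adapters/databricks/spog/capabilities.py | | | | | |
| dbt/adapters/databricks/spog/decision.py | | | | | |
| dbt/adapters/databricks/spog/extract.py | | | | | |
| dbt/adapters/databricks/spog/probe.py | | | | | |
| Project Total | | | | | |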

This report was generated by python-coverage-comment-action

sd-db added 7 commits May 26, 2026 12:06
- ?o= is the only canonical SPOG opt-in marker. Drop /o/<id>/
  cluster-path extraction in spog/extract.py to match the connector's
  _extract_spog_headers contract; cluster paths must add ?o= explicitly.
- Short-circuit check_spog_preconditions when either dep is below the
  SPOG floor. Pre-SPOG dep installs are fully dormant — no probe, no
  matrix, no behavior change vs the pre-SPOG era.
- Downgrade "SPOG host without ?o=" and "non-SPOG host with stray ?o="
  from DbtConfigError to logger.warning. Only multi-compute ?o= conflict
  stays a hard raise.
- impl.py + decision.py: import the probe module qualified-style so tests
  only need one mock.patch site instead of patching each binding alias.
- Tests: localize the probe stub to TestDatabricksAdapter (the only class
  that exercises connection.open() end-to-end). Add a per-dir conftest
  under tests/unit/spog/ that raises AssertionError on any unmocked
  requests.get inside probe.py — keeps unit tests offline even where the
  parent probe stub doesn't apply.
- test_auth: drop the _no_workspace_id_plumbing fixture. With the ?o=-only
  extractor, _COMMON_KWARGS's cluster path no longer produces a
  workspace_id, so the SPOG branch in _config_kwargs is already inert.
- CHANGELOG: tighten the SPOG entry.
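
With `?o=` as the only opt-in marker, the extractor reduces to a query-string parse. The following is a minimal sketch under that reading; the real `extract_workspace_id` may validate or normalize the value differently (for example, by requiring digits), which is not assumed here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_workspace_id(http_path: str) -> Optional[str]:
    """Return the value of ?o= on http_path, or None when it is absent or empty."""
    query = urlsplit(http_path).query
    values = parse_qs(query).get("o")  # blank values are dropped by default
    if not values:
        return None
    value = values[0].strip()
    return value or None
```

Under this sketch a cluster path such as `sql/protocolv1/o/123/0123-456789-abcdefgh` yields `None`, because the `/o/<id>/` segment is no longer consulted.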
Per review:
- spog-integration.yml: drop the SPOG-specific DBT_DATABRICKS_SPOG_*
  workflow env vars and the "force SPOG-capable pin" step. The workflow
  now plugs new SPOG-specific GitHub secrets (DBT_DATABRICKS_SPOG_HOST_NAME
  + DBT_DATABRICKS_SPOG_HTTP_PATH) into the standard env var names, so
  the rest of the test machinery reuses the same `databricks_uc_sql_endpoint`
  profile and no SPOG-aware code lives in profile/conftest land.
- tests/profiles.py: drop `databricks_uc_sql_endpoint_spog_target` and its
  branch — the standard target already reads what the SPOG workflow now
  sets.
- tests/conftest.py: drop the `_spog` suffix carve-out in
  `skip_by_profile_type`; no SPOG profile names remain.
- test_spog_debug: skip when http_path lacks `?o=`; extract the expected
  workspace_id from the http_path env directly instead of reading a
  SPOG-specific env var.
- CHANGELOG: drop the redundant "new spog/ package" Under-the-Hood line;
  the Feature bullet already covers the user-facing change.

The block was paraphrasing what `secrets.X → env.Y` already says. Code
stands on its own; no hidden constraint to call out.

Reuse the existing DBT_DATABRICKS_UC_ENDPOINT_HTTP_PATH secret and append
?o=<workspace-id> inline rather than carrying a full duplicated SPOG http_path.
The two new SPOG-specific secrets are host (DBT_DATABRICKS_SPOG_HOST_NAME)
and workspace id (DBT_DATABRICKS_SPOG_WORKSPACE_ID); both plug into the
standard env var names the rest of the test machinery already reads.

Public ubuntu-latest can't reach internal PyPI / Databricks targets.
Match the integration.yml + min-deps-test-fast.yml shape: the protected
runner group, the setup-jfrog-pypi composite action for package installs,
plus pinned setup-python / setup-uv / hatch action SHAs and UV_FROZEN.
- Temporary push trigger scoped to sd-db/spog-impl so we can validate the
  workflow end-to-end before merge (workflow_dispatch needs the file on
  the default branch first; this trigger goes away once the PR lands).
- Drop docs/superpowers/{plans,specs}/2026-05-19-dbt-databricks-spog-*.md
  — internal planning artifacts that shouldn't live in the repo.
- Use OIDC service-principal auth (TEST_PECO_SP_*) instead of PAT, matching
  the integration workflow against the same workspace.
- Hardcode DBT_DATABRICKS_UC_INITIAL_CATALOG=peco (no secret of that name
  exists; the prior placeholder produced an empty value → "Invalid catalog
  name").
- Add `environment: azure-prod` to match the integration workflow scope.

The previous workflow plumbed `secrets.DBT_DATABRICKS_UC_ENDPOINT_HTTP_PATH`,
which is a stale 2022 secret pointing at a warehouse no longer reachable
("ENDPOINT_NOT_FOUND: SQL warehouse ... does not exist at all in the database").
Switch to `TEST_PECO_WAREHOUSE_HTTP_PATH` — the same warehouse the live
integration workflow uses — and set it on DBT_DATABRICKS_HTTP_PATH (the
fallback the test profile reads) to match integration.yml's wiring.
- test_spog_debug: accept either DBT_DATABRICKS_UC_ENDPOINT_HTTP_PATH or
  DBT_DATABRICKS_HTTP_PATH so the skipif and the assertion line up with
  whichever env var the workflow sets (live workflow uses HTTP_PATH,
  matching integration.yml).
- test_spog_probe_failure_fallback: patch probe_host directly instead of
  requests.get. `mock.patch("...spog.probe.requests.get")` walks to the
  shared requests module and patches it globally — that took out the
  SDK's auth-time HTTP calls and caused dbt debug to fail with
  "invalid_client". Patching probe_host is the correct surgical scope.
Match the integration workflow's TEST_PECO_* secret-name convention
(TEST_PECO_SP_ID, TEST_PECO_WAREHOUSE_HTTP_PATH, etc.) instead of the
adapter-namespaced DBT_DATABRICKS_* prefix, which was misleading for
GitHub secrets (that prefix belongs to env vars the adapter reads).
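
On the probe-failure test fix noted earlier (patching `probe_host` instead of `requests.get`), the scoping issue can be reproduced with the standard library alone. In this sketch `json.dumps` stands in for `requests.get`, and the `probe` module object is a hypothetical stand-in for `spog/probe.py`; `mock.patch.object(probe.json, "dumps")` has the same effect as a dotted-path patch that walks through the module's imported attribute.

```python
import json
import types
from unittest import mock

# Hypothetical stand-in for spog/probe.py: a module that did `import json`.
probe = types.ModuleType("probe")
probe.json = json


def _probe_host(host: str) -> str:
    return probe.json.dumps({"host": host})


probe.probe_host = _probe_host

# Patching via the module's imported attribute replaces the function on the
# shared json module, so unrelated callers are affected too.
with mock.patch.object(probe.json, "dumps", return_value="patched"):
    leaked = json.dumps({"unrelated": True})

# Patching probe_host itself leaves the shared module alone.
with mock.patch.object(probe, "probe_host", return_value="stubbed"):
    scoped = json.dumps({"unrelated": True})
    stubbed = probe.probe_host("example.invalid")
```

Here `leaked` comes back as the mock's return value while `scoped` is real JSON. That is the same leak that reached the SDK's auth-time HTTP calls.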
Drop the SPOG-only subset and instead run all of tests/functional with
--profile databricks_uc_sql_endpoint, with env vars wired to the SPOG
vanity host + ?o= so every code path is exercised through SPOG routing.

- Schedule moved from Saturday 22:00 UTC to Sunday 21:30 UTC
  (Monday 03:00 IST) so results are visible at the start of the week.
- pytest gets -n 10 --dist=loadfile for parallelism and --reruns 1
  --reruns-delay 60 to absorb transient connectivity flakes.
- timeout-minutes bumped 25 → 90 to fit the full suite.
- Added DBT_TEST_USER and DBT_DATABRICKS_LOCATION_ROOT to match the
  env shape the existing integration job uses for the same profile.

Replace the SPOG-only mini-workflow with a near-identical copy of
integration.yml's prepare-shards + 3 sharded functional jobs (uc-cluster,
uc-sql-endpoint, cluster). Same sharding, parallelism, retry, log-upload
shape as the live integration matrix.

The only deltas vs integration.yml:
- triggers: workflow_dispatch + Sunday 21:30 UTC schedule (+ temporary
  push on sd-db/spog-impl). No PR-event / prepare / gate / report-status.
- env: host points at TEST_PECO_SPOG_HOST, the warehouse path carries
  ?o=<wsid>, DBT_DATABRICKS_SPOG_WORKSPACE_ID is exported so the cluster
  path builder uses the SPOG workspace id and appends ?o= to cluster
  paths (so cluster profile control-plane calls also route via SPOG).

build_cluster_http_path.py learns one new env var:
DBT_DATABRICKS_SPOG_WORKSPACE_ID. When set, it bypasses the legacy
hostname-regex derivation (which can't parse `peco.azuredatabricks.net`)
and appends ?o=<wsid> to the cluster/uc-cluster paths. Legacy mode is
untouched (env var unset → hostname-regex path → no suffix).
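
A sketch of the two modes described above. The actual `build_cluster_http_path.py` is not shown in this thread, so the function signature, the legacy hostname pattern (`adb-<workspace-id>.<n>.azuredatabricks.net`), and the `sql/protocolv1/o/<id>/<cluster-id>` path layout are assumptions; only the env var name and the "bypass the regex and append `?o=`" behaviour come from the commit message.

```python
import re
from typing import Optional

# Assumed legacy shape: per-workspace Azure hosts embed the workspace id.
_LEGACY_HOST = re.compile(r"^adb-(\d+)\.\d+\.azuredatabricks\.net$")


def build_cluster_http_path(
    host: str, cluster_id: str, spog_workspace_id: Optional[str] = None
) -> str:
    if spog_workspace_id:
        # SPOG mode: skip hostname parsing and add the ?o= suffix.
        return f"sql/protocolv1/o/{spog_workspace_id}/{cluster_id}?o={spog_workspace_id}"
    match = _LEGACY_HOST.match(host)
    if match is None:
        raise ValueError(f"cannot derive workspace id from host {host!r}")
    # Legacy mode: workspace id comes from the hostname, no suffix.
    return f"sql/protocolv1/o/{match.group(1)}/{cluster_id}"
```

A caller would typically pass `os.environ.get("DBT_DATABRICKS_SPOG_WORKSPACE_ID")` as `spog_workspace_id`, so an unset variable falls through to the legacy branch.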
- test_spog_probe_failure_fallback: drop the hardcoded
  databricks_uc_sql_endpoint_target() fixture override. When the test
  ran in cluster/uc-cluster shards, that override pointed at a profile
  the job didn't have env vars for, causing the connection setup to
  hang in 5-minute retries before erroring. Inheriting the shard's
  active profile fixes it; the probe-failure scenario is
  profile-agnostic anyway.
- test_spog_debug: extend _resolved_http_path() to also check the
  cluster + uc-cluster path env vars (DBT_DATABRICKS_UC_CLUSTER_HTTP_PATH
  and DBT_DATABRICKS_CLUSTER_HTTP_PATH). Without this the skipif
  returned None on those shards and the test was silently skipped even
  though the cluster path now carries ?o=<wsid> under SPOG.