Skip to content

feat(bigquery-analytics): hot-path same-session pause_orphan resolution#7

Open
caohy1988 wants to merge 1 commit into
mainfrom
feat/206-pause-registry-orphan
Open

feat(bigquery-analytics): hot-path same-session pause_orphan resolution#7
caohy1988 wants to merge 1 commit into
mainfrom
feat/206-pause-registry-orphan

Conversation

@caohy1988

Copy link
Copy Markdown
Owner

Staging in the fork before targeting upstream google/adk-python.

Resolves pause_orphan on the long-running TOOL_COMPLETED resume row from the in-memory session history only — zero I/O, no session-service or BigQuery read, no resume-path latency.

TestPauseOrphanSameSession covers found/not-found/missing-id/no-session/id-without-part/never-true/no-session-service-read, plus two emit-path assertions.

Rebased on latest upstream main. Local test run blocked by an offline env (stale venv needs a newer opentelemetry-semconv; uv sync blocked by a platform-incompatible dep) — both files syntax-validated; relying on CI.

@caohy1988 caohy1988 force-pushed the feat/206-pause-registry-orphan branch 2 times, most recently from 538ccce to 1903d36 Compare June 16, 2026 22:35
@caohy1988

Copy link
Copy Markdown
Owner Author

Review finding:

[P2] Treat fallback-emitted pauses as same-session matches.
src/google/adk/plugins/bigquery_agent_analytics_plugin.py:3278-3289 only returns False when long_running_tool_ids contains the id and the original event still has a matching function_call part. But this same plugin already has a fallback path at src/google/adk/plugins/bigquery_agent_analytics_plugin.py:3570-3592: if an event has a long_running_tool_id with no matching function-call part, it still emits a pairable TOOL_PAUSED row with that function_call_id (the after_model_callback rewrite case). On resume, the current resolver leaves that valid same-session pair as unknown / SQL NULL, even though the producer can prove it emitted the originating pause from the same history event.

Please make the resolver count function_call_id in event.long_running_tool_ids as sufficient evidence for False, or otherwise explicitly detect the fallback-emitted-pause case. Then flip test_id_in_long_running_but_no_matching_part_is_unknown to assert False, because that path is not an orphan under the producer's own TOOL_PAUSED emission contract.

The rest of the split looks right: no hot-path sleep, no session-service/BigQuery read, no True emission, SimpleNamespace import fixed, and the real fork test gates are green (Python 3.10-3.14, mypy-diff, pre-commit). Remaining red checks are expected fork noise (Copybara, check-file-contents).

When a long-running tool resumes (a function_response in a user message),
resolve pause_orphan on the TOOL_COMPLETED row from the in-memory session
history only — a zero-I/O check that adds no latency to the resume path and
introduces no session-service or BigQuery read.

_resolve_same_session_pause_orphan returns:
  * False — the originating paused call is present in this session's
    in-memory history (definitely not an orphan);
  * None  — unknown / not yet settled: no id to pair on, no history, or
    the pause was not found in the (possibly trimmed) in-memory history.

The emit path stamps attributes.adk.pause_orphan only when the resolver
returns a real bool; on None the key is omitted so consumers read it as
SQL NULL. The resolver never returns True: declaring a true orphan needs
the settled fallback (an off-hot-path session-service / BigQuery
reconciliation), deliberately out of scope here to keep the resume path
free of added reads. A persistent None is what that future fallback
upgrades to True.

Tests: resolver returns False only on a found pause, None for
not-found / empty / missing-id / no-session / id-without-matching-part,
never True, and does no session_service read; emit path stamps
pause_orphan=False when the pause is in history and omits it otherwise.
@caohy1988 caohy1988 force-pushed the feat/206-pause-registry-orphan branch from 1903d36 to 8dc8621 Compare June 16, 2026 22:56
@caohy1988

Copy link
Copy Markdown
Owner Author

P2 addressed in 8dc8621. The resolver now returns False on id-membership alonefunction_call_id in event.long_running_tool_ids — dropping the surviving-function_call-part requirement. This matches what the producer actually writes: it emits a pairable TOOL_PAUSED for every id in long_running_tool_ids, both the per-part path and the fallback path (long_running_ids - paused_ids_emitted) that fires when the originating part was rewritten away. The old AND-condition missed that fallback case and wrongly reported unknown.

Test renamed test_id_in_long_running_without_part_is_not_orphan and now asserts False. Docstring updated to explain the membership-sufficiency. isort/pyink clean; test gate re-running.

Thanks for the cross-check against google#6140 — confirmed: that upstream PR is closed/stale at 04b374d; the live target is this fork PR #7, now at 8dc8621.

@caohy1988

Copy link
Copy Markdown
Owner Author

Re-review: the P2 is fixed.

The resolver now treats function_call_id in event.long_running_tool_ids as sufficient same-session evidence and returns False immediately. That matches the producer contract because the plugin emits a pairable TOOL_PAUSED for every id in long_running_tool_ids, including the fallback path where the original function_call part was rewritten away. The test was also flipped correctly to test_id_in_long_running_without_part_is_not_orphan.

I re-checked the scope guardrails: no added hot-path sleep, no get_session / session-service read, no BigQuery query/read, and still no pause_orphan=True path. git diff --check is clean; pre-commit and mypy-diff are green.

Only remaining gate is mechanical: the Python test matrix is still pending on the latest head 8dc8621. The red Copybara/check-file-contents jobs are still expected fork-PR noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant