feat(bigquery-analytics): hot-path same-session pause_orphan resolution#6140
feat(bigquery-analytics): hot-path same-session pause_orphan resolution#6140caohy1988 wants to merge 1 commit into
Conversation
When a long-running tool resumes (a function_response in a user message),
resolve pause_orphan on the TOOL_COMPLETED row from the in-memory session
history only — a zero-I/O check that adds no latency to the resume path and
introduces no session-service or BigQuery read.
_resolve_same_session_pause_orphan returns:
* False — the originating paused call is present in this session's
in-memory history (definitely not an orphan);
* None — unknown / not yet settled: no id to pair on, no history, or
the pause was not found in the (possibly trimmed) in-memory history.
The emit path stamps attributes.adk.pause_orphan only when the resolver
returns a real bool; on None the key is omitted so consumers read it as
SQL NULL. The resolver never returns True: declaring a true orphan needs
the settled fallback (an off-hot-path session-service / BigQuery
reconciliation), deliberately out of scope here to keep the resume path
free of added reads. A persistent None is what that future fallback
upgrades to True.
Tests: resolver returns False only on a found pause, None for
not-found / empty / missing-id / no-session / id-without-matching-part,
never True, and does no session_service read; emit path stamps
pause_orphan=False when the pause is in history and omits it otherwise.
caohy1988
left a comment
There was a problem hiding this comment.
Findings: none.
I reviewed this against the locked #206 split decision. The PR keeps the resume path hot-path-safe: no added sleep, no get_session / session-service read, no BigQuery read/query, and _resolve_same_session_pause_orphan() never returns True. The only positive resolution is False when the same session history contains a matching TOOL_PAUSED origin (function_call.id present in event.long_running_tool_ids), and unknown cases omit attributes.adk.pause_orphan so consumers see SQL NULL. That matches the agreed first PR scope; the settled orphan reconciliation remains correctly deferred.
I also checked that the pause producer and resolver use the same pair key, the emit path preserves existing pause_kind / function_call_id, and the new tests cover found, not found, no session/history/id, id-without-matching-part, no session-service read, emit false, and emit omitted. git diff --check is clean.
One operational note: the PR is currently BEHIND origin/main. A merge-tree check against current main shows no conflict in the touched files, but it should be synced/rebased before final merge so required checks rerun on the current base. Local focused pytest is blocked on this machine by an unrelated OpenTelemetry semconv import mismatch in the adk-python environment, so I relied on the diff/contract checks plus GitHub checks for execution.
What
Resolves
pause_orphanon the long-runningTOOL_COMPLETEDrow (emitted when a paused tool resumes via afunction_responsein a user message) from the in-memory session history only._resolve_same_session_pause_orphan(invocation_context, function_call_id):False— the originating paused call is present in this session's in-memory history → definitely not an orphan.None— unknown / not yet settled: no id to pair on, no history, or the pause wasn't found in the (possibly trimmed) in-memory history.The emit path stamps
attributes.adk.pause_orphanonly when the resolver returns a real bool; onNonethe key is omitted so consumers read it as SQL NULL ("not yet determined").Why this shape (hot-path-safe split)
This is the resume path. The resolver is a zero-I/O in-memory scan: it adds no latency and introduces no session-service or BigQuery read into a plugin that is otherwise write-only.
The resolver never returns
True. Declaring a true orphan requires a settled reconciliation — an off-hot-path session-service / BigQuery read after a settling delay — which is intentionally out of scope here. Adding a 30s settling sleep + a BQ read to the resume callback would be the wrong product tradeoff (resume latency) and a real architectural change (write-only producer → producer that also reads). A persistentNoneis exactly what that future fallback upgrades toTrue. Follow-up tracked separately (see #206 "settled pause-orphan reconciliation").Tests
TestPauseOrphanSameSession: resolver returnsFalseonly on a found pause;Nonefor not-found / empty / missing-id / no-session / id-without-matching-part; neverTrue; and does nosession_serviceread (an exploding session_service proves it). Two emit-path tests assertpause_orphan=Falsewhen the pause is in history and the key is omitted otherwise.