Skip to content

feat(bigquery-analytics): hot-path same-session pause_orphan resolution#6140

Closed
caohy1988 wants to merge 1 commit into
google:mainfrom
caohy1988:feat/206-pause-registry-orphan
Closed

feat(bigquery-analytics): hot-path same-session pause_orphan resolution#6140
caohy1988 wants to merge 1 commit into
google:mainfrom
caohy1988:feat/206-pause-registry-orphan

Conversation

@caohy1988

Copy link
Copy Markdown

What

Resolves pause_orphan on the long-running TOOL_COMPLETED row (emitted when a paused tool resumes via a function_response in a user message) from the in-memory session history only.

_resolve_same_session_pause_orphan(invocation_context, function_call_id):

  • False — the originating paused call is present in this session's in-memory history → definitely not an orphan.
  • None — unknown / not yet settled: no id to pair on, no history, or the pause wasn't found in the (possibly trimmed) in-memory history.

The emit path stamps attributes.adk.pause_orphan only when the resolver returns a real bool; on None the key is omitted so consumers read it as SQL NULL ("not yet determined").

Why this shape (hot-path-safe split)

This is the resume path. The resolver is a zero-I/O in-memory scan: it adds no latency and introduces no session-service or BigQuery read into a plugin that is otherwise write-only.

The resolver never returns True. Declaring a true orphan requires a settled reconciliation — an off-hot-path session-service / BigQuery read after a settling delay — which is intentionally out of scope here. Adding a 30s settling sleep + a BQ read to the resume callback would be the wrong product tradeoff (resume latency) and a real architectural change (write-only producer → producer that also reads). A persistent None is exactly what that future fallback upgrades to True. Follow-up tracked separately (see #206 "settled pause-orphan reconciliation").

Tests

TestPauseOrphanSameSession: resolver returns False only on a found pause; None for not-found / empty / missing-id / no-session / id-without-matching-part; never True; and does no session_service read (an exploding session_service proves it). Two emit-path tests assert pause_orphan=False when the pause is in history and the key is omitted otherwise.

Note on local verification: I was unable to run the suite in my local environment — it's offline and the checked-out main requires a newer opentelemetry-semconv (GEN_AI_TOOL_DEFINITIONS) than is installed, and uv sync is blocked by an unrelated platform-incompatible dep (google-antigravity, no macOS-x86_64 wheel). Both changed files are syntax-validated; CI is the verification gate for the test run. Happy to iterate on any setup detail CI surfaces.

When a long-running tool resumes (a function_response in a user message),
resolve pause_orphan on the TOOL_COMPLETED row from the in-memory session
history only — a zero-I/O check that adds no latency to the resume path and
introduces no session-service or BigQuery read.

_resolve_same_session_pause_orphan returns:
  * False — the originating paused call is present in this session's
    in-memory history (definitely not an orphan);
  * None  — unknown / not yet settled: no id to pair on, no history, or
    the pause was not found in the (possibly trimmed) in-memory history.

The emit path stamps attributes.adk.pause_orphan only when the resolver
returns a real bool; on None the key is omitted so consumers read it as
SQL NULL. The resolver never returns True: declaring a true orphan needs
the settled fallback (an off-hot-path session-service / BigQuery
reconciliation), deliberately out of scope here to keep the resume path
free of added reads. A persistent None is what that future fallback
upgrades to True.

Tests: resolver returns False only on a found pause, None for
not-found / empty / missing-id / no-session / id-without-matching-part,
never True, and does no session_service read; emit path stamps
pause_orphan=False when the pause is in history and omits it otherwise.

@caohy1988 caohy1988 left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings: none.

I reviewed this against the locked #206 split decision. The PR keeps the resume path hot-path-safe: no added sleep, no get_session / session-service read, no BigQuery read/query, and _resolve_same_session_pause_orphan() never returns True. The only positive resolution is False when the same session history contains a matching TOOL_PAUSED origin (function_call.id present in event.long_running_tool_ids), and unknown cases omit attributes.adk.pause_orphan so consumers see SQL NULL. That matches the agreed first PR scope; the settled orphan reconciliation remains correctly deferred.

I also checked that the pause producer and resolver use the same pair key, the emit path preserves existing pause_kind / function_call_id, and the new tests cover found, not found, no session/history/id, id-without-matching-part, no session-service read, emit false, and emit omitted. git diff --check is clean.

One operational note: the PR is currently BEHIND origin/main. A merge-tree check against current main shows no conflict in the touched files, but it should be synced/rebased before final merge so required checks rerun on the current base. Local focused pytest is blocked on this machine by an unrelated OpenTelemetry semconv import mismatch in the adk-python environment, so I relied on the diff/contract checks plus GitHub checks for execution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants