hooks: ship each tool_use once per Stop turn (fix inflated policy-match counts)#157
hooks: ship each tool_use once per Stop turn (fix inflated policy-match counts)#157vigneshsubbiah16 wants to merge 1 commit into
Conversation
…nts) Claude Code (and Codex) fire multiple Stop events per user turn — one per agent turn. The Stop handler rebuilds the exchange window from the persisted audit log, resetting on each UserPromptSubmit and appending every event after. Commit 16d1ffe removed the post-send prune of already-shipped events (it had broken the get_recent_user_prompts_for_session lookback by wiping prior prompts). With the prune gone, every Stop re-shipped ALL PostToolUse events accumulated since the last UserPromptSubmit, so a single file edit was counted 1 + 2 + ... + N times across N Stops — the root cause of inflated policy-match counts (one edit seen 14-35x). Fix — restore the prune safely (mark, don't delete): - claude-code: skip PostToolUse audit entries already flagged SHIPPED_FLAG when building the window; after a successful send, stamp the shipped tool entries (and the turn's UserPromptSubmit anchor) and persist. Entries stay in the log so the user-prompt lookback keeps working. A tool-free conversation turn still ships exactly once via the prompt anchor. - codex: ships tool_uses from the transcript per turn; mark the turn's UserPromptSubmit anchor shipped after a successful send and skip resend. - cursor: groups by generation_id; mark the generation's stop entry shipped and skip if a stop fires twice for the same generation. _persist_*_flags re-read the log before stamping to avoid clobbering entries a concurrent hook appended. Tests: new test_stop_dedup.py in each hook dir reproduces the bug (multiple Stops in one turn) and asserts each tool_use ships exactly once, plus that the get_recent_user_prompts_for_session lookback still returns prior prompts (what 16d1ffe protected). Verified red without the fix, green with it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| def _persist_shipped_flags(session_id: Optional[str], shipped_logs: List[Dict]) -> None: | ||
| """Re-read the audit log and stamp SHIPPED_FLAG on the entries that were | ||
| just shipped, then rewrite it. Re-reading (instead of saving our stale | ||
| in-memory copy) avoids clobbering entries a concurrent hook appended after | ||
| we loaded the log.""" | ||
| if not shipped_logs: | ||
| return | ||
| shipped_ids = {_log_identity(log) for log in shipped_logs} |
There was a problem hiding this comment.
The
session_id parameter is accepted but never referenced inside the function body. _log_identity already embeds the session ID in the identity tuple, so the match is implicitly session-scoped. The dead parameter adds confusion about whether session-scoping is intended as an extra guard and makes callers think they need to supply it. Consider dropping it from the signature (and the call-site).
| def _persist_shipped_flags(session_id: Optional[str], shipped_logs: List[Dict]) -> None: | |
| """Re-read the audit log and stamp SHIPPED_FLAG on the entries that were | |
| just shipped, then rewrite it. Re-reading (instead of saving our stale | |
| in-memory copy) avoids clobbering entries a concurrent hook appended after | |
| we loaded the log.""" | |
| if not shipped_logs: | |
| return | |
| shipped_ids = {_log_identity(log) for log in shipped_logs} | |
| def _persist_shipped_flags(shipped_logs: List[Dict]) -> None: | |
| """Re-read the audit log and stamp SHIPPED_FLAG on the entries that were | |
| just shipped, then rewrite it. Re-reading (instead of saving our stale | |
| in-memory copy) avoids clobbering entries a concurrent hook appended after | |
| we loaded the log.""" | |
| if not shipped_logs: | |
| return | |
| shipped_ids = {_log_identity(log) for log in shipped_logs} |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| usage=transcript_usage, | ||
| ) | ||
|
|
||
| if exchange: | ||
| send_to_api(exchange, api_key) | ||
| if exchange and send_to_api(exchange, api_key): | ||
| # Persist the shipped markers so later Stops in this turn skip what was | ||
| # already sent. The entries stay in the log for the user-prompt lookback. | ||
| for log in pending_tool_logs: | ||
| log[SHIPPED_FLAG] = True | ||
| if prompt_log is not None: | ||
| prompt_log[SHIPPED_FLAG] = True | ||
| _persist_shipped_flags(session_id, pending_tool_logs + ([prompt_log] if prompt_log else [])) | ||
|
|
||
|
|
There was a problem hiding this comment.
TOCTOU race between concurrent Stops
The guard read (prompt_log.get(SHIPPED_FLAG)) and the flag write (_persist_shipped_flags) are not atomic. If two Stop events fire within the same small window — both load the log before either has finished writing — both will see an unshipped state, both pass the guard, and both call send_to_api, shipping the same exchange twice. The re-read inside _persist_shipped_flags only protects against a third process appending new lines, not against two racing Stops both clearing the guard simultaneously. For the common sequential-Stop case this is fully correct; the risk is limited to genuinely concurrent invocations, but it's worth documenting (or protecting with a lock/tmp-rename swap) so the behavior is explicit.
| # Idempotency guard: Codex fires a Stop per agent turn, so a single user | ||
| # turn produces several Stop events. Once this turn's exchange has shipped, | ||
| # re-sending would re-count the same tool_uses (root cause of inflated | ||
| # policy-match counts), so skip it. | ||
| if prompt_log is not None and prompt_log.get(SHIPPED_FLAG): | ||
| return |
There was a problem hiding this comment.
Turn-level flag silently drops tool_uses added after the first Stop
Codex marks the UserPromptSubmit anchor shipped as soon as Stop #1 succeeds. If Stop #2 fires and the transcript now contains tool_uses that weren't present when Stop #1 ran (e.g., the agent performed a file edit between the two sub-turns), those new tool_uses are never shipped — the guard returns early without inspecting the transcript again. The claude-code hook avoids this by checking per-PostToolUse entries. If multi-Stop, multi-tool-use codex turns are possible in practice, consider tracking shipped transcript offsets or tool indices rather than a single anchor flag.
Summary
The Unbound hook sender re-ships the same tool_uses to the proxy on every
Stopevent within a single user turn. Because Claude Code fires multipleStopevents perUserPromptSubmit(one per agent turn), a single file edit gets counted 1 + 2 + … + N times across N Stops. This is the root cause of inflated policy-match counts (one edit observed counted 14–35×).This PR makes each tool_use ship exactly once per turn across all three hook flavors (claude-code, codex, cursor), while keeping the audit log intact so the user-prompt lookback keeps working.
Root cause
process_stop_eventrebuilds the exchange window from the persisted audit log (~/.claude/hooks/agent-audit.log): it resets the window on eachUserPromptSubmitand appends every event after.build_llm_exchangethen emits onetool_useperPostToolUseentry in that window.Commit
16d1ffe("hooks: stop wiping the audit log on Stop; rely on rotation", 2026-05-13) removed the post-send prune of already-shipped events. It did so deliberately — the prune had been deleting every entry for the current session after a send, which broke the then-newget_recent_user_prompts_for_sessionlookback (no prior prompts left to find). But with the prune gone, nothing stops each subsequentStopfrom re-shipping the whole accumulated window:→ cumulative re-sends of the same tool_use entries.
Fix — restore the prune safely (mark, don't delete)
Instead of re-deleting shipped events (which would re-break the lookback
16d1ffeprotected), entries are annotated with anunbound_shippedflag and left in the log:PostToolUseentries already flaggedSHIPPED_FLAGwhen building the window; after a successful send, stamp the shipped tool entries and the turn'sUserPromptSubmitanchor, then persist. A tool-free conversation turn still ships exactly once (via the prompt anchor), preserving prior behavior.UserPromptSubmitanchor is marked shipped after a successful send and a later Stop in the same turn skips the resend.generation_id; the generation'sstopentry is marked shipped so astopthat fires twice for one generation doesn't double-ship._persist_*_flagsre-read the log immediately before stamping, so a concurrent hook that appended new lines after we loaded the log isn't clobbered.Before / after
For a turn with 2 edits and 3 Stop events:
[/a.py, /a.py, /a.py, /b.py, /b.py, /b.py](6 — triangular re-send)[/a.py, /b.py](2 — once each)The user-prompt lookback (
get_recent_user_prompts_for_session) returns the same prior prompts in both cases — the entries are retained, only annotated.Test plan
New
test_stop_dedup.pyin each hook dir (function-level, matching the repo's existingunittest+patch.objectconvention):PostToolUsetool_use ships exactly once (idempotent)get_recent_user_prompts_for_sessionstill returns prior user prompts after Stop processing (the16d1ffeprotection)Verified red without the fix (claude-code dedup tests fail, e.g.
['/a.py','/a.py','/a.py','/b.py','/b.py','/b.py'] != ['/a.py','/b.py']) and green with it. The full existing hook suites pass; the one pre-existingtest_identity.test_keys_limited_to_identity_fieldsfailure exists unchanged onmainand is unrelated to this change. CI gate (py_compile+ pyflakes undefined-name) is green.Run:
Follow-up (not in this PR)
The
unbound-hookfrozen binary (binary/unbound-hook.spec) vendors these hook sources (it bundles../claude-code/hooks/unbound.py,../cursor/unbound.py, and../codex/hooks/unbound.pyundervendored/), so devices running the binary won't pick up this fix until the binary is rebuilt and redeployed via MDM. That rebuild/redeploy is intentionally out of scope here and should be tracked/executed separately.🤖 Generated with Claude Code
Greptile Summary
This PR fixes inflated policy-match counts caused by
process_stop_eventre-shipping the sametool_useentries on everyStopevent within a single user turn. The fix introduces aSHIPPED_FLAGannotation (unbound_shipped) that is written to the audit log after a successful send, letting subsequent Stops in the same turn skip already-shipped entries while leaving them in the log so the user-prompt lookback continues to work.PostToolUseentry flag — each entry is individually marked, so tool_uses arriving between Stops are still captured on the next Stop without re-shipping earlier ones.UserPromptSubmitanchor — simpler but means any tool_use that reaches the transcript after Stop [WEB-3217] Adds --clear and --debug #1 is silently skipped by Stop [WEB-3237] Captures model from cursor #2+.stopentry — clean fit since cursor already groups bygeneration_id.test_stop_dedup.py) covering idempotency, incremental arrivals, and lookback preservation.Confidence Score: 4/5
Safe to merge for sequential Stop flows, which is the common case; the codex hook silently drops tool_uses that arrive between Stops.
The claude-code and cursor hooks are solid — entry-level flags with a re-read-before-write persist, backed by comprehensive regression tests. The codex hook marks the whole turn shipped on the first Stop, so any tool_use that reaches the transcript after Stop #1 will never be sent. Whether this happens in practice depends on codex's sub-turn ordering, but no test covers that scenario. The unused session_id parameter in _persist_shipped_flags is a cosmetic issue. The TOCTOU window for concurrent Stops is inherent to the file-based log approach and dramatically narrower than before the fix.
codex/hooks/unbound.py — the turn-level shipped flag could silently miss tool_uses added to the transcript between consecutive Stops in the same user turn
Important Files Changed
Sequence Diagram
sequenceDiagram participant CC as Claude Code participant Hook as process_stop_event participant Log as audit-log participant API as Proxy API CC->>Hook: "Stop #1 (turn T)" Hook->>Log: load_existing_logs() Log-->>Hook: [UserPromptSubmit, PostToolUse(/a.py), PostToolUse(/b.py)] Note over Hook: pending_tool_logs = [/a.py, /b.py], turn_already_shipped = False Hook->>Hook: build_llm_exchange(session_events) Hook->>API: send_to_api(exchange) → True Hook->>Log: _persist_shipped_flags() re-read → stamp SHIPPED_FLAG → write CC->>Hook: "Stop #2 (turn T, same session)" Hook->>Log: load_existing_logs() Log-->>Hook: [UserPromptSubmit✓, PostToolUse(/a.py)✓, PostToolUse(/b.py)✓] Note over Hook: pending_tool_logs = [], turn_already_shipped = True → early return Hook-->>CC: (no send) CC->>Hook: "Stop #1 (turn T+1, new UserPromptSubmit)" Hook->>Log: load_existing_logs() Log-->>Hook: [...✓ entries, UserPromptSubmit (new), PostToolUse(/c.py)] Note over Hook: Window resets, pending_tool_logs = [/c.py], turn_already_shipped = False Hook->>API: send_to_api(exchange) → True Hook->>Log: _persist_shipped_flags()Reviews (1): Last reviewed commit: "hooks: ship each tool_use once per turn ..." | Re-trigger Greptile