hooks: ship each tool_use once per Stop turn (fix inflated policy-match counts) by vigneshsubbiah16 · Pull Request #157 · websentry-ai/setup

vigneshsubbiah16 · 2026-06-15T20:52:35Z

Summary

The Unbound hook sender re-ships the same tool_uses to the proxy on every Stop event within a single user turn. Because Claude Code fires multiple Stop events per UserPromptSubmit (one per agent turn), a single file edit gets counted 1 + 2 + … + N times across N Stops. This is the root cause of inflated policy-match counts (one edit observed counted 14–35×).

This PR makes each tool_use ship exactly once per turn across all three hook flavors (claude-code, codex, cursor), while keeping the audit log intact so the user-prompt lookback keeps working.

Root cause

process_stop_event rebuilds the exchange window from the persisted audit log (~/.claude/hooks/agent-audit.log): it resets the window on each UserPromptSubmit and appends every event after. build_llm_exchange then emits one tool_use per PostToolUse entry in that window.

Commit 16d1ffe ("hooks: stop wiping the audit log on Stop; rely on rotation", 2026-05-13) removed the post-send prune of already-shipped events. It did so deliberately — the prune had been deleting every entry for the current session after a send, which broke the then-new get_recent_user_prompts_for_session lookback (no prior prompts left to find). But with the prune gone, nothing stops each subsequent Stop from re-shipping the whole accumulated window:

Stop [WEB-3217] Adds --clear and --debug #1 ships PostToolUse {1}
Stop [WEB-3237] Captures model from cursor #2 ships {1, 2}
Stop [WEB-3074] Adds script for cursor mdm setup #3 ships {1, 2, 3} …

→ cumulative re-sends of the same tool_use entries.

Fix — restore the prune safely (mark, don't delete)

Instead of re-deleting shipped events (which would re-break the lookback 16d1ffe protected), entries are annotated with an unbound_shipped flag and left in the log:

claude-code — skip PostToolUse entries already flagged SHIPPED_FLAG when building the window; after a successful send, stamp the shipped tool entries and the turn's UserPromptSubmit anchor, then persist. A tool-free conversation turn still ships exactly once (via the prompt anchor), preserving prior behavior.
codex — sources tool_uses from the transcript per turn; the turn's UserPromptSubmit anchor is marked shipped after a successful send and a later Stop in the same turn skips the resend.
cursor — groups by generation_id; the generation's stop entry is marked shipped so a stop that fires twice for one generation doesn't double-ship.

_persist_*_flags re-read the log immediately before stamping, so a concurrent hook that appended new lines after we loaded the log isn't clobbered.

Before / after

For a turn with 2 edits and 3 Stop events:

	tool_uses shipped
Before	`[/a.py, /a.py, /a.py, /b.py, /b.py, /b.py]` (6 — triangular re-send)
After	`[/a.py, /b.py]` (2 — once each)

The user-prompt lookback (get_recent_user_prompts_for_session) returns the same prior prompts in both cases — the entries are retained, only annotated.

Test plan

New test_stop_dedup.py in each hook dir (function-level, matching the repo's existing unittest + patch.object convention):

multiple Stops in one turn → each PostToolUse tool_use ships exactly once (idempotent)
a new tool arriving after an earlier Stop ships once, without re-shipping prior tools
a tool-free conversation turn ships exactly once across multiple Stops
get_recent_user_prompts_for_session still returns prior user prompts after Stop processing (the 16d1ffe protection)

Verified red without the fix (claude-code dedup tests fail, e.g. ['/a.py','/a.py','/a.py','/b.py','/b.py','/b.py'] != ['/a.py','/b.py']) and green with it. The full existing hook suites pass; the one pre-existing test_identity.test_keys_limited_to_identity_fields failure exists unchanged on main and is unrelated to this change. CI gate (py_compile + pyflakes undefined-name) is green.

Run:

cd claude-code/hooks && python3 -m unittest test_stop_dedup
cd codex/hooks     && python3 -m unittest test_stop_dedup
cd cursor          && python3 -m unittest test_stop_dedup

Follow-up (not in this PR)

The unbound-hook frozen binary (binary/unbound-hook.spec) vendors these hook sources (it bundles ../claude-code/hooks/unbound.py, ../cursor/unbound.py, and ../codex/hooks/unbound.py under vendored/), so devices running the binary won't pick up this fix until the binary is rebuilt and redeployed via MDM. That rebuild/redeploy is intentionally out of scope here and should be tracked/executed separately.

🤖 Generated with Claude Code

Greptile Summary

This PR fixes inflated policy-match counts caused by process_stop_event re-shipping the same tool_use entries on every Stop event within a single user turn. The fix introduces a SHIPPED_FLAG annotation (unbound_shipped) that is written to the audit log after a successful send, letting subsequent Stops in the same turn skip already-shipped entries while leaving them in the log so the user-prompt lookback continues to work.

claude-code: per-PostToolUse entry flag — each entry is individually marked, so tool_uses arriving between Stops are still captured on the next Stop without re-shipping earlier ones.
codex: turn-level flag on the UserPromptSubmit anchor — simpler but means any tool_use that reaches the transcript after Stop [WEB-3217] Adds --clear and --debug #1 is silently skipped by Stop [WEB-3237] Captures model from cursor #2+.
cursor: generation-level flag on the stop entry — clean fit since cursor already groups by generation_id.
All three flavors ship new regression tests (test_stop_dedup.py) covering idempotency, incremental arrivals, and lookback preservation.

Confidence Score: 4/5

Safe to merge for sequential Stop flows, which is the common case; the codex hook silently drops tool_uses that arrive between Stops.

The claude-code and cursor hooks are solid — entry-level flags with a re-read-before-write persist, backed by comprehensive regression tests. The codex hook marks the whole turn shipped on the first Stop, so any tool_use that reaches the transcript after Stop #1 will never be sent. Whether this happens in practice depends on codex's sub-turn ordering, but no test covers that scenario. The unused session_id parameter in _persist_shipped_flags is a cosmetic issue. The TOCTOU window for concurrent Stops is inherent to the file-based log approach and dramatically narrower than before the fix.

codex/hooks/unbound.py — the turn-level shipped flag could silently miss tool_uses added to the transcript between consecutive Stops in the same user turn

Important Files Changed

Filename	Overview
claude-code/hooks/unbound.py	Core dedup fix: adds SHIPPED_FLAG, _persist_shipped_flags (re-read-before-write), and a per-PostToolUse shipped check; unused session_id parameter in _persist_shipped_flags is dead code
codex/hooks/unbound.py	Turn-level dedup: marks the UserPromptSubmit anchor shipped after the first successful send; any tool_uses added between Stop #1 and a later Stop for the same turn are silently skipped
cursor/unbound.py	Generation-level dedup: stamps SHIPPED_FLAG on the generation's stop entry after a successful send; clean and straightforward
claude-code/hooks/test_stop_dedup.py	New regression suite covering idempotent sends, incremental tool arrivals, tool-free turns, and lookback preservation; well structured with addCleanup teardown
codex/hooks/test_stop_dedup.py	Covers turn-level idempotency and lookback preservation for the codex hook; stubs parse_codex_transcript_for_tools and parse_codex_transcript_for_usage correctly
cursor/test_stop_dedup.py	Covers generation-level idempotency and lookback for the cursor hook; correct use of api_key keyword-only sentinel matching the cursor send_to_api signature

Sequence Diagram

sequenceDiagram
    participant CC as Claude Code
    participant Hook as process_stop_event
    participant Log as audit-log
    participant API as Proxy API

    CC->>Hook: "Stop #1 (turn T)"
    Hook->>Log: load_existing_logs()
    Log-->>Hook: [UserPromptSubmit, PostToolUse(/a.py), PostToolUse(/b.py)]
    Note over Hook: pending_tool_logs = [/a.py, /b.py], turn_already_shipped = False
    Hook->>Hook: build_llm_exchange(session_events)
    Hook->>API: send_to_api(exchange) → True
    Hook->>Log: _persist_shipped_flags() re-read → stamp SHIPPED_FLAG → write

    CC->>Hook: "Stop #2 (turn T, same session)"
    Hook->>Log: load_existing_logs()
    Log-->>Hook: [UserPromptSubmit✓, PostToolUse(/a.py)✓, PostToolUse(/b.py)✓]
    Note over Hook: pending_tool_logs = [], turn_already_shipped = True → early return
    Hook-->>CC: (no send)

    CC->>Hook: "Stop #1 (turn T+1, new UserPromptSubmit)"
    Hook->>Log: load_existing_logs()
    Log-->>Hook: [...✓ entries, UserPromptSubmit (new), PostToolUse(/c.py)]
    Note over Hook: Window resets, pending_tool_logs = [/c.py], turn_already_shipped = False
    Hook->>API: send_to_api(exchange) → True
    Hook->>Log: _persist_shipped_flags()

_{Reviews (1): Last reviewed commit: "hooks: ship each tool_use once per turn ..." | Re-trigger Greptile}

Greptile also left 3 inline comments on this PR.

…nts) Claude Code (and Codex) fire multiple Stop events per user turn — one per agent turn. The Stop handler rebuilds the exchange window from the persisted audit log, resetting on each UserPromptSubmit and appending every event after. Commit 16d1ffe removed the post-send prune of already-shipped events (it had broken the get_recent_user_prompts_for_session lookback by wiping prior prompts). With the prune gone, every Stop re-shipped ALL PostToolUse events accumulated since the last UserPromptSubmit, so a single file edit was counted 1 + 2 + ... + N times across N Stops — the root cause of inflated policy-match counts (one edit seen 14-35x). Fix — restore the prune safely (mark, don't delete): - claude-code: skip PostToolUse audit entries already flagged SHIPPED_FLAG when building the window; after a successful send, stamp the shipped tool entries (and the turn's UserPromptSubmit anchor) and persist. Entries stay in the log so the user-prompt lookback keeps working. A tool-free conversation turn still ships exactly once via the prompt anchor. - codex: ships tool_uses from the transcript per turn; mark the turn's UserPromptSubmit anchor shipped after a successful send and skip resend. - cursor: groups by generation_id; mark the generation's stop entry shipped and skip if a stop fires twice for the same generation. _persist_*_flags re-read the log before stamping to avoid clobbering entries a concurrent hook appended. Tests: new test_stop_dedup.py in each hook dir reproduces the bug (multiple Stops in one turn) and asserts each tool_use ships exactly once, plus that the get_recent_user_prompts_for_session lookback still returns prior prompts (what 16d1ffe protected). Verified red without the fix, green with it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-06-15T20:57:51Z

+def _persist_shipped_flags(session_id: Optional[str], shipped_logs: List[Dict]) -> None:
+    """Re-read the audit log and stamp SHIPPED_FLAG on the entries that were
+    just shipped, then rewrite it. Re-reading (instead of saving our stale
+    in-memory copy) avoids clobbering entries a concurrent hook appended after
+    we loaded the log."""
+    if not shipped_logs:
+        return
+    shipped_ids = {_log_identity(log) for log in shipped_logs}


The session_id parameter is accepted but never referenced inside the function body. _log_identity already embeds the session ID in the identity tuple, so the match is implicitly session-scoped. The dead parameter adds confusion about whether session-scoping is intended as an extra guard and makes callers think they need to supply it. Consider dropping it from the signature (and the call-site).

Suggested change

def _persist_shipped_flags(session_id: Optional[str], shipped_logs: List[Dict]) -> None:

"""Re-read the audit log and stamp SHIPPED_FLAG on the entries that were

just shipped, then rewrite it. Re-reading (instead of saving our stale

in-memory copy) avoids clobbering entries a concurrent hook appended after

we loaded the log."""

if not shipped_logs:

return

shipped_ids = {_log_identity(log) for log in shipped_logs}

def _persist_shipped_flags(shipped_logs: List[Dict]) -> None:

"""Re-read the audit log and stamp SHIPPED_FLAG on the entries that were

just shipped, then rewrite it. Re-reading (instead of saving our stale

in-memory copy) avoids clobbering entries a concurrent hook appended after

we loaded the log."""

if not shipped_logs:

return

shipped_ids = {_log_identity(log) for log in shipped_logs}

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-15T20:57:52Z

        usage=transcript_usage,
    )

-    if exchange:
-        send_to_api(exchange, api_key)
+    if exchange and send_to_api(exchange, api_key):
+        # Persist the shipped markers so later Stops in this turn skip what was
+        # already sent. The entries stay in the log for the user-prompt lookback.
+        for log in pending_tool_logs:
+            log[SHIPPED_FLAG] = True
+        if prompt_log is not None:
+            prompt_log[SHIPPED_FLAG] = True
+        _persist_shipped_flags(session_id, pending_tool_logs + ([prompt_log] if prompt_log else []))




TOCTOU race between concurrent Stops

The guard read (prompt_log.get(SHIPPED_FLAG)) and the flag write (_persist_shipped_flags) are not atomic. If two Stop events fire within the same small window — both load the log before either has finished writing — both will see an unshipped state, both pass the guard, and both call send_to_api, shipping the same exchange twice. The re-read inside _persist_shipped_flags only protects against a third process appending new lines, not against two racing Stops both clearing the guard simultaneously. For the common sequential-Stop case this is fully correct; the risk is limited to genuinely concurrent invocations, but it's worth documenting (or protecting with a lock/tmp-rename swap) so the behavior is explicit.

greptile-apps · 2026-06-15T20:57:53Z

+    # Idempotency guard: Codex fires a Stop per agent turn, so a single user
+    # turn produces several Stop events. Once this turn's exchange has shipped,
+    # re-sending would re-count the same tool_uses (root cause of inflated
+    # policy-match counts), so skip it.
+    if prompt_log is not None and prompt_log.get(SHIPPED_FLAG):
+        return


Turn-level flag silently drops tool_uses added after the first Stop

Codex marks the UserPromptSubmit anchor shipped as soon as Stop #1 succeeds. If Stop #2 fires and the transcript now contains tool_uses that weren't present when Stop #1 ran (e.g., the agent performed a file edit between the two sub-turns), those new tool_uses are never shipped — the guard returns early without inspecting the transcript again. The claude-code hook avoids this by checking per-PostToolUse entries. If multi-Stop, multi-tool-use codex turns are possible in practice, consider tracking shipped transcript offsets or tool indices rather than a single anchor flag.

vigneshsubbiah16 requested a review from a team June 15, 2026 20:52

greptile-apps Bot reviewed Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hooks: ship each tool_use once per Stop turn (fix inflated policy-match counts)#157

hooks: ship each tool_use once per Stop turn (fix inflated policy-match counts)#157
vigneshsubbiah16 wants to merge 1 commit into
mainfrom
fix/web-stop-event-duplicate-tool-use

vigneshsubbiah16 commented Jun 15, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vigneshsubbiah16 commented Jun 15, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Fix — restore the prune safely (mark, don't delete)

Before / after

Test plan

Follow-up (not in this PR)

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vigneshsubbiah16 commented Jun 15, 2026 •

edited by greptile-apps Bot

Loading