Skip to content

WEB-4812: codex hook falls back to ~/.unbound/config.json for API key#137

Open
AvutR wants to merge 1 commit into
mainfrom
WEB-4812-codex-hook-api-key-config-fallback
Open

WEB-4812: codex hook falls back to ~/.unbound/config.json for API key#137
AvutR wants to merge 1 commit into
mainfrom
WEB-4812-codex-hook-api-key-config-fallback

Conversation

@AvutR

@AvutR AvutR commented Jun 12, 2026

Copy link
Copy Markdown

Root cause (WEB-4812, Slack)

codex/hooks/unbound.py resolved its API key only from the UNBOUND_CODEX_API_KEY env var (main(), was line 1523). Setup writes that var to ~/.zshrc/~/.bashrc (user-level) or per-user rc files (MDM set_env_var_system_wide), so any codex process that doesn't inherit a freshly-sourced zsh/bash environment runs keyless:

  • VS Code extension host / GUI-launched codex
  • terminals opened before setup ran (matches Dinesh's "sessions started before the script ran never log")
  • fish/other-shell users

With no key, the failure is triply silent by design: send_to_api no-ops (exchange dropped), pretool policy checks fail open to allow, and report_error_to_gateway itself requires the key — so nothing reaches the gateway or Sentry. The only trace is No API key present in send_to_api function in ~/.codex/hooks/error.log.

This is sibling drift: Claude Code hit the identical bug and fixed it in WEB-4145 (#56) with a two-tier get_api_key(); cursor (cursor/unbound.py:1182) and copilot (copilot/hooks/unbound.py:938) have the same fallback. Codex was the only tool never ported — even though this same file already reads api_key from ~/.unbound/config.json for the discovery gate and MCP scan dispatch, and both setup paths write that file.

Fix

Port get_api_key() (env → ~/.unbound/config.json) to codex and use it in main(). Two-line behavioral change + docstring.

Evidence (sandboxed repro, fully offline)

Ran the real hook script with fake HOME, gateway at a closed port, UserPromptSubmit + Stop events on stdin:

env var config.json result (error.log)
before unset has key No API key present in send_to_api function — exchange dropped
before set API request failed: curl (7) — send attempted ✓
after unset has key API request failed: curl (7)send attempted with config key

Tests

  • New codex/hooks/test_get_api_key.py (6 tests): env-wins, config fallback, no-source → None, invalid JSON → None + logged, missing field → None, and a main()-level Stop-event regression asserting the exchange is sent with the config key when the env var is absent.
  • test_identity.py 33/33 pass; binary/tests 73 passed/31 skipped (vendored-contract unaffected); py_compile + pyflakes undefined-name gate clean (CI parity).

Rollout notes

  • User-level installs self-heal: SELF_UPDATE_URL points at raw main, checked on SessionStart every ~2h — no reinstall needed once merged.
  • MDM python-era installs skip self-update by design (Skip hook self-update when running from a managed/enterprise location (MDM) #117) — they pick this up on next onboard run.
  • unbound-hook binary vendors this file — included in the next release-macos-runtime build.
  • Side effect worth knowing: this also restores pretool policy enforcement in the keyless contexts (it was failing open), and re-arms gateway error reporting there.

🤖 Generated with Claude Code

Greptile Summary

This PR ports the two-tier API key lookup (env var → ~/.unbound/config.json) from Claude Code / cursor / copilot into the codex hook, fixing a silent-drop regression where any codex process launched outside a sourced shell (VS Code extension host, GUI launchers, pre-setup terminals) never received an API key and silently dropped every exchange.

  • Adds get_api_key() with env-var-first, config-file-fallback resolution; handles FileNotFoundError, JSONDecodeError, and a general Exception guard, logging errors locally on decode failures.
  • Replaces the bare os.getenv(...) call in main() with get_api_key(), and ships 6 tests (unit + a main()-level Stop-event regression) confirming the fallback path is exercised end-to-end.

Confidence Score: 4/5

Safe to merge — the two-line behavioral change is isolated to key resolution, the fallback path is well-tested, and it correctly restores behaviour for the affected launch environments.

The core fix is a faithful port of an already-proven pattern. The one observation is that log_error inside get_api_key() fires while _cached_api_key is still None, so config-parse errors are written locally to error.log but won't be forwarded to the gateway. This is an edge-case observability gap rather than a correctness problem.

No files require special attention; the change is self-contained in codex/hooks/unbound.py and fully covered by the new test file.

Important Files Changed

Filename Overview
codex/hooks/unbound.py Adds get_api_key() with env-var-first, config-file-fallback lookup; main() swapped to call it. Logic is correct and matches the pattern used in claude/cursor/copilot. Minor: log_error inside get_api_key() fires while _cached_api_key is still None, so gateway reporting of config-parse errors won't reach the gateway (local error.log is still written).
codex/hooks/test_get_api_key.py New test file with 5 unit tests covering env-wins, config-fallback, missing file, malformed JSON, and missing field. One integration-style regression test exercises main() with a Stop event to confirm the full send path uses the config key when env is absent.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["main() called"] --> B["get_api_key()"]
    B --> C{"UNBOUND_CODEX_API_KEY set?"}
    C -- yes --> D["return env key"]
    C -- no --> E{"config.json exists?"}
    E -- no --> F["return None"]
    E -- yes --> G{"Valid JSON with api_key?"}
    G -- yes --> H["return config key"]
    G -- "malformed JSON" --> I["log_error, return None"]
    G -- "missing field" --> J["return None"]
    G -- "other OSError" --> K["log_error, return None"]
    D --> L["_cached_api_key = api_key"]
    H --> L
    F --> L
    I --> L
    J --> L
    K --> L
    L --> M{"api_key present?"}
    M -- yes --> N["send_to_api succeeds"]
    M -- no --> O["send_to_api no-ops, silent drop"]
Loading

Reviews (1): Last reviewed commit: "WEB-4812: codex hook falls back to ~/.un..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

The codex hook resolved its API key ONLY from the UNBOUND_CODEX_API_KEY
env var, which setup writes to ~/.zshrc/.bashrc. Codex launched from VS
Code's extension host, GUI launchers, terminals opened before setup ran,
or non-zsh/bash shells never sees that var, so every exchange was
silently dropped: send_to_api no-ops, policy checks fail open, and
report_error_to_gateway needs the key too — nothing reaches Sentry or
the gateway. Only trace is 'No API key present in send_to_api function'
in ~/.codex/hooks/error.log.

Claude Code hit this exact bug and fixed it in WEB-4145 with a two-tier
get_api_key(); cursor and copilot have the same fallback. Codex was the
only tool never ported — this file already reads api_key from
~/.unbound/config.json for the discovery gate and MCP scan dispatch.

Port get_api_key() to codex (env first, then ~/.unbound/config.json,
which both user-level setup.py and MDM write) and use it in main().

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AvutR AvutR requested a review from a team June 12, 2026 22:11
Comment thread codex/hooks/unbound.py
Comment on lines +1143 to +1146
return None
except Exception as e:
log_error(f"Failed to read config file: {e}", 'config')
return None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 log_error fires before _cached_api_key is populated

log_error internally calls report_error_to_gateway(message, category, _cached_api_key). When get_api_key() is invoked at the top of main(), the module-level _cached_api_key is still None (it's only updated on the next line after get_api_key() returns). So if the config file contains malformed JSON, the error is correctly written to error.log, but gateway reporting silently no-ops because the key is None. In this particular failure mode the key was unreadable anyway so there is no additional loss, but the ordering means gateway-side visibility for startup config errors will remain limited until the key is resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant