fix(code-review-beta): secure Codex review delegation by davidalee · Pull Request #8 · davidalee/compound-engineering-plugin

davidalee · 2026-05-04T20:57:14Z

Summary

Adds ce-code-review-beta, a beta skill that mirrors stable ce-code-review and optionally delegates mid-tier persona reviewers to codex exec. Goal: keep the existing review quality model on large fan-outs while reducing token pressure on the orchestrating session.

Stable ce-code-review is unchanged, as are upstream callers (ce-work, lfg, ce-polish-beta).

Review Lane Split

Always local (8): correctness, security, adversarial, previous-comments, agent-native, learnings, schema-drift, deployment-verification. These carry the highest correctness/security risk, need authenticated GitHub context, or return prose/checklists outside the delegated JSON contract.
Delegation-eligible (13): testing, maintainability, project-standards, api-contract, data-migrations, reliability, performance, dhh-rails, julik-frontend-races, kieran-python, kieran-rails, kieran-typescript, swift-ios. All 13 persona files are vendored into references/delegated-personas/ so the skill is self-contained after plugin conversion.

What's in the Diff

plugins/compound-engineering/skills/ce-code-review-beta/ — new skill: SKILL.md, BETA-STATUS.md, references for delegation workflow, persona catalog, schema, output template, walkthrough, bulk-preview, diff-scope, tracker-defer, validator template, subagent template, and 13 vendored persona files.
scripts/ — three executable helpers: resolve-base.sh (fork-safe base-branch resolution), integrity-check-config.sh (config trust check), trust-check-codex.sh (Codex binary trust check).
plugins/compound-engineering/README.md — skill-count bump only.
tests/review-skill-contract.test.ts — +562 lines covering lane split, delegated-persona byte-parity, prompt escaping, integrity gate ordering, mode behavior, binary trust, isolated Codex home, fixed launch env, exit sentinels, cancellation, ignored late results, max-parallel cap, and schema parity.

No release-owned versions, manifests, or changelogs touched.

Behavior

Mode gating. mode:headless fails fast with a machine-readable error envelope when consent or prechecks are missing. mode:autofix falls back to local. mode:report-only stays local. Only Interactive mode shows the consent prompt.

Config trust boundary. Pre-resolution emits only a sentinel; runtime config is read only after integrity-check-config.sh passes (symlink rejection on .compound-engineering/ and config.local.yaml, regular-file requirement, gitignore coverage, not-tracked-by-git, resolved-path-stays-inside-root). Exit code matches the prefix so set -e callers fail closed.

Codex binary trust. trust-check-codex.sh canonicalizes the path with a portable fallback chain and rejects any remaining symlink, shell metacharacters, repo/scratch paths, world-writable parents, and bins that fail a 10-second --version smoke probe under the same env -i shape used at launch (with proxy vars hard-disabled). Catches nvm/asdf shim and TTY-blocking failures up front.

Trust-script invocation. Both trust scripts are invoked via bash "${CLAUDE_SKILL_DIR}/scripts/...", never bare relative paths — the Bash tool's CWD is the reviewed repo, so a bare path could resolve to a script planted by the reviewed PR before the trust check ran.

Prompt-injection mitigation. Persona text, PR metadata, intent summaries, file lists, and diffs are XML-escaped before insertion. Raw placeholders ({diff}, {persona_content}, {pr_metadata}) are explicitly rejected. Delegated prompts are forbidden from reading home, ~/.codex, auth, history, sessions, rules, or plugin state.

Self-Review Integrity Gate. Stage 4 checks for diff modifications to delegated personas, the workflow file, schema, scripts, scope rules, or subagent templates before reading any of those files; if matched, delegation is disabled for the run. Vendored persona files are tested byte-for-byte against the canonical agent source.

Runtime isolation. Per-run CODEX_HOME under scratch (chmod 700). auth.json is the only file copied (chmod 600) and only after source ownership/mode are verified. Launches use env -i with fixed PATH/HOME/CODEX_HOME, --ignore-user-config, --ignore-rules, --cd "$REPO_ROOT", and -s read-only.

Process lifecycle. PIDs captured per reviewer; setsid forms a process group when available with a portable pkill -P fallback for systems without it (e.g., macOS). Cancellation sends SIGTERM then SIGKILL. Result files written via mv -f from a tmp path; exit-code sentinels written after sync. Stage 5 merge consumes only the in-memory status map — ignored reviewers' on-disk files do not enter merge.

Credential cleanup. Trap on INT/TERM (not EXIT) wipes copied auth.json on signal exit. EXIT is excluded because CODEX_HOME is shared across reviewers and the first reviewer's normal exit would otherwise wipe credentials out from under siblings still running. End-of-run cleanup handles the happy path.

Fan-out cap. review_delegate_max_parallel (default 4, hard maximum 16) bounds the wave-based scheduler. Out-of-range values fall back to the default — the cap is a safety control, not a tuning knob.

Verification

bun test: 1292 pass, 0 fail.
bun run release:validate: in sync; compound-engineering reports 49 agents, 39 skills, 0 MCP servers.

Suggested Reviewer Focus

Stage 3c → Stage 4 ordering (Self-Review Integrity Gate before persona/workflow reads).
The lane split, especially keeping previous-comments local.
Prompt escaping and raw-placeholder rejection.
Trust scripts, isolated CODEX_HOME, credential cleanup, fixed launch env.
Timeout, cancellation, ignored late results, and local redispatch on circuit-breaker trip.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

…e-path routing (EveryInc#747) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

…ewers Introduces a beta skill that mirrors ce-code-review but routes the mid-tier persona reviewers (testing, maintainability, project-standards, plus cross-cutting and stack-specific conditionals) to `codex exec` instead of the platform's subagent primitive, conserving session tokens on a workflow that fans out 6-10+ reviewers per run. High-stakes reviewers (correctness, security, adversarial) stay on the session model so capability is not lost where it matters most. Unstructured-output agents (agent-native, learnings-researcher, schema-drift-detector, deployment-verification-agent) also stay local -- they return prose / checklists, not the findings JSON shape that `--output-schema` enforces. Hardcoded `-s read-only` sandbox: persona reviewers don't write project files, run tests, or build, so workspace-write was YAGNI. Read-only empirically permits read-oriented git/gh evidence-gathering commands. Design hardening from a two-lane (Claude + Codex MCP) adversarial pre-ship review: - Stage 3c persona-file resolution (CLAUDE_PLUGIN_ROOT primary, repo-relative fallback) so delegated `<persona>` blocks can never ship empty - Preflight one delegated reviewer first; if it fails, disable delegation and run all reviewers locally -- avoids the parallel-launch failure cascade that would defeat a post-hoc circuit breaker - Explicit Stage 4 -> Stage 5 barrier on per-reviewer status map; merge cannot start while any reviewer is still pending - mode:report-only suppresses delegation silently (incompatible with the required scratch/artifact writes); mode:headless without recorded consent fails fast with a structured error envelope rather than silently degrading - validate-then-write-full-then-strip ordering for the JSON contract; reversing it would silently empty headless detail-enrichment Beta is `disable-model-invocation: true` -- manual invocation only. Stable ce-code-review and upstream callers (ce-work, lfg, ce-polish-beta) are unchanged.

…EveryInc#772) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…veryInc#770)

Harden Codex delegation against unsafe local config, mutable self-review prompt files, shell interpolation, prompt injection, and late background results. Keep GitHub-auth-dependent reviewers local and copy delegated persona sidecars into the beta skill so installed and converted runs are self-contained.

…veryInc#775) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Apply concrete fixes from PR #8 review: - F3: resolve `scripts/resolve-base.sh` via `${CLAUDE_SKILL_DIR}` with bare-path fallback so the call works when the Bash tool runs from the user's project CWD rather than the skill directory. Applied to both stable and beta. - F4: rewrite `{output_contract}` substitution rule to override the compact-only return paragraph in subagent-template.md and align with the `<constraints>` block, so delegated reviewers return FULL JSON (with why_it_matters and evidence) — preventing silently empty Why:/Evidence: lines in headless output. - F14: standardize the gate name as "Self-Review Prompt Integrity Gate" across SKILL.md and the workflow reference; reduce the workflow's section 0b to a one-line back-reference to SKILL.md Stage 4 (the authoritative spec). - F15: add `additionalProperties: false` at root and items in findings-schema for both stable and beta; declare `_meta` as a root property so the existing documentation block remains valid. - F16: replace ".context/ artifact path" with the OS-temp path declared in the same output contract, matching the run-artifact contract. Both copies updated. - F22: append "seconds" to the delegate_timeout_seconds default doc. - F9: scratch-cleanup scope guard. Add explicit assertion `[ -n "$SCRATCH_DIR" ] && [ "$CODEX_HOME" = "$SCRATCH_DIR/codex-home" ]` before any rm of codex-home or auth.json; document SCRATCH_DIR as immutable. - F23: add `Skill: ce-code-review-beta` (and `Skill: ce-code-review` for stable) discriminator line to the headless envelope so callers can gate skill-specific reason-string parsing. - F2 (sub-fix only): add `scripts/resolve-base.sh` to the protected-paths list guarded by the Self-Review Prompt Integrity Gate. - F7: write codex result and exit-code sentinel via tmp-file + atomic mv, with `sync` between, so polling readers never see a partial write or a sentinel that implies a result not yet durable. - F8: hard wall-clock guard inside the polling loop body; document that the polling Bash call inherits the harness's foreground timeout. - F5/F6 (highest-value tests): add compact-split-ordering and circuit-breaker contract tests to tests/review-skill-contract.test.ts. - F10: add stable/beta sidecar parity test to flag silent drift in shared reference files. Updated three existing review-skill-contract tests to match the renamed gate and atomic-rename launch template. Routed to failed bucket (need design decisions or new feature work): F1, F2 (broader deterministic-gate piece), F5/F6 (broader test gap), F11, F12, F13, F18, F19, F20, F21, F24. bun test: 1290 pass, 0 fail. bun run release:validate: in sync.

- chmod 700 on the per-run /tmp artifact directory in stable + beta so per-reviewer findings JSON is owner-only, not world-readable. mkdir -p inherits umask 022 (755), which leaves evidence and delegated-reviewer output exposed to other local users/processes. - Tighten Stage 5 merge-queue rule: populated only from the in-memory status map (status=succeeded), never by re-scanning the scratch directory. Closes the bypass where an `ignored` reviewer's result file could leak into merge despite cancellation. - Tag post-circuit-breaker local-fallback failures distinctly in Coverage so users can distinguish original delegation failures from local re-dispatch failures. - Strengthen Codex-binary smoke-check to use the same `env -i` shape as the actual delegated launch (no TERM, no SHELL, no terminfo). Catches TTY-blocking CLI builds with one short probe instead of waiting out the full review_delegate_timeout_seconds (default 900s). - Align prose terminology with directory name: "delegated personas" / "delegated persona files" instead of "persona sidecars" / "delegated sidecar files". Renames the parity-test describe block to match. Directory `references/delegated-personas/` is unchanged. Stable/beta sync decision: chmod 700 propagated to stable; the Stage 5 barrier, smoke-check, and Coverage tagging changes are beta-only because they apply to the delegation workflow that exists only in beta.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ist items (EveryInc#766)

…nd form (EveryInc#758) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…tches (EveryInc#759)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Address findings from the multi-persona PR review (security, swe, architect, qa, devops). Highest-impact fixes are process lifecycle and credential containment in the delegated lane: - Real PID capture + setsid + SIGTERM/SIGKILL cancellation. The prior Bash-tool handle could not kill orphan codex processes. - trap-based auth.json cleanup on EXIT/INT/TERM so credentials never linger in /tmp on Ctrl-C, OOM, or orchestrator crash. - Wave-based fan-out cap via review_delegate_max_parallel (default 4) to bound workstation memory/CPU under N-reviewer fan-out. - Captured stderr per delegated reviewer so failure messages are durable instead of "exit code 1". - chmod 600 on copied auth.json, chmod 700 on every parent of the per-skill scratch path; auth source ownership/mode verified. - NO_PROXY/HTTP_PROXY hard-disabled in the binary smoke probe so a future telemetry call would fail loudly instead of silently leaking. Security checks extracted to scripts so the contract is testable, not just prose: scripts/trust-check-codex.sh and scripts/integrity-check-config.sh. The prose contract is kept alongside the script invocation so future edits can cross-check; tests/review-skill-contract.test.ts now exercises the integrity check behaviorally with fixture filesystems. resolve-base.sh: switched awk to `-v` interpolation, captured fetch stderr into the error envelope so callers can distinguish auth/network/ no-such-branch failures, and added --pr-base-repo / --pr-base-branch flags so PR-mode and standalone mode share one tested code path. The former inline 20-line PR-mode bash in SKILL.md is gone. Findings schema (stable + beta, parity preserved): added $id and an optional top-level schema_version field plus _meta.schema_version "1.0.0" with a written version policy. evidence.minItems dropped from 1 to 0 so reviewers with no specific snippet don't fabricate one. persona-catalog (stable + beta, parity preserved): added a Lane column declaring each reviewer's local-vs-delegation-eligible status, plus a "Lane assignment policy" section so future reviewers don't drift into the wrong lane by default. SKILL.md: added BETA-STATUS pointer, confidentiality warning, documented model allowlist (gpt-5-codex, gpt-5, o4-mini, gpt-5-mini), single-source-of-truth Self-Review Prompt Integrity Gate with explicit trigger globs, max-parallel cap doc, Stage 5 step renumbering (6/6b/6c -> 6/7/8/9), and converted the heterogeneous-fixer-queue prose into a routing table. Trusted config path is re-derived at runtime; the pre-resolved sentinel path is documented as informational only. BETA-STATUS.md: new doc capturing graduation criteria, sunset criteria, telemetry plan, and removal procedure (including the STALE_SKILL_DIRS / EXTRA_LEGACY_ARTIFACTS_BY_PLUGIN registry entries that must be added when the skill is deleted). Tests: 8 new contract tests covering the gate's trigger globs, env scrub, codex-home failed-check name, max-parallel cap presence, BETA-STATUS required sections, script existence and executable bits, behavioral integrity-check (symlink/tracked/gitignore rejection), and schema versioning. Catalog parser regex updated for the Lane column. Skipped with explicit rationale: disable-model-invocation kept (beta convention); local-redispatch on unconfirmed cancellation kept fail-closed (documented as future work in BETA-STATUS); persona-sha256 hashing, dry-run mode, lockfiles, and combined CODEX_DELEGATION.md left out of scope. bun test: 1298 pass, 0 fail. bun run release:validate: clean.

…rkflow Reduce skill body multiplicative cost (per docs/solutions/best-practices/codex-delegation-best-practices-2026-04-01.md): 1. Extract delegation-conditional content from SKILL.md (1076 -> 960 lines) to references/codex-delegation-workflow.md: settings resolution, mode interaction, persona file mapping, model override, delegated dispatch, and JSON return contract. Leave compact stubs at each extraction point. Self-Review Prompt Integrity Gate and lane-split argument parsing stay inline since they fire on every invocation. 2. Restructure codex-delegation-workflow.md: dedup mode handling, lane split, JSON return contract, sandbox rationale, and mixed-model attribution. Convert circuit-breaker action and cancellation-failure handling to step-by-step lists. Dedup the gate-must-pass-before-persona- read rule from 5 restatements to 2. Front-load the Model Override Opus cost warning. Replace prose integrity-check duplication with calls to scripts/integrity-check-config.sh. Update tests/review-skill-contract.test.ts to follow content moves and restructured assertions. No behavior change. Stable ce-code-review/ and parity-enforced shared references untouched.

…pe (EveryInc#780) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eryInc#782)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

resolve-base.sh: guard $1 reference under set -u so the parser doesn't abort with "unbound variable" when all positional args were consumed. trust-check-codex.sh: add python3 and perl canonicalization fallbacks before the dirname-only last resort, and explicitly reject a symlinked final component so a symlinked launcher cannot pass the world-writable and repo/scratch checks against the symlink's parent.

…it codes The script previously printed ERROR:<reason> but exited 0 for every branch, so callers using `set -e` or simple `script || handle_error` would silently treat a failed integrity check as success. Align exit codes with the prefix: OK and ABSENT exit 0; every ERROR branch exits 1. Update the test runner to capture stdout regardless of exit status so the ERROR-prefix assertions still pass under the new failing-exit behavior.

Three independent post-review hardenings to the delegation workflow doc, each with matching contract-test coverage: - Resolve helper scripts via ${CLAUDE_SKILL_DIR} instead of bare relative paths. The Bash tool's CWD is the reviewed repo, so a bare invocation could either fail to find the script or execute a malicious script of the same name planted by the reviewed PR before any trust check runs. - Document setsid as not-shipped on macOS base userland and prescribe a PG_PREFIX probe: when present, prefix `setsid` and cancel via `kill -SIGNAL -PID` (process group); when absent, omit the prefix and cancel via `kill -SIGNAL PID` plus `pkill -P` sweep. Replace the EXIT trap on auth.json with an INT/TERM-only trap so a finishing reviewer doesn't yank credentials from siblings still running in the same wave. - Cap review_delegate_max_parallel at a hard maximum of 16 and fall back to the default for non-integer, non-positive, or oversized values. The cap is a safety control on local resources and Codex spend, not just a tuning knob.

The merge from main converted the ce-code-review-beta skill mention into a markdown link pointing at docs/skills/ce-code-review-beta.md, but no such doc exists. Keep the skill mention; drop the link.

PR EveryInc#779 added pipe-escaping guidance to the stable copy of references/review-output-template.md but the beta copy didn't pick it up through the main->branch merge. The shared-reference parity test was failing because of the drift; copying stable's content over beta's restores byte parity and unblocks CI.

…ote matching - Remove CWD fallback from RESOLVE_SCRIPT in all three scope modes (PR, branch, standalone). After gh pr checkout the CWD is the reviewed repository; a malicious PR could plant scripts/resolve-base.sh there and intercept execution. Fail closed with an explicit error when CLAUDE_SKILL_DIR is unset instead. - Add boundary check to awk remote-matching in resolve-base.sh so "org/repo" does not spuriously match "org/repo-extra". Uses a tail_ok() helper that asserts the URL ends with the repo path followed by .git or end-of-string.

…e.sh Codex review on PR EveryInc#784 flagged that resolve-base.sh hard-coded `https://github.com/...` PR URL parsing and `github.com:` / `github.com/` remote-URL substring matching. On GitHub Enterprise (or any non-github.com host) PR_BASE_REPO came back empty and remote matching was skipped, silently falling back to `origin` — which in fork workflows points at the user's fork, producing the wrong merge base. Replace the hard-coded patterns with two sourced shell helpers: - `parse_pr_url` extracts (host, owner/repo) from any HTTPS PR URL, anchored on `/pull/<N>` from the right so path-prefixed GHE deployments fail-closed to the fallback chain rather than silently producing "github/org" as owner/repo. Userinfo and port are stripped; host and owner/repo are lowercased so GitHub's case-insensitive identifiers compare correctly. - `parse_remote_url` extracts (host, owner/repo) from `https://`, `ssh://`, and scp-form (`git@host:org/repo`) URLs. Strips userinfo, port, and trailing `.git`. Replace the awk substring matcher with a `while` loop over `git remote get-url <name>` so url.*.insteadOf rewrites are honored. Comparison is exact equality on lowercased (host, owner/repo). New `--pr-url` flag lets callers pass the full PR URL from `gh pr view --json url`; SKILL.md is updated to use it instead of manually extracting the repo segment. `--pr-base-repo` is retained but now requires a paired `--pr-base-host`. Adds 15 tests in tests/resolve-base-beta-script.test.ts covering both helpers directly (sourced via RESOLVE_BASE_SOURCE_ONLY=1) and end-to-end GHE-host resolution against fork-origin setups. Stable/beta sync: not propagating to stable in this PR per user direction — stable `ce-code-review/scripts/resolve-base.sh` has the same hard-coding and will be addressed in a follow-up. Known limitations (documented in script header): scp-form URLs with bracketed IPv6 hosts are not parsed; GHE deployments mounted under a path prefix fail-closed to the fallback chain.

…c#803 description Upstream PR EveryInc#803 (merged 2026-05-08) trimmed the ce-swift-ios-reviewer description by ~25%. PR EveryInc#784 was last updated before that merge, so its delegated-personas/ snapshot held the pre-trim copy. The byte-equality test review-skill-contract > "maps every delegated reviewer id ..." started failing on the next CI run after EveryInc#803 landed (the merge ref now has the SHORT master from main while the PR branch still had the LONG delegated copy). Resync both the master agent file and the delegated copy on this branch to the post-EveryInc#803 SHORT description so PR head and merge ref stay internally consistent. No behavior change.

- keep PR URL and URL-form remote ports during beta resolve-base matching - allow scp-form SSH remotes to match ported PR hosts without the web UI port - add beta resolver regressions for ported GHE URL-form and scp-form remotes

…robe timeout Two P1 review findings from Codex on PR EveryInc#784, fixed together with widened scope to address the underlying class of each issue. resolve-base.sh — fail closed when PR base is unreachable Previously, when PR metadata (--pr-url or --pr-base-repo/host) identified a specific PR base repo, two failure paths silently fell back to origin: - A matched remote was found but its fetch failed (Codex's narrow ask). - No remote matched the parsed (host, owner/repo) at all. Both produce the same wrong outcome: merge-base computed against the fork's or an unrelated repo's history, silently mis-scoping the reviewed diff. The fix introduces a PR_METADATA_PROVIDED gate that emits ERROR for both sub-cases when PR metadata was supplied. The origin/local fallback remains unchanged for auto-detect / branch-mode invocations where no specific base was named. The previous test 384 ("boundary: org/repo-extra is NOT equal to org/repo") asserted that origin fallback succeeded after the matcher correctly rejected the fuzzy match. That collateral assertion encoded the silent-fallback bug; it is replaced with an ERROR assertion that still catches a fuzzy-match regression (which would yield BASE: instead of ERROR:). A new test exercises the matched-remote-fetch-fails case that Codex flagged directly. trust-check-codex.sh — portable timeout strategy The original `timeout 10 ...` inside env -i fails on default macOS, where `timeout` ships in neither /usr/bin nor any scrubbed-PATH dir unless coreutils was installed via Homebrew. Codex rejected otherwise- valid Codex binaries on those hosts. The fix resolves a timeout strategy in the parent shell before entering env -i and passes an absolute path through. Fallback chain: 1. `timeout` (GNU coreutils) 2. `gtimeout` (Homebrew coreutils) 3. `perl` with fork + setpgrp + alarm + kill-pgroup, matching GNU timeout's SIGTERM-then-SIGKILL process-group semantics so probes against bash-wrapped or multi-process codex builds terminate reliably. A naive `perl -e 'alarm; exec'` was insufficient — bash internally swallows SIGALRM (for `read -t`), and `kill <pid>` does not signal descendants. The implemented form forks, calls setpgrp(0, 0) in the child to create a new process group, then on SIGALRM kills the negative pid (whole group). Exit codes match GNU timeout (124 on timeout). CE_PROBE_TIMEOUT_SECS env var overrides the default 10s for tests. When none of the three are available, the script emits an actionable ERROR (was: silent fail). Tests - tests/resolve-base-beta-script.test.ts: rewritten boundary test + new matched-remote-fetch-fails test. - tests/trust-check-codex-script.test.ts (new): 5 behavioral tests covering each tier of the fallback chain plus the all-absent ERROR. Behavior change notes - Auto-detect invocations (no flags) that previously silently used origin when the local checkout had no remote matching the PR base now ERROR with guidance to add the upstream remote. This was the silent-fallback bug being fixed. - Stable ce-code-review/scripts/resolve-base.sh has the equivalent pattern but is intentionally out of scope for this PR per prior scoping decision in PR EveryInc#784 threads. Refs: PR EveryInc#784 review threads 3215939036, 3216102821, 3216102828.

…heck test sandbox under HOME Two follow-ups on the Codex review of d87ab1a. **`${CLAUDE_SKILL_DIR:-.}` portable pattern (P1).** The prior fix to strip the CWD-relative fallback for `resolve-base.sh` and the two trust scripts made the beta skill fail closed on converted Codex/Gemini targets, since `CLAUDE_SKILL_DIR` is Claude-Code-only. Adopt the `${CLAUDE_SKILL_DIR:-.}` form already used by `ce-worktree`: on Claude Code the variable is reliably set and resolves to the absolute skill directory; on Codex, Gemini, and similar harnesses whose Bash CWD is the skill directory, the `:-.` fallback yields the bare relative path those harnesses natively resolve. The original security property — preventing a malicious PR from planting `scripts/resolve-base.sh` in the reviewed-repo CWD — still holds on Claude Code, because the `:-.` branch only activates when the variable is unset, which is the signal that we are NOT on Claude Code (where CWD would be the reviewed repo after `gh pr checkout`). Applied to all five invocation sites: three `resolve-base.sh` blocks in SKILL.md (PR mode, branch mode, standalone mode) plus the `integrity-check-config.sh` and `trust-check-codex.sh` invocations in `codex-delegation-workflow.md`. **Test sandbox anchored under HOME (P2).** `tests/trust-check-codex-script.test.ts` was creating the fake codex binary under `os.tmpdir()`, which resolves to `/var/folders/.../T/` on macOS (passes the trust check's world-writable walk) but to `/tmp` on Linux (rejected by the literal `/tmp/*` case in `trust-check-codex.sh`). Three of five tests were green locally on macOS but red on Linux CI — the success-path tests never reached the timeout fallback behavior they were intended to exercise. Anchor the sandbox under `$HOME/.cache/ce-trust-check-tests/` (mode 0700) so the parent walk never hits `/tmp` and the trust check passes through to the smoke probe on every platform. Manually verified that the script still rejects `/tmp/...` paths when invoked directly, so the security property is preserved. **Contract test update.** `tests/review-skill-contract.test.ts` was pinning the strict `${CLAUDE_SKILL_DIR}/scripts/integrity-check-config.sh` form. Update it to assert the new portable `${CLAUDE_SKILL_DIR:-.}/scripts/...` shape with a comment explaining the threat model the new form preserves. The bare-relative invocation regex still rejects unprefixed `bash scripts/...` invocations in active lines.

…data In resolve-base.sh Step 1, when `gh pr view` returns a non-empty baseRefName but either no URL or a URL that parse_pr_url rejects (e.g., GHE deployments mounted under a path prefix), the resolver previously left PR_BASE_HOST/REPO unset and silently fell through to the legacy origin/local fallback -- computing merge-base against fork history in fork workflows and silently miscategorizing the reviewed diff. Same bug class d87ab1a closed for matched-remote-fetch-fails and no-matching-remote (Codex P1 r3221447092). Third trigger is gh-returned-but-unestablishable PR metadata: gh said we're on a PR but we couldn't extract the base repo from its URL field. Fail closed with ERROR and explicit guidance to pass --pr-url or --pr-base-repo/host/branch. Restructure: defer REVIEW_BASE_BRANCH assignment until URL parse succeeds. Empty META_URL after a non-empty baseRefName -> ERROR. parse_pr_url rejection -> ERROR. When gh returns nothing (PR_META empty or baseRefName empty), preserve legacy auto-detect fallthrough to Steps 2-4. Updated the header "Limitations" note for GHE path-prefix deployments to reflect the new behavior (fail-closed ERROR in auto-detect, not silent fallback chain). Tests (tests/resolve-base-beta-script.test.ts): - auto-detect unparseable PR URL -> ERROR, no origin fallback - auto-detect base branch + empty URL -> ERROR, no origin fallback Both pre-seed origin/main with a fork SHA so a regression to silent fallback would emit BASE:<forkSha> and the assertion would catch it.

… parsers Closes the remaining silent-wrong-base sub-cases that the prior fail-closed fixes (d87ab1a, effd7d7) didn't cover. After this change, every code path that could downgrade PR_METADATA_PROVIDED=1 to a silent origin/local fallback emits ERROR with explicit guidance instead. resolve-base.sh: - parse_remote_url right-anchors owner/repo (last two path segments, was first two), then rejects paths with 3+ segments. Symmetric with parse_pr_url's posture on path-prefixed PR URLs. Closes silent miscategorization for path-prefixed GHE remotes (acme.com/github/org/repo) and nested-namespace platforms (GitLab subgroups, Gitea orgs, Azure DevOps). - Bracketed-IPv6 scp-form (git@[::1]:owner/repo) explicitly rejected instead of falling through to substring miscategorization. - --pr-base-host without --pr-base-repo now errors, symmetric with the existing --pr-base-repo-without-host check. Previously left PR_METADATA_PROVIDED=0 and fell through to origin fallback. - gh pr view jq parse failures (malformed JSON) now trap via 'if !' and ERROR. The prior '|| true' silently swallowed them. - gh pr view returning a PR URL with empty baseRefName now errors instead of falling through silently. - Header limitations docblock updated to describe the new posture honestly: path-prefixed GHE remotes fail closed (not "supported via explicit flags"), and 3+ segment remote URLs are rejected up front. tests/resolve-base-beta-script.test.ts: - Inverted the path-prefixed parser and end-to-end tests added in the prior pass: they now assert rejection, not right-anchored success. - New parser test: nested-namespace scp-form (git@gitlab.com:group/sub/repo) is rejected. - New end-to-end test: auto-detect mode with no PR metadata (gh stubbed to exit 1) preserves the legacy origin-fallback path via origin/HEAD. Locks in the PR_METADATA_PROVIDED=0 contract that previously had no regression coverage. Verification: bun test on tests/resolve-base-beta-script.test.ts passes 31/31. Full suite: 1420 pass, 14 unrelated pre-existing env failures (trust-check-codex EPERM on ~/.cache, ce-release-notes-helper EADDRINUSE on Bun.serve port:0).

…e source Two independent fixes for findings filed after the silent-fallback audit landed. resolve-base.sh: - Extract the "strip trailing :port" logic into derive_host_without_port helper and call it from both the flag-input path and the auto-detect path. Previously PR_BASE_HOST_WITHOUT_PORT was derived only once from flag inputs; when PR_BASE_HOST was later reassigned from a parsed gh pr view URL, the without-port version was not re-derived. On GHE setups with port-carrying PR URLs (e.g. https://ghe.example.com:8443/org/repo/pull/N) and scp-form remotes (which cannot carry a port), the matcher then emitted a false-positive "does not match any configured git remote" ERROR for a correctly configured remote. integrity-check-config.sh: - Switch from `git check-ignore` to `git check-ignore -v`, parse the source field, and accept only sources inside the repo working tree or .git/info/exclude. Reject matches that come from the user's global core.excludesfile -- those would let a collaborator or CI environment without that global config commit config.local.yaml. The script's "OK" result now actually means "safely ignored at the repository level." tests: - New test in resolve-base-beta-script.test.ts (32 total in file) covers the auto-detect-with-port + scp-form-remote scenario end-to-end. Asserts BASE:<sha> instead of the prior false-ERROR. - New focused test file tests/integrity-check-config-script.test.ts (4 tests) covers all four ignore-source cases: repo .gitignore OK, .git/info/exclude OK, global core.excludesfile rejected, not-ignored rejected. Created as a separate file rather than extending the broader contract test to keep the scope visible. Verification: bun test tests/resolve-base-beta-script.test.ts 32 pass, tests/integrity-check-config-script.test.ts 4 pass. Full suite: 1425 pass, 14 unrelated pre-existing env failures (trust-check-codex EPERM, ce-release-notes-helper EADDRINUSE).

Replace resolve-base.sh's standalone jq parsing of gh pr view metadata with gh's built-in --jq output and pure bash field splitting, preserving the PR-mode fail-closed guards for missing or unparseable metadata. This addresses the P1 Codex review finding on PR EveryInc#784 and matches the portability class fixed by d87ab1a, where beta scripts assumed external tools existed on PATH without guards or fallbacks. Documented the root cause and prevention rule in docs/solutions/integrations/beta-script-external-tool-portability.md.

github-actions Bot and others added 10 commits May 3, 2026 19:58

chore: release main (EveryInc#753)

06a7cee

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

feat(riffrec-feedback-analysis): add Riffrec feedback skill with thre…

dde9256

…e-path routing (EveryInc#747) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore: release main (EveryInc#767)

6217031

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

fix(ce-doc-review): tighten finding resolution routing (EveryInc#769)

5427863

fix(code-review-beta): harden Codex delegation contracts

0f54f60

fix(ce-worktree): resolve script path against skill dir, not user CWD (…

4cc1ee6

…EveryInc#772) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(ce-work-beta): update Codex sandbox flags to current CLI syntax (E…

7ff3472

…veryInc#770)

docs(skills): add user-facing skill docs at docs/skills/ (EveryInc#773)

a1698b7

davidalee changed the title ~~feat(ce-code-review-beta): add Codex CLI delegation for mid-tier reviewers~~ fix(code-review-beta): secure Codex review delegation May 5, 2026

tmchow and others added 19 commits May 5, 2026 01:02

fix(ce-doc-review): block diagram deletion as a fix recommendation (E…

1f3c646

…veryInc#775) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(review): escape literal pipes in finding table cells (EveryInc#779)

c7fc674

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(ce-plan): render Implementation Units as headings, not bulleted l…

be2efd7

…ist items (EveryInc#766)

fix(ce-work-beta): replace semicolon pre-resolution with single-comma…

5139ff1

…nd form (EveryInc#758) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(ce-work-beta): adaptive effort selection for Codex delegation ba…

3e03365

…tches (EveryInc#759)

feat: convert hooks to .codex/hooks.json for Codex target (EveryInc#742)

60b66dd

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: release main (EveryInc#771)

18076e0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

fix(doc-review): cut review noise on plans, scope personas to doc sha…

8349e75

…pe (EveryInc#780) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(code-review-beta): keep stable review references unchanged

9b124c0

docs(skills): sync ce-doc-review and ce-plan with PR EveryInc#780 (Ev…

65c278c

…eryInc#782)

chore: release main (EveryInc#781)

52e8f86

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Merge branch 'main' into feat/ce-code-review-beta-codex-delegation

dc8c60b

davidalee added 12 commits May 5, 2026 22:24

fix(code-review-beta): drop broken docs link in README beta row

dc5b8d4

The merge from main converted the ce-code-review-beta skill mention into a markdown link pointing at docs/skills/ce-code-review-beta.md, but no such doc exists. Keep the skill mention; drop the link.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(code-review-beta): secure Codex review delegation#8

fix(code-review-beta): secure Codex review delegation#8
davidalee wants to merge 41 commits into
mainfrom
feat/ce-code-review-beta-codex-delegation

davidalee commented May 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

davidalee commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Review Lane Split

What's in the Diff

Behavior

Verification

Suggested Reviewer Focus

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

davidalee commented May 4, 2026 •

edited

Loading