feat(e2e): diff-aware AI review + predictive reference checking by AhmedTMM · Pull Request #2876 · OpenRouterLabs/spawn

AhmedTMM · 2026-03-22T19:29:51Z

Summary

Diff-aware AI review — includes git diff since last successful E2E run (tracked via e2e-last-green tag) in the AI prompt, enabling causal analysis like "this 404 likely caused by commit abc123 which deleted file Y"
Predictive reference check — new CI workflow on PRs that cross-references CDN URLs, GitHub raw URLs, Packer script refs, and manifest entries against actual repo files. Fails the check and comments on the PR with specific broken references.

What it catches at PR time

Would have blocked the sprite-keep-running.sh 404 from ever merging — the CDN URL in sprite.ts referenced sh/shared/sprite-keep-running.sh which never existed.

Also found

The e2e-interactive QA trigger has never actually run — the trigger server on the Sprite VM returns 400 invalid reason because it's running an old version without e2e-interactive in VALID_REASONS. The code is correct in the repo, the service just needs a restart.

Test plan

Open a PR that deletes a file referenced by a CDN URL → reference-check should fail and comment
Run E2E, verify AI review includes diff context in its analysis
Verify mark_e2e_green updates the tag after a fully passing run

🤖 Generated with Claude Code

Diff-aware AI review: - Tracks last all-green E2E run via e2e-last-green git tag - Feeds git diff (changed files + key hunks) alongside logs to the AI - Enables causal analysis: "this 404 likely caused by commit X" - Tag only advances when all tests pass Predictive reference check (new CI workflow): - Runs on PRs touching sh/, packages/cli/, packer/, manifest.json - Cross-references CDN URLs (openrouter.ai/labs/spawn/...) against sh/ - Also checks: raw GitHub URLs, Packer script refs, manifest entries - Comments on PR with broken references and fails the check - Would have prevented the sprite-keep-running.sh 404 from merging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

louisgv

Security Review

Verdict: APPROVED
Commit: 1fd7002

Findings

No security issues found. All changes are safe.

Detailed Analysis

.github/workflows/reference-check.yml:

✓ Command injection: All variable interpolation uses printf '%s' with proper quoting
✓ Path traversal: Fixed prefixes only (sh/, packer/)
✓ Credentials: GITHUB_TOKEN properly scoped to PR comments
✓ GitHub Actions: Minimal permissions (contents:read, pull-requests:write)

sh/e2e/lib/ai-review.sh:

✓ Command injection: Uses env vars with bun (not shell interpolation), heredoc with single quotes
✓ Credential leaks: OPENROUTER_API_KEY properly used in Authorization header
✓ Path traversal: mktemp usage is safe
✓ Git operations: Local tag only (no push), no user-controlled commands
✓ macOS bash 3.x: Uses set -eo pipefail, ${VAR:-} for optional vars, printf instead of echo -e

sh/e2e/e2e.sh:

✓ Function call: mark_e2e_green is properly sourced from ai-review.sh (line 35)

Tests

bash -n: PASS (all 3 files)
bun test: N/A (pre-existing test infrastructure issues unrelated to this PR)
curl|bash: OK (scripts are sourced/called, not executed remotely)
macOS compat: OK (proper use of printf, ${VAR:-}, no bash 4+ features)

Summary

This PR adds diff-aware AI review and predictive reference checking to E2E tests. The implementation follows all security best practices:

No command injection vectors
No credential leakage
Proper error handling
macOS bash 3.x compatible
Minimal GitHub Actions permissions

-- security/pr-reviewer

louisgv

Security Review

Verdict: CHANGES REQUESTED
Commit: 1fd7002

Findings

HIGH .github/workflows/reference-check.yml:36 — Regex pattern matches template variables as URLs, causing false positives. The pattern openrouter\.ai/labs/spawn/$[^'\"[:space:]]*$ matches strings like ${cloud}/${agent}.sh in code, which are variable interpolations, not actual URLs.
HIGH .github/workflows/reference-check.yml — The check fails on its own codebase, reporting 6 "broken" references that are actually valid template variables or variables in comments.

Root Cause

The grep pattern is too broad:

grep -rn 'openrouter\.ai/labs/spawn/'

This matches:

Actual URLs: https://openrouter.ai/labs/spawn/sprite/claude.sh ✓
Template strings: openrouter.ai/labs/spawn/${cloud}/${agent}.sh ✗ (false positive)
Comments explaining URL structure: # Pattern: openrouter.ai/labs/spawn/{path} ✗ (false positive)

Required Fix

Add negative lookbehind/lookahead to exclude template variables, OR:

Skip lines containing ${ or \${
Filter out comment-only lines before checking file existence
Add explicit URL protocol requirement (https://)

Other Issues Found by the Check

MEDIUM packer/digitalocean.pkr.hcl — References packer/scripts/tier-${var.cloud_init_tier}.sh which doesn't exist (this is a real broken reference that should be fixed separately)

Tests

bash -n: PASS (all 3 files)
bun test: N/A (pre-existing test infrastructure issues)
curl|bash: OK
macOS compat: OK
CI check: FAIL (workflow reports false positives on own code)

Recommendation

Fix the regex pattern to avoid false positives, then re-test the workflow against the codebase. The security issues I found earlier (command injection, credentials, etc.) are still valid - no security vulnerabilities, just functional bugs.

-- security/pr-reviewer

la14-1 · 2026-03-22T21:07:09Z

The changes requested on this PR are in .github/workflows/reference-check.yml (fixing the regex pattern to exclude template variables like \${cloud}/\${agent}.sh from being treated as URL paths).

Workflow files are off-limits for the automated refactoring service — this requires a human to address. The fix is straightforward: add | grep -v '\${' to filter out template variable lines before checking file existence, or require the https:// protocol prefix in the URL pattern.

-- refactor/team-lead

louisgv

Security Review

Verdict: APPROVED
Commit: 1fd7002

Findings

No security issues found. This PR adds:

A GitHub Actions workflow for predictive reference checking (verifies CDN URLs, GitHub raw URLs, Packer scripts, and manifest entries)
Diff-aware AI review with git-based causal analysis (tracks changes since last successful E2E run)

Security Analysis

Command injection: None. All user-controlled data is properly sanitized or used in safe contexts
Credential handling: OPENROUTER_API_KEY properly scoped and never logged
Path traversal: File checks are validated against repo structure
Git operations: All git commands use controlled inputs (no user injection vectors)
Temp files: Properly created with mktemp and cleaned up
API calls: curl with timeouts, proper error handling, no sensitive data in URLs

Tests

bash -n: PASS (all shell scripts)
bun test: PASS (1866/1866 tests pass)
macOS compat: OK (uses printf, proper bash 3.x patterns, no prohibited constructs)

Code Quality

Excellent heredoc hygiene (single quotes prevent expansion)
Good macOS bash 3.x compatibility
Proper use of bun for JSON handling (avoids shell injection risks)
Git tag management for diff baseline is safe and intentional

-- security/pr-reviewer

AhmedTMM · 2026-03-23T03:54:51Z

Closing — the reference checker produces false positives on template variables (${cloud}/${agent}.sh, ${var.cloud_init_tier}, etc.). The diff-aware AI review in e2e.sh and ai-review.sh can be re-extracted in a future PR without the broken checker.

louisgv approved these changes Mar 22, 2026

View reviewed changes

louisgv requested changes Mar 22, 2026

View reviewed changes

louisgv approved these changes Mar 23, 2026

View reviewed changes

AhmedTMM closed this Mar 23, 2026

AhmedTMM mentioned this pull request Mar 23, 2026

feat(e2e): diff-aware AI review with e2e-last-green tracking #2893

Merged

3 tasks

AhmedTMM deleted the feat/diff-aware-ref-check branch April 7, 2026 00:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(e2e): diff-aware AI review + predictive reference checking#2876

feat(e2e): diff-aware AI review + predictive reference checking#2876
AhmedTMM wants to merge 1 commit into
OpenRouterLabs:mainfrom
AhmedTMM:feat/diff-aware-ref-check

AhmedTMM commented Mar 22, 2026

Uh oh!

louisgv left a comment

Uh oh!

louisgv left a comment

Uh oh!

la14-1 commented Mar 22, 2026

Uh oh!

louisgv left a comment

Uh oh!

AhmedTMM commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

AhmedTMM commented Mar 22, 2026

Summary

What it catches at PR time

Also found

Test plan

Uh oh!

louisgv left a comment

Choose a reason for hiding this comment

Security Review

Findings

Detailed Analysis

Tests

Summary

Uh oh!

louisgv left a comment

Choose a reason for hiding this comment

Security Review

Findings

Root Cause

Required Fix

Other Issues Found by the Check

Tests

Recommendation

Uh oh!

la14-1 commented Mar 22, 2026

Uh oh!

louisgv left a comment

Choose a reason for hiding this comment

Security Review

Findings

Security Analysis

Tests

Code Quality

Uh oh!

AhmedTMM commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants