Skip to content

fix(retro): language-agnostic test detection + anti-fabrication guard#2037

Open
SholtoMc wants to merge 1 commit into
garrytan:mainfrom
SholtoMc:fix/retro-test-detection
Open

fix(retro): language-agnostic test detection + anti-fabrication guard#2037
SholtoMc wants to merge 1 commit into
garrytan:mainfrom
SholtoMc:fix/retro-test-detection

Conversation

@SholtoMc

Copy link
Copy Markdown

Problem

While verifying a generated weekly retro against git history, two headline metrics turned out to be fabricated:

  • "Bootstrap commit #2203 — 5,165 files · 1.4M lines · 1,455 test files." The cited commit was actually a 2-file, 187-line docs PR. No bootstrap had occurred (the repo was 8+ months / 1,200+ commits old).
  • "0 new tests added." In reality ~28 test files were added that week.

Root cause

The /retro Step-1 data collection counts tests with JS/TS-only patterns:

# "Test files changed in window"
git log ... --name-only | grep -E '\.(test|spec)\.' | sort -u | wc -l

\.(test|spec)\. only matches foo.test.ts / foo.spec.js. It returns 0 for:

  • Python test_*.py / *_test.py
  • Terraform *.tftest.hcl
  • Bats *.bats
  • path-based tests/…

The total-test find (command 10) likewise missed the test_*.py prefix convention. So the generator saw "N total tests exist" but "0 added this period," and reconciled that contradiction by inventing a non-existent "bootstrap commit" with made-up file/LOC figures — there was no guard forcing per-commit numbers to come from the commit itself.

Fix

  • Commands 10 / 12 / 13 now use a language-agnostic test pattern (Python / JS-TS / Ruby / Terraform / Bats / path-based), exclude vendored dirs, and split tests ADDED (--diff-filter=A) from TOUCHED. Command 10 uses git ls-files (respects .gitignore).
  • New Step 1.5 "plausibility guard": per-commit figures must come from git show --shortstat <hash>; never invent a "bootstrap/foundation/initial-import" narrative; reconcile count contradictions by re-checking command 12, not by narrating an unverified event.
  • Applied to both retro/SKILL.md.tmpl (source) and retro/SKILL.md (runtime).
  • test/regression-retro-test-detection.test.ts locks in both invariants (21 tests).

Verification

On the affected repo's window the patched commands return 28 tests added / 47 touched (was 0). bun test test/regression-retro-test-detection.test.ts → 21/21 pass; existing regression-1624-retro-stale-base.test.ts → 13/13 pass.

🤖 Generated with Claude Code

The retro generator's test-counting commands only matched JS/TS files
(grep '\.(test|spec)\.'), returning 0 for Python (test_*.py), Terraform
(*.tftest.hcl), and Bats (*.bats) suites. Combined with a repo-wide total
count, the contradiction ('N total tests, 0 added') led the model to
fabricate a non-existent 'bootstrap commit' to reconcile the numbers.

- Commands 10/12/13: language-agnostic TEST pattern; split tests ADDED
  (--diff-filter=A) from TOUCHED; command 10 uses git ls-files.
- New Step 1.5 plausibility guard: per-commit figures must come from
  'git show --shortstat <hash>'; no invented bootstrap/foundation
  narrative; reconcile count contradictions by re-checking, not narrating.
- Applied to both SKILL.md.tmpl (source) and SKILL.md (runtime).
- regression-retro-test-detection.test.ts locks in both invariants.

Note: SKILL.md line 1 'name: gstack-retro' is pre-existing install-time
prefix drift (skill_prefix:true), not part of this fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@trunk-io

trunk-io Bot commented Jun 18, 2026

Copy link
Copy Markdown

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@jbetala7

Copy link
Copy Markdown
Contributor

Overlap heads-up: this shares its command-10 change with the older, issue-linked #2013 (Closes #1999). Both rewrite the same glob at retro/SKILL.md:1000 and the mirrored line in retro/SKILL.md.tmpl — JS/TS-only find … | grep -v node_modules → polyglot pattern + vendored-dir exclusion — so whichever merges second will conflict on those exact lines.

The deltas here are real and worth keeping: this PR additionally fixes commands 12/13 (retro/SKILL.md:1009, still grep -E '\.(test|spec)\.' on main, which #2013 doesn't touch) and adds the Step 1.5 plausibility guard + regression test — the headline anti-fabrication value. #2013 is narrower but carries the issue link.

Suggestion: since this is a superset, consider adding Closes #1999 and crediting #2013, so the maintainer can land this one and close the narrow PR without dropping the issue. If #2013 lands first instead, this just needs a rebase to drop the duplicate command-10 / vendored-exclusion hunk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants