fix(retro): language-agnostic test detection + anti-fabrication guard #2037
SholtoMc wants to merge 1 commit into
The retro generator's test-counting commands only matched JS/TS files
(grep '\.(test|spec)\.'), returning 0 for Python (test_*.py), Terraform
(*.tftest.hcl), and Bats (*.bats) suites. Combined with a repo-wide total
count, the contradiction ('N total tests, 0 added') led the model to
fabricate a non-existent 'bootstrap commit' to reconcile the numbers.
- Commands 10/12/13: language-agnostic TEST pattern; split tests ADDED
(--diff-filter=A) from TOUCHED; command 10 uses git ls-files.
- New Step 1.5 plausibility guard: per-commit figures must come from
'git show --shortstat <hash>'; no invented bootstrap/foundation
narrative; reconcile count contradictions by re-checking, not narrating.
- Applied to both SKILL.md.tmpl (source) and SKILL.md (runtime).
- regression-retro-test-detection.test.ts locks in both invariants.
Note: SKILL.md line 1 'name: gstack-retro' is pre-existing install-time
prefix drift (skill_prefix:true), not part of this fix.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Overlap heads-up: this shares its command-10 change with the older, issue-linked #2013 (Closes #1999). Both rewrite the same glob at The deltas here are real and worth keeping: this PR additionally fixes commands 12/13 ( Suggestion: since this is a superset, consider adding |
Problem
While verifying a generated weekly retro against git history, two headline metrics turned out to be fabricated:
Root cause
The `/retro` Step-1 data collection counts tests with JS/TS-only patterns: `\.(test|spec)\.` only matches `foo.test.ts` / `foo.spec.js`. It returns 0 for:
- `test_*.py` / `*_test.py`
- `*.tftest.hcl`
- `*.bats`
- `tests/…`
The total-test `find` (command 10) likewise missed the `test_*.py` prefix convention. So the generator saw "N total tests exist" but "0 added this period," and reconciled that contradiction by inventing a non-existent "bootstrap commit" with made-up file/LOC figures. There was no guard forcing per-commit numbers to come from the commit itself.
Fix
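The detection gap described under Root cause can be reproduced in a few lines. The sketch below is illustrative only: `TEST_PATTERNS`, `isTestFile`, and the sample paths are hypothetical, and the regexes are an assumption about what a language-agnostic matcher could look like, not the pattern this PR ships.

```typescript
// The JS/TS-only pattern quoted in the root cause.
const JS_ONLY = /\.(test|spec)\./;

// Assumed language-agnostic patterns (hypothetical, not the PR's exact regex).
const TEST_PATTERNS: RegExp[] = [
  /\.(test|spec)\.[^/]+$/,   // foo.test.ts, foo.spec.js
  /(^|\/)test_[^/]+\.py$/,   // test_foo.py
  /(^|\/)[^/]+_test\.py$/,   // foo_test.py
  /\.tftest\.hcl$/,          // main.tftest.hcl
  /\.bats$/,                 // install.bats
];

// True when the path matches any of the assumed test-file conventions.
function isTestFile(path: string): boolean {
  return TEST_PATTERNS.some((re) => re.test(path));
}

// Made-up file list for illustration.
const files: string[] = [
  "src/foo.test.ts",
  "pkg/test_parser.py",
  "pkg/parser_test.py",
  "infra/main.tftest.hcl",
  "scripts/install.bats",
  "src/index.ts",
];

const jsOnlyCount = files.filter((f) => JS_ONLY.test(f)).length;
const agnosticCount = files.filter(isTestFile).length;
console.log(jsOnlyCount, agnosticCount); // prints "1 5"
```

On this sample the JS/TS-only pattern sees one test file while the broader matcher sees five, which is the kind of undercount that produced "0 added" in a non-JS repo.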
- Commands 10/12/13: language-agnostic TEST pattern; split tests ADDED (`--diff-filter=A`) from TOUCHED. Command 10 uses `git ls-files` (respects `.gitignore`).
- New Step 1.5 plausibility guard: per-commit figures must come from `git show --shortstat <hash>`; never invent a "bootstrap/foundation/initial-import" narrative; reconcile count contradictions by re-checking command 12, not by narrating an unverified event.
- Applied to both `retro/SKILL.md.tmpl` (source) and `retro/SKILL.md` (runtime).
- `test/regression-retro-test-detection.test.ts` locks in both invariants (21 tests).
Verification
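The guard's rule that per-commit figures come from `git show --shortstat <hash>` can also be checked mechanically. The following is a minimal sketch, assuming the usual summary line git prints (for example ` 3 files changed, 120 insertions(+), 4 deletions(-)`); `parseShortstat`, `ShortStat`, and the sample text are hypothetical names and data, not part of this PR.

```typescript
interface ShortStat {
  files: number;
  insertions: number;
  deletions: number;
}

// Extract the first capture group as a number, or 0 when the clause is absent
// (git omits "insertions" or "deletions" when the count is zero).
function grab(line: string, re: RegExp): number {
  const m = line.match(re);
  return m ? Number(m[1]) : 0;
}

// Parse the summary line from `git show --shortstat <hash>` output.
// Returns null when no summary line is present.
function parseShortstat(output: string): ShortStat | null {
  const line = output.split("\n").find((l) => /\d+ files? changed/.test(l));
  if (line === undefined) return null;
  return {
    files: grab(line, /(\d+) files? changed/),
    insertions: grab(line, /(\d+) insertions?\(\+\)/),
    deletions: grab(line, /(\d+) deletions?\(-\)/),
  };
}

// Made-up sample output, for illustration only.
const sample = [
  "commit 0000000000000000000000000000000000000000",
  "Author: Example <example@example.com>",
  "",
  "    example commit",
  "",
  " 3 files changed, 120 insertions(+), 4 deletions(-)",
].join("\n");

console.log(parseShortstat(sample));
```

A retro figure that cannot be matched to a parsed result like this would, under the guard, be re-checked rather than explained with a narrative.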
On the affected repo's window the patched commands return 28 tests added / 47 touched (was 0).
`bun test test/regression-retro-test-detection.test.ts` → 21/21 pass; existing `regression-1624-retro-stale-base.test.ts` → 13/13 pass.
🤖 Generated with Claude Code