A caching + verification layer for spec-driven loops #3148

davet47 · 2026-06-24T09:48:39Z

Spec-driven tooling made specs the source of truth and code regenerable, but most of it runs on plain files, which leaves three costs on the table:

Context acquisition is expensive: the agent re-reads whole spec and source files to regenerate one unit.
Verification is uncached: the full relevant test surface is re-run even when nothing in a contract's dependency closure has changed.
Blast radius is by convention, not mechanism: nothing tells the agent precisely which dependents a spec change invalidates.

Heddle treats each software unit as a content-addressed contract with explicit dependencies and addresses those three costs directly: a hash-keyed verification cache, a get_dependents blast-radius query, and a small context packet per unit. It's complementary to a spec workflow rather than a replacement, so you can point any agent at it over MCP. An early benchmark shows roughly 5.5x token reduction on regeneration tasks.
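To make the hash-keyed verification cache concrete, here is a minimal sketch in Python. It is not Heddle's implementation: the `Contract` class, `closure_hash`, `verify_cached`, and the `run_tests` callback are hypothetical names chosen for illustration. The idea shown is only that the cache key covers a unit's whole dependency closure, so tests re-run when anything in that closure changes and are skipped otherwise.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    name: str
    body: str                     # canonicalized contract text (format is assumed)
    deps: tuple[str, ...] = ()    # names of the contracts this unit depends on


def closure(name, contracts):
    """Return the unit plus everything it transitively depends on."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(contracts[current].deps)
    return seen


def closure_hash(name, contracts):
    """Hash every contract body in the dependency closure, in a stable order."""
    digest = hashlib.sha256()
    for member in sorted(closure(name, contracts)):
        digest.update(member.encode("utf-8") + b"\0")
        digest.update(contracts[member].body.encode("utf-8") + b"\0")
    return digest.hexdigest()


def verify_cached(name, contracts, cache, run_tests):
    """Run the tests only when the closure hash has no cached result."""
    key = closure_hash(name, contracts)
    if key not in cache:
        cache[key] = run_tests(name)
    return cache[key]


# Usage: "eval" depends on "parse"; record each real test run.
contracts = {
    "parse": Contract("parse", "parse(text: str) -> Ast"),
    "eval": Contract("eval", "eval(ast: Ast) -> int", deps=("parse",)),
}
calls, cache = [], {}


def run_tests(name):
    calls.append(name)
    return True


verify_cached("eval", contracts, cache, run_tests)   # miss: tests run
verify_cached("eval", contracts, cache, run_tests)   # hit: tests skipped
contracts["parse"] = Contract("parse", "parse(text: str) -> Ast | None")
verify_cached("eval", contracts, cache, run_tests)   # dependency changed: miss
```

Keying on the closure rather than on the unit alone is what lets a change in `parse` invalidate the cached result for `eval` without any separate bookkeeping.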

Would love feedback on the contract format and the hashing semantics, in particular on what should count as a meaning change (re-verify) versus a cosmetic one (cache holds). Today the contract hash covers the signature and the invariant and example order; whitespace, key order, comments, docstrings, and file relocation are excluded. Invariants are hashed as free text, so merely rewording one still re-verifies dependents, which is the behavior we want to sharpen.
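The exclusion rules above can be sketched as a canonicalization step that runs before hashing. This is an illustrative approximation, not Heddle's actual contract format: the field names (`signature`, `invariants`, `examples`, `docstring`, `comments`, `path`) are assumptions, and collapsing whitespace inside example strings is a simplification that may be too aggressive when whitespace in an example is itself meaningful.

```python
import hashlib
import json

# Fields treated as cosmetic and dropped before hashing (names are illustrative).
COSMETIC_FIELDS = {"docstring", "comments", "path"}


def normalize(value):
    """Collapse whitespace runs in strings and drop cosmetic fields."""
    if isinstance(value, str):
        return " ".join(value.split())
    if isinstance(value, list):
        return [normalize(item) for item in value]   # list order is preserved
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items() if k not in COSMETIC_FIELDS}
    return value


def contract_hash(contract):
    """SHA-256 over a canonical JSON form: sorted keys, compact separators."""
    canonical = json.dumps(normalize(contract), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {
    "signature": "parse(text: str) -> Ast",
    "invariants": ["result is never None"],
    "examples": [{"in": "1+2", "out": "Add(1, 2)"}],
    "docstring": "Parses an expression.",
    "path": "src/parse.py",
}
# Same meaning: keys reordered, extra whitespace, file moved, docstring removed.
moved = {
    "path": "lib/parser.py",
    "examples": [{"out": "Add(1, 2)", "in": "1+2"}],
    "signature": "parse(text:  str)  ->  Ast",
    "invariants": ["result is never None"],
}
# Reworded invariant: still free text, so the hash changes.
reworded = dict(base, invariants=["result is not None"])

same_after_move = contract_hash(base) == contract_hash(moved)
same_after_reword = contract_hash(base) == contract_hash(reworded)
```

Here `same_after_move` is true and `same_after_reword` is false, which reproduces the behavior described: relocation and formatting keep the cache, while rewording an invariant does not.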

mnriem · 2026-06-24T17:46:04Z

Maintainer

Really like this framing, and I think the three costs you name are real. Coming at it from the Spec Kit side, I'd offer two things: where it overlaps existing work, and a packaging idea that might let you test the whole hypothesis cheaply.

On overlap — context packets and blast radius are fairly crowded already. spec-reference-loader, memory-loader, repoindex, and token-budget's per-phase scoped reading cover a lot of the context-scoping ground, and ripple, architect-impact-previewer, architecture-guard, what-if-analysis, and api-evolve all live in the impact/dependents space. Your angles (per-unit packets, explicit edges over inference) are finer-grained, but they're improvements on trodden ground.

The hash-keyed verification cache is the part I haven't seen anywhere — verify, verify-tasks, ci-guard, trace all just re-run. Caching verification against a content hash, and the meaning-vs-cosmetic question that falls out of it, is the genuinely open problem here. I'd make that the headline.

One structural caution: a separate content-addressed contract artifact is effectively a second source of truth next to the spec, which is the exact drift Spec Kit exists to remove. It gets much stronger if the contracts and their hashes are derived from the spec/plan/code rather than authored in parallel.

Now the packaging idea — I don't think you need to build the MCP server to validate this. Most of it fits a preset:

a script override that content-hashes each unit (signature + invariants + examples),
a cache file under .specify/ for cross-run memoization,
plan/tasks/implement command overrides that call the script, skip re-verification on a hash hit, and only re-verify the declared dependency closure on a miss,
a get_dependents script reading a declared-deps block in the spec template.
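As a rough sketch of the last item, a get_dependents script could invert the declared edges and walk them transitively. The shape of the declared-deps block is not specified in this thread, so the mapping below (unit name to the list of units it depends on) is an assumption, as are the unit names.

```python
from collections import defaultdict, deque


def get_dependents(changed, declared_deps):
    """Return every unit that transitively depends on `changed`.

    declared_deps maps a unit name to the list of units it depends on.
    """
    # Invert the edges: for each dependency, record who depends on it.
    reverse = defaultdict(set)
    for unit, deps in declared_deps.items():
        for dep in deps:
            reverse[dep].add(unit)

    # Breadth-first walk over the inverted edges.
    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical declared-deps block, already parsed out of a spec template.
declared_deps = {
    "tokenize": [],
    "parse": ["tokenize"],
    "eval": ["parse"],
    "format": ["tokenize"],
    "cli": ["eval", "format"],
}

blast_radius = get_dependents("parse", declared_deps)
```

With these edges, a change to `parse` reaches `eval` and `cli` but not `format`, which is the set the implement override would re-verify on a hash miss.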

That's the full loop — hash, cache, blast radius — with zero external service, everything stacking natively and deriving from the spec. What the MCP layer adds on top is enforced invocation (a preset's cache check is agent-mediated, so a shortcutting agent can skip it), plus concurrency and cross-agent sharing. Real benefits, but they're hardening, not the unlock. So you could ship the preset first and find out whether the cache actually pays off before investing in the service.

On your actual question, meaning vs. cosmetic, here is a test that resolves most edge cases: a change is meaningful iff it can flip a dependent's verification result. Two consequences follow. (1) Stop hashing free-text invariants. Every rewording triggers a false re-verify, which erodes the cache that is your best idea; normalize to a canonical predicate form first. (2) Split the hash into facets (signature / invariant / example) so a dependent that relies only on the signature isn't re-verified when an invariant is reworded, and so get_dependents can tell you which facet changed. I'd also hash examples as a set, not by order, unless order is genuinely semantic.
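A minimal sketch of the facet split, under stated assumptions: the contract is a plain dict with `signature`, `invariants`, and `examples` fields, invariants are already in some canonical predicate form (the normalization itself is not shown), and each dependent declares which facets it relies on. All function and field names here are hypothetical.

```python
import hashlib
import json


def _digest(value):
    """SHA-256 of a JSON-serializable value with sorted keys."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode("utf-8")).hexdigest()


def facet_hashes(contract):
    """One hash per facet, so a change can be attributed to a facet."""
    # Examples are hashed as a set: serialize each one, then sort the results.
    examples = sorted(json.dumps(e, sort_keys=True) for e in contract["examples"])
    return {
        "signature": _digest(contract["signature"]),
        # Assumes invariants are already canonical predicates, kept in order.
        "invariant": _digest(contract["invariants"]),
        "example": _digest(examples),
    }


def changed_facets(old, new):
    """Return the names of the facets whose hash differs."""
    before, after = facet_hashes(old), facet_hashes(new)
    return {facet for facet in before if before[facet] != after[facet]}


def needs_reverify(relied_on, old, new):
    """relied_on maps a dependent's name to the set of facets it relies on."""
    changed = changed_facets(old, new)
    return {name for name, facets in relied_on.items() if facets & changed}


old = {
    "signature": "parse(text: str) -> Ast",
    "invariants": ["result != None"],
    "examples": [{"in": "1", "out": "Num(1)"}, {"in": "2", "out": "Num(2)"}],
}
new = {
    "signature": "parse(text: str) -> Ast",
    "invariants": ["result != None", "len(text) > 0"],                          # invariant added
    "examples": [{"in": "2", "out": "Num(2)"}, {"in": "1", "out": "Num(1)"}],   # reordered only
}
relied_on = {"eval": {"signature"}, "fuzz": {"invariant", "example"}}

changed = changed_facets(old, new)
to_reverify = needs_reverify(relied_on, old, new)
```

In this example only the invariant facet changes, because reordering examples leaves the set hash untouched, so `fuzz` is re-verified and `eval`, which relies on the signature alone, keeps its cached result.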

Would love to see where you take it — the cache + meaning-hash piece especially.
