fix(importer): cleanup trio — http urls, tag defaults, gitsheets staleness docs#74
Merged
Bundles #47, #56, #58 — three small fixes on the importer/schema surface that all surfaced from the legacy-import dry run. Plan body covers each sub-deliverable's approach, expected count deltas, and the risks for both the http:// fidelity and topic-default decisions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The legacy importer was dropping 81 of 113 ProjectBuzz records because
their URLs are http:// — mid-2010s press links that codeforphilly.org
itself still serves as plain HTTP. Fidelity wins over the marginal
security value of refusing them; future moderation tooling will need to
flag bad-actor URLs irrespective of scheme.
Schema drops `.startsWith('https://')`; spec row updated to call out the
legacy-import policy. Schemas test now asserts http:// passes (and a
malformed URL still fails so the .url() floor is intact).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
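A minimal sketch of the relaxed check, assuming the built-in `URL` parser stands in for the schema library's `.url()` floor. The names `validUrl` and `validHttps` are borrowed from the importer commit message in this thread; `isWellFormedUrl` and all three function bodies are illustrative assumptions, not the PR's actual code.

```typescript
// Illustrative only: the real schema and helper implementations are not shown in this PR thread.

/** Parse a string as an absolute URL, or return null when it is malformed. */
function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null;
  }
}

/** Schema-level floor (assumed): any well-formed absolute URL passes. */
function isWellFormedUrl(value: string): boolean {
  return parseUrl(value) !== null;
}

/** Relaxed importer check (assumed body): http: or https: only. */
function validUrl(value: string): boolean {
  const parsed = parseUrl(value);
  return parsed !== null && (parsed.protocol === "http:" || parsed.protocol === "https:");
}

/** Strict check (assumed body): https: only. */
function validHttps(value: string): boolean {
  const parsed = parseUrl(value);
  return parsed !== null && parsed.protocol === "https:";
}
```

Under these assumptions, an `http://` press link passes both `isWellFormedUrl` and `validUrl` but not `validHttps`, while a string such as `"not a url"` fails all three.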
…ough (closes #58) Two importer adjustments paired with the spec changes: splitTagHandle no longer returns null — bare-word laddr tags (~120 of them: org names, single-event keywords) now default to namespace=topic with an audit warning. Operators can re-namespace later via tooling. The Tag spec already documents this policy (data-model.md, prior commit-adjacent edit landed alongside the http:// schema relaxation spec change). For ProjectBuzz urls, swap validHttps() for a sibling validUrl() helper that accepts http: or https: — validHttps stays in place for Project's usersUrl / developersUrl which still require HTTPS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
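The tag-default behaviour described above might look roughly like the following. Only the name `splitTagHandle` and the `topic` default come from the commit message. The return shape, the `warnings` array, and the use of `.` as the namespace separator are assumptions made for illustration.

```typescript
// Hypothetical shapes: the importer's real types are not shown in this PR thread.
interface TagParts {
  namespace: string;
  slug: string;
}

// Stand-in for the importer's audit log (assumed).
const warnings: string[] = [];

// Assumption: namespaced handles look like "tech.python", with "." as the separator.
function splitTagHandle(handle: string): TagParts {
  const dot = handle.indexOf(".");
  if (dot > 0 && dot < handle.length - 1) {
    return { namespace: handle.slice(0, dot), slug: handle.slice(dot + 1) };
  }
  // Bare-word handle: default to the topic namespace instead of returning null,
  // and record an audit warning so operators can re-namespace it later.
  warnings.push(`tag "${handle}" has no namespace; defaulting to topic`);
  return { namespace: "topic", slug: handle };
}
```

Returning a value in every case means the caller no longer needs a skip branch; the warning list is what keeps the defaulted tags discoverable.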
…47) queryAll() on slug-history returned [] after a transact even though git ls-tree showed the file was written. Investigation: every gitsheets Sheet caches the dataTree snapshot it was opened against and never refreshes — the transact path itself is fine because repo.transact builds a fresh workspace from HEAD per call, and route handlers read from the typed in-memory Store (mutated in lockstep by StateApply), so production exposure today is zero. The bite is on direct sheet.query*() reads after a write — currently only an issue in tests that need to verify writes to sheets we don't load into the typed Store (slug-history, revocations). New storage.md section explains the limitation and the in-memory-state fix path; swapPublic JSDoc points at it; failing test's git-show fallback now has a comment that links the discussion instead of just describing the symptom. No runtime change — when a future redirect handler needs slug-history post-write, the right move is to load it into the typed Store like the other sheets. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
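The staleness pattern can be reproduced with a toy model. `ToyRepo` and `ToySheet` below are hypothetical classes written for this illustration and do not mirror the gitsheets API; they only mimic the two behaviours the investigation describes: a sheet keeps the snapshot it was opened against, and each transact starts from the current HEAD.

```typescript
// Toy model only: these classes are hypothetical and are not the gitsheets API.
type Tree = ReadonlyMap<string, string>;

class ToySheet {
  private readonly dataTree: Tree;

  // The snapshot is captured once, at open time, and never refreshed.
  constructor(dataTree: Tree) {
    this.dataTree = dataTree;
  }

  queryAll(): string[] {
    return Array.from(this.dataTree.values());
  }
}

class ToyRepo {
  head: Tree = new Map<string, string>();

  /** Each transact builds a fresh workspace from HEAD, then commits it as the new HEAD. */
  transact(write: (workspace: Map<string, string>) => void): void {
    const workspace = new Map<string, string>(this.head);
    write(workspace);
    this.head = workspace;
  }

  openSheet(): ToySheet {
    return new ToySheet(this.head);
  }
}

const repo = new ToyRepo();
const staleSheet = repo.openSheet();
repo.transact((workspace) => workspace.set("old-slug", "new-slug"));

// The sheet opened before the write still sees the pre-write snapshot.
const staleRead = staleSheet.queryAll();
// In this toy model, a sheet opened after the write sees the new record.
const freshRead = repo.openSheet().queryAll();
```

In the toy model the write path is correct and only the long-lived reader is stale, which is why keeping such data in an in-memory store updated alongside each write avoids the problem without touching the write path.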
Closeout: tick the validation criteria, fill Notes (incl. the data-model.md spec-edit straddling two commits — minor but worth flagging) and Follow-ups (upstream gitsheets enhancement, deferred live re-run, future tag re-namespacing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
46141f4 to b1bb223
Summary
Three small importer/schema fixes bundled because they're each tiny on their own and share the spec-→-importer surface. Counts validated by a `--dry-run` importer pass against the live laddr snapshot.

- Relax the `ProjectBuzz.url` schema to allow any valid URL. The legacy importer was dropping 81 of 113 buzz records on `http://` press links that codeforphilly.org itself still serves as plain HTTP. Fidelity wins over the marginal security value of refusing them. 32 → 112 imported (1 still legitimately skipped due to an unresolved FK, not its URL).
- Default bare-word tags to namespace `topic`. ~120 laddr tags (autocomplete-create artifacts with bare-word handles) were being skipped; they now import with an audit warning. 885 → 1017 imported, matches the "~120" estimate.
- Document the gitsheets `Sheet#dataTree` caching limitation that bit the account-claim test. Investigation found no live production exposure (route handlers read from the typed in-memory Store, kept in lockstep by `StateApply`; only direct `sheet.query*()` reads after a write are stale). New section in `specs/behaviors/storage.md`; tightened JSDoc on `Store.swapPublic`; failing-test fallback comment now links the discussion.

The live `legacy-import` re-run intentionally did not run from this branch: the relaxed schema must ship to the sandbox pod first, otherwise a subsequent `published` merge would fail validation at boot/reload. The dry run proved the importer-side fix; the actual write happens at the next deploy cadence.

Stacked on PR #73. Once #73 merges, this branch rebases cleanly.
Test plan
- `npm run type-check && npm run lint` clean (pre-confirmed locally)
- `npm run -w packages/shared test` and `npm run -w apps/api test` pass
- `npm run -w apps/api script:import-laddr -- --branch=legacy-import` against the live laddr snapshot, push, merge to `published`, watch the hot-reload short-circuit log
- Check `/tags/topic` for any obvious mis-classifications

🤖 Generated with Claude Code