Server-side context graph: GitHub capture + T0 projection + neighbors#6
philcunliffe wants to merge 3 commits
The end-to-end smoke test was failing — partly flaky, partly broken —
from three independent causes. It exits on the first failed check, so
each masked the next; this fixes all three.
1. Mover "run now" could silently no-op (the real ~50% flake).
mover.tick() had a `running` guard that made a *concurrent* call
return 0 immediately. The 200ms background timer and the admin
/v1/admin/mover/run endpoint both call it, so when runMover() landed
mid-pass it returned 200 without committing the just-spooled row, and
the follow-up query saw 0 rows. Split into opportunistic tick() (the
timer keeps skipping, never piles up) and guaranteed drain() (waits
out any in-flight pass, then runs a fresh pass whose pending()
snapshot is guaranteed to include rows spooled just before the call).
The admin endpoint and the shutdown drain now use drain().
2. Config pin format diverged from the kernel's config wire schema.
The save pipeline emitted a lock-file-shaped plugin entry into a
config document — object `source` ({kind,raw,path}) and `content_hash`
— but the kernel's parseConfigShape requires `source` to be a string
and the client verifies the pin under `artifact_hash` (hypaware
config/apply_deps.js). The server thus produced a document its own
shape parser rejects on re-submission. Emit `version` +
`artifact_hash`, keep `source` as the operator's raw string the client
re-resolves; validatePrePinned validates that shape.
3. Smoke proxy row predated the ai_gateway_messages schema. That dataset
gained a required non-null `session_id` column (schema v6); add it.
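The tick()/drain() split from item 1 can be sketched roughly as follows. This is a minimal illustration, not the server's mover: `createMover` and `commit` are hypothetical names, and `pending` is assumed to return a snapshot of the spooled rows.

```javascript
// Sketch of an opportunistic tick() next to a guaranteed drain().
// `createMover` and `commit` are hypothetical; only the tick/drain
// behavior is taken from the commit message.
function createMover({ pending, commit }) {
  let inFlight = null; // promise of the pass currently running, if any

  async function runPass() {
    const rows = await pending(); // snapshot taken when the pass starts
    for (const row of rows) await commit(row);
    return rows.length;
  }

  function startPass() {
    inFlight = runPass().finally(() => { inFlight = null; });
    return inFlight;
  }

  return {
    // Timer path: skip when a pass is already running, so passes never pile up.
    tick() {
      return inFlight ? Promise.resolve(0) : startPass();
    },
    // Admin and shutdown path: wait out any in-flight pass, then run a fresh
    // pass whose snapshot includes rows spooled before drain() was called.
    async drain() {
      while (inFlight) await inFlight.catch(() => {});
      return startPass();
    },
  };
}
```

The `while` loop in `drain()` matters when two callers drain at once: the second caller re-checks after waking and waits for the pass the first caller just started instead of running alongside it.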
Verified: 50/50 runs green (was ~50% flaky).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
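The pin shape from item 2 of the commit message above could be checked with something like the sketch below. `checkPinnedEntry` is a hypothetical helper, not the kernel's `parseConfigShape` or the server's `validatePrePinned`; treating `version` as a string is an assumption, and the sample values are made up.

```javascript
// Hypothetical check for a config-shaped pinned plugin entry.
// Field names come from the commit message; everything else is illustrative.
function checkPinnedEntry(entry) {
  const problems = [];
  if (typeof entry.source !== 'string') {
    problems.push('source must be the raw string, not an object');
  }
  if (typeof entry.version !== 'string') { // assumption: version is a string
    problems.push('version is missing');
  }
  if (typeof entry.artifact_hash !== 'string') {
    problems.push('artifact_hash is missing');
  }
  if ('content_hash' in entry) {
    problems.push('content_hash is the lock-file field, not the config pin');
  }
  return problems;
}

// Lock-file-shaped entry (the old output) vs. config-shaped entry (the fix).
const lockShaped = {
  source: { kind: 'npm', raw: 'example-plugin', path: null },
  content_hash: 'sha256:abc',
};
const configShaped = {
  source: 'example-plugin',
  version: '1.0.0',
  artifact_hash: 'sha256:abc',
};
console.log(checkPinnedEntry(lockShaped).length);   // several problems
console.log(checkPinnedEntry(configShaped).length); // none
```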
…ghbors)
Load @hypaware/context-graph and a vendored @hypaware/github into the server's own kernel so the server can capture github_events directly, project the T0 graph, and answer neighbors queries over its own cache, where the forwarded 1.6 LLM logs also live, enabling GitHub<->LLM convergence (hypaware LLP 0032).
- boot: activate context-graph + github (poll source dormant); inject the [github] section from HYPSERVER_GITHUB_* env, token stays in box env
- daemon: services.githubBackfill/graphProject/graphNeighbors reuse the plugins' pure functions over the kernel query+storage; flush github_events after capture so it is immediately queryable
- routes-admin + admin CLI: github-backfill, graph-project, graph-neighbors
- registry: self-managed graph datasets keep their own read closures (source=/graph_v1 layout), not the date= synthesis used for wire ingest
- shim: re-export projectGraph/queryNeighbors/requireGraphRuntime anchored on bundledWorkspaceDir for module-singleton identity
- smoke: full hermetic backfill->project->neighbors chain (no network)
- LLP 0010 documents the decision; plugins/github vendored (own ref corpus)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Maps forwarded ai_gateway_messages into the same node/edge graph; its bridge-ready Repo/Commit/File keys converge by content-addressed id with github's, so one graph-project spans both sources.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
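Convergence by content-addressed id can be pictured with the sketch below. The hashing scheme (`sha256` over type and key, truncated) and the `nodeId`/`mergeNodes` helpers are assumptions for illustration; the plugins' actual id derivation is not shown in this PR text.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical id scheme: the id depends only on node type and canonical key,
// so two sources that describe the same commit derive the same id.
function nodeId(type, key) {
  return createHash('sha256').update(`${type}:${key}`).digest('hex').slice(0, 16);
}

// Merge node lists from several sources into one graph keyed by id.
function mergeNodes(...sources) {
  const graph = new Map();
  for (const nodes of sources) {
    for (const { type, key, from } of nodes) {
      const id = nodeId(type, key);
      const existing = graph.get(id) ?? { id, type, key, seenIn: [] };
      if (!existing.seenIn.includes(from)) existing.seenIn.push(from);
      graph.set(id, existing);
    }
  }
  return graph;
}

const fromGithub = [{ type: 'Commit', key: 'abc123', from: 'github_events' }];
const fromLlm = [
  { type: 'Commit', key: 'abc123', from: 'ai_gateway_messages' },
  { type: 'Session', key: 's1', from: 'ai_gateway_messages' },
];
const merged = mergeNodes(fromGithub, fromLlm);
console.log(merged.size); // the shared Commit collapses into one node
```

Because no source-specific field enters the hash, neither projector needs to know about the other for their nodes to line up.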
Dual-agent review —
| Source | Finding (severity, evidence) | Intersects |
|---|---|---|
| Claude (comments/tests) | Incremental poll / cursor path untested — minor, conf 90 (smoke.js:560-605; capture.js:341-343,370; cursors.js:9-13) | Risks: incremental-capture surface; Concurrency: github capture tick |
| Claude (comments/tests) | graph/neighbors filters untested — minor, conf 88 (smoke.js:597; routes-admin.js:2990-3012) | Risks: incremental-capture surface; Direct callers: daemon→queryNeighbors |
| Claude (comments/tests) | bin/admin.js graph CLI wrappers untested — minor, conf 92 (bin/admin.js:21-33) | Risks: incremental-capture surface; Direct callers: admin CLI↔routes |
| Claude (comments/tests) | Multi-repo/org/ignore + per-repo error isolation untested — minor, conf 90 (capture.js resolveRepos/captureRepos) | Risks: incremental-capture surface; Concurrency: capture errors[] |
| Claude (comments/tests) | graph/project --source + dry_run untested — minor, conf 85 (daemon.js:166-167) | Risks: incremental-capture surface; Cross-package: server→context-graph |
| Claude (bug-scan, sub-threshold) | Poll-mode 304 comment misclassification — minor, conf 68 (capture.js:458-465,430-432) | Risks: incremental-capture surface |
| Claude (contracts, sub-threshold) | Stale content_hash log-field label — nit, conf 70 (save-pipeline.js:220) | Direct callers: save-pipeline pin fields |
| Claude (guidance, sub-threshold) | Vendored plugins/github/ Code-Style + broken @refs — minor, conf 50-55 | Cross-package: plugin manifest→loader (vendored carve-out, LLP 0010) |
| Codex | (codex unavailable — gateway-proxy stream disconnect, 2 attempts) | — |
- Codex review: (unavailable — see note above)
Claude review
Five parallel review subagents covered guidance-compliance, a shallow bug scan,
git-history regression analysis, contract/caller consistency, and comments+tests.
Three of them independently ran test/smoke.js end-to-end — all 92 checks pass,
including the new github-backfill → graph-project → graph-neighbors chain (checks
77–88). No logic bug, contract mismatch, or history regression survived scrutiny.
Every surviving finding below is a test-coverage gap on the new GitHub-capture
surface; all are scored ≥80 confidence but are minor in severity because no defect
was demonstrated on any exercised path and the most exposed path (incremental
polling) is dormant in the current server config (poll_interval is never set).
Findings the subagents raised that did not clear the ≥80 bar (recorded for the
risk cross-reference, not as blockers): vendored plugins/github/ Code-Style
deltas — inline import('...') types, a @typedef, and @refs that resolve
against a different corpus (conf 50–55, explicitly carved out by LLP 0010 as a
vendored tree); a poll-mode comment-misclassification latent bug on a 304 pulls
listing (conf 68, dormant path); and a stale content_hash log-field label in
save-pipeline.js:220 (nit, conf 70, log key only — not the wire pin).
Incremental poll / cursor-advancement path has no test coverage
- Severity: minor
- Confidence: 90
- Evidence: test/smoke.js:560-605 (backfill-only); plugins/github/src/capture.js:341-343,370 and plugins/github/src/cursors.js:9-13 (untested high-water/resume logic)
- Why it matters: smoke only ever calls `mode:'backfill'`, which resets the cursor to `{}`; the `since` high-water, `advancePullsHigh`/`changedSince`, and the cursor sidecar read/write that drive incremental capture are never asserted, so a regression in cursor math (and the related conf-68 304 comment-misclassification bug) would ship green. The path is dormant today, so this is forward-looking risk, not a current defect.
- Suggested fix: add a second poll-mode tick after backfill with a fixture whose `updated_at` advances, asserting only-new rows are appended and `github-cursors.json` carries the high-water; this also exercises the 304/comment-type path.
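The kind of assertion this finding asks for might look like the sketch below. `pollOnce` is a hypothetical stand-in for the capture tick, not the code in capture.js or cursors.js; it assumes ISO 8601 UTC timestamps, which compare correctly as strings.

```javascript
// Hypothetical incremental poll: return rows newer than the cursor's
// high-water mark and the advanced cursor. An empty cursor means backfill.
function pollOnce(items, cursor) {
  const since = cursor.since ?? '';
  const fresh = items.filter((item) => item.updated_at > since);
  const high = fresh.reduce(
    (max, item) => (item.updated_at > max ? item.updated_at : max),
    since,
  );
  return { rows: fresh, cursor: { since: high } };
}

const fixture = [
  { id: 1, updated_at: '2024-01-01T00:00:00Z' },
  { id: 2, updated_at: '2024-01-02T00:00:00Z' },
];
const backfill = pollOnce(fixture, {});           // cursor reset to {}
fixture.push({ id: 3, updated_at: '2024-01-03T00:00:00Z' });
const second = pollOnce(fixture, backfill.cursor); // only the new row
console.log(second.rows.map((r) => r.id), second.cursor.since);
```

The strict `>` comparison is a design choice in this sketch: it avoids re-appending the row that set the high-water mark, at the cost of missing a second item stamped with exactly the same timestamp.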
graph/neighbors filter parameters untested
- Severity: minor
- Confidence: 88
- Evidence: test/smoke.js:597 (single `{type:'Repo',depth:2,direction:'both'}` call) vs src/http/routes-admin.js:2990-3012 (parses `direction` in/out/both, `edge_types[]`, `limit`)
- Why it matters: the parameters most prone to a wrong-direction or off-by-one bug (`direction:'in'/'out'`, `edge_types` filtering, `limit` truncation) are never asserted.
- Suggested fix: add neighbors calls with `direction:'out'` and an `edge_types` filter, asserting the reachable set differs as expected.
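To make the three filters concrete, here is a small in-memory traversal. It is an illustrative breadth-first sketch, not the plugin's `queryNeighbors`; the `neighbors` function, the edge shape `{from, to, type}`, and the edge type names are all assumptions.

```javascript
// Illustrative neighbors query over an edge list, honoring direction,
// an optional edge-type allow-list, and a result limit.
function neighbors(edges, start, opts = {}) {
  const { depth = 1, direction = 'both', edgeTypes = null, limit = Infinity } = opts;
  const seen = new Set([start]);
  const found = [];
  let frontier = [start];
  for (let d = 0; d < depth && frontier.length > 0; d++) {
    const next = [];
    for (const node of frontier) {
      for (const edge of edges) {
        if (edgeTypes && !edgeTypes.includes(edge.type)) continue;
        let other = null;
        if (direction !== 'in' && edge.from === node) other = edge.to;
        else if (direction !== 'out' && edge.to === node) other = edge.from;
        if (other === null || seen.has(other)) continue;
        if (found.length >= limit) return found; // truncate at the limit
        seen.add(other);
        found.push(other);
        next.push(other);
      }
    }
    frontier = next;
  }
  return found;
}

const sampleEdges = [
  { from: 'repo', to: 'commit1', type: 'HAS_COMMIT' },
  { from: 'commit1', to: 'file1', type: 'TOUCHES' },
  { from: 'session', to: 'repo', type: 'WORKED_ON' },
];
console.log(neighbors(sampleEdges, 'repo', { depth: 2, direction: 'out' }));
console.log(neighbors(sampleEdges, 'repo', { depth: 2, direction: 'in' }));
```

On this fixture each direction reaches a different set, which is the property the suggested smoke assertions would pin down.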
bin/admin.js graph command wrappers untested
- Severity: minor
- Confidence: 92
- Evidence: bin/admin.js:21-33 (github-backfill / graph-project / graph-neighbors flag→body mapping); smoke calls the HTTP routes directly and never invokes the CLI
- Why it matters: the arg-parsing glue (`--edge-type` → `edge_types:[x]`, `Number()` on `--depth`/`--limit`, leading---→ no-node) is thin but entirely uncovered.
- Suggested fix: a small unit test of the flag→body mapping, or accept as a documented known gap.
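A unit test of the flag→body mapping could be as small as the sketch below. `flagsToBody` is a hypothetical re-implementation written for this example, not the code in bin/admin.js; only the `--edge-type` → `edge_types` and `Number()` conversions are taken from the finding, and the `--direction` flag is an assumption.

```javascript
// Hypothetical flag parser mirroring the mapping described in the finding.
function flagsToBody(argv) {
  const body = {};
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag === '--edge-type') {
      (body.edge_types ??= []).push(value); // repeated flags accumulate
    } else if (flag === '--depth' || flag === '--limit') {
      body[flag.slice(2)] = Number(value);  // note: Number('abc') is NaN
    } else if (flag === '--direction') {    // assumed flag name
      body.direction = value;
    } else {
      throw new Error(`unknown flag: ${flag}`);
    }
  }
  return body;
}

console.log(flagsToBody(['--edge-type', 'TOUCHES', '--depth', '2', '--limit', '10']));
```

A test like this would also be the natural place to decide what a non-numeric `--depth` should do, since a bare `Number()` passes `NaN` through to the request body.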
Multi-repo / org-enumeration / ignore-list capture untested
- Severity: minor
- Confidence: 90
- Evidence: plugins/github/src/capture.js `resolveRepos` (repos ∪ org minus ignore, lowercased/deduped/sorted) and `captureRepos` per-repo error isolation; smoke uses one explicit repo, no orgs, no ignore
- Why it matters: fleet selection and the guarantee that one failing repo doesn't abort the tick (`errors[]`) are load-bearing for real deployments and unexercised.
- Suggested fix: a fixture with an org + an ignored repo plus an injected per-repo throw, asserting the resolved set and that the throw lands in `errors[]` without aborting.
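The two behaviors named here can be sketched as follows. The function names mirror the review text, but the bodies, the argument shapes, and the repo names are assumptions; the vendored capture.js may differ in every detail.

```javascript
// Sketch: repos ∪ org repos minus ignore, lowercased, deduped, sorted.
function resolveRepos({ repos = [], orgRepos = [], ignore = [] }) {
  const ignored = new Set(ignore.map((r) => r.toLowerCase()));
  const all = [...repos, ...orgRepos].map((r) => r.toLowerCase());
  return [...new Set(all)].filter((r) => !ignored.has(r)).sort();
}

// Sketch: capture each repo in turn; a throw is recorded in errors[]
// and the loop moves on to the next repo instead of aborting the tick.
async function captureRepos(repos, captureOne) {
  const captured = [];
  const errors = [];
  for (const repo of repos) {
    try {
      captured.push({ repo, rows: await captureOne(repo) });
    } catch (err) {
      errors.push({ repo, message: err instanceof Error ? err.message : String(err) });
    }
  }
  return { captured, errors };
}

const resolved = resolveRepos({
  repos: ['Acme/API'],
  orgRepos: ['acme/api', 'acme/web', 'acme/legacy'],
  ignore: ['acme/Legacy'],
});
console.log(resolved);
```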
graph/project --source filter and dry_run untested
- Severity: minor
- Confidence: 85
- Evidence: src/daemon.js:166-167 (`sourceDataset` filter when `source` given) and the `dry_run` path in routes-admin.js; smoke always posts `json:{}`
- Why it matters: source-scoping is what makes the github-only vs cross-source convergence distinction real, and `dry_run` (writes nothing) is the safe-preview contract; neither is asserted.
- Suggested fix: one `graph/project` with `{source:'github_events'}` and one `{dry_run:true}` asserting `nodesWritten === 0`.
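The contract these two options imply can be illustrated with a toy projector. `projectSketch` and its `nodesPlanned` field are hypothetical; only `source`, `dry_run`, and `nodesWritten` come from the finding, and real projection does far more than copy rows.

```javascript
// Toy projection: optionally restrict to one source dataset, and write
// nothing to the store when dry_run is set.
function projectSketch(datasets, store, { source = null, dry_run = false } = {}) {
  const names = source ? [source] : Object.keys(datasets);
  const nodes = new Map();
  for (const name of names) {
    for (const row of datasets[name] ?? []) nodes.set(row.id, row);
  }
  if (!dry_run) {
    for (const [id, node] of nodes) store.set(id, node);
  }
  return { nodesPlanned: nodes.size, nodesWritten: dry_run ? 0 : nodes.size };
}

const sampleDatasets = {
  github_events: [{ id: 'a' }, { id: 'b' }],
  ai_gateway_messages: [{ id: 'b' }, { id: 'c' }],
};
const previewStore = new Map();
const preview = projectSketch(sampleDatasets, previewStore, { dry_run: true });
console.log(preview.nodesWritten, previewStore.size);
```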
Reports: .git/dual-review/pr-6
🧭 Decision map: where to spend your attention
Companion to the dual-review verdict. This casts no verdict; it points at the 6 forks where the author made a real choice, so you can skim the rest.
Scanned: 40 hunks across 27 files (+2968 −48). Most is mechanical: ~14 new
1. Served config-pin wire shape →
What & why
The server could collect forwarded logs but couldn't host a graph: its kernel plugin
set is hardcoded and the admin attach is SQL-only. This adds a server-side context
graph — the server captures GitHub activity directly, projects both GitHub events and
forwarded LLM sessions into one `node`/`edge` graph, and answers graph queries. That
enables the GitHub↔LLM convergence 1.6's git-bridge was built for (sessions and GitHub
activity share content-addressed `Repo`/`Commit`/`File` nodes).
Design rationale lives in `llp/0010-server-side-graph.decision.md`, not duplicated here.
Changes
- Vendor `@hypaware/github` under `plugins/github/` (no git remote yet) and load it + bundled `@hypaware/context-graph` + `@hypaware/ai-gateway-graph` into the server kernel (`src/boot.js`). The github poll source stays dormant; capture is an admin one-shot.
- Admin routes + CLI `github-backfill`, `graph-project`, `graph-neighbors` (`src/http/routes-admin.js`, `bin/admin.js`): reuse the plugins' pure functions over the server kernel's query/storage handles (`src/daemon.js`, re-exported via `src/kernel/shim.js`).
- Inject `[github]` config from `HYPSERVER_GITHUB_*` env; the GitHub token stays in the box env, never in config (`src/config.js`, `src/types.d.ts`).
- Self-managed graph datasets (`github_events`/`node`/`edge`) keep their own read closures instead of the `date=` partition synthesis used for wire ingest (`src/catalog/registry.js`).
- Flush `github_events` after backfill so captured rows are immediately queryable (`src/daemon.js`).
- … `@ref` annotations; smoke extended to drive the full backfill → project → neighbors chain hermetically (in-memory GitHub client, no network).
Two non-obvious bugs the new smoke test caught
- `appendRows` buffers in the cache writer; it needed an explicit flush or rows aren't queryable.
- Self-managed datasets were read through the `date=` synthesis (which only fits forwarded ingest) → projection/neighbors silently read zero rows.
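The first bug is easy to reproduce in miniature. The class below is a hypothetical buffered writer, not the server's cache writer; it only shows why a query issued right after `appendRows` can see nothing until a flush.

```javascript
// Hypothetical writer: appendRows only buffers, flush makes rows visible.
class BufferedWriter {
  constructor() {
    this.buffer = [];
    this.stored = [];
  }
  appendRows(rows) {
    this.buffer.push(...rows);
  }
  flush() {
    this.stored.push(...this.buffer);
    this.buffer = [];
  }
  query() {
    return [...this.stored]; // reads never look at the buffer
  }
}

const writer = new BufferedWriter();
writer.appendRows([{ id: 1 }, { id: 2 }]);
const beforeFlush = writer.query().length; // rows are still buffered
writer.flush();
const afterFlush = writer.query().length;  // rows are now queryable
console.log(beforeFlush, afterFlush);
```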
Testing
- `npm run smoke`: 92 checks pass locally and inside the built linux image.
- … across 12 repos → projected 13,607 nodes / 45,001 edges; `graph-neighbors` verified; cross-source convergence (Session ↔ Repo/Commit/File) confirmed once LLM logs forwarded.
Notes / follow-ups (non-blocking)
- `plugins/github/` is vendored third-party code; its `@ref`s resolve against the `@hypaware/github` corpus, so exclude it from this repo's `/ref-check` (noted in LLP 0010).
- … (`bin/admin.js`, undici) times out the client at ~300s on long ops like an org-wide backfill; the server handler completes anyway. Worth a longer `headersTimeout` or an async backfill + status endpoint.
- … `github_events` too and the server accepted them. Decide whether that's desired (the server pulls its own) or should be rejected.
🤖 Generated with Claude Code