Embed external $ref schemas for offline validation #86

Merged
rdimitrov merged 1 commit into main from fix/embed-external-schema-refs on Apr 13, 2026

@rdimitrov
Member

Summary

  • Vendor the MCP server.schema.json into registry/types/data/ so all $ref resolution in JSON schema validation stays local
  • Switch validateAgainstSchema from gojsonschema.Validate() to SchemaLoader.AddSchemas() + Compile(), preloading referenced schemas by their $id
  • Add TestExternalRefsHaveEmbeddedSchemas CI guard that fails if any embedded schema has an external $ref without a matching vendored $id

Problem: upstream-registry.schema.json contains $ref URLs pointing to static.modelcontextprotocol.io and raw.githubusercontent.com. The gojsonschema library follows these over HTTP during validation, causing timeout failures in network-restricted deployments (e.g. environments with egress network policies).

Fix: All referenced schemas are now embedded and preloaded into the SchemaLoader, so validation never makes HTTP requests. The CI test ensures this invariant holds as schemas evolve.

Test plan

  • Existing TestUpstreamRegistrySchemaVersionSync passes (schema version alignment)
  • New TestExternalRefsHaveEmbeddedSchemas passes (all external $ref URLs resolve locally)
  • Full task suite passes (lint + all tests)
  • Verify in a network-restricted environment that sync no longer fails with timeout errors

🤖 Generated with Claude Code

The `upstream-registry.schema.json` file references external schemas via
`$ref` URLs (MCP server schema and skill schema). The `gojsonschema`
library follows these over HTTP during validation, which fails in
network-restricted environments with timeout errors.

Vendor the MCP `server.schema.json` into the embedded data directory and
switch `validateAgainstSchema` from the convenience `gojsonschema.Validate`
to `SchemaLoader.AddSchemas` + `Compile`, preloading referenced schemas
by their `$id` so all `$ref` resolution stays local. Add a CI guard test
that scans all embedded schemas for external `$ref` URLs and asserts each
one has a matching `$id` among the vendored schemas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rdimitrov rdimitrov merged commit 7ac105c into main Apr 13, 2026
5 checks passed
@rdimitrov rdimitrov deleted the fix/embed-external-schema-refs branch April 13, 2026 21:47
rdimitrov added a commit to stacklok/toolhive-registry-server that referenced this pull request Apr 14, 2026
## Summary

- Bump `toolhive-core` from v0.0.15 to v0.0.16
- Pick up stacklok/toolhive-core#86, which embeds all
externally referenced JSON schemas so `$ref` resolution during
validation never makes HTTP requests

This fixes the sync timeout reported by Peder in network-restricted
environments where egress to `raw.githubusercontent.com` and
`static.modelcontextprotocol.io` is blocked by network policy.

## Test plan

- [x] `task all` passes (lint, tests, build)
- [ ] Deploy to a network-restricted environment and verify sync
completes without schema fetch timeouts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>