diff --git a/.claude/skills/new-sep/SKILL.md b/.claude/skills/new-sep/SKILL.md new file mode 100644 index 00000000..fa19a2e1 --- /dev/null +++ b/.claude/skills/new-sep/SKILL.md @@ -0,0 +1,201 @@ +--- +name: new-sep +description: >- + Scaffold a sep-NNNN.yaml requirement-traceability file for the MCP + conformance repo from a SEP PR's spec diff. Runs the new-sep CLI, then + parses the modelcontextprotocol/modelcontextprotocol spec diff to populate + `requirements[]` with the RFC 2119 sentences and proposed check IDs. +argument-hint: '' +--- + +# new-sep: SEP traceability YAML scaffolding + +You are bootstrapping a `sep-NNNN.yaml` file for a new SEP in the MCP conformance repo. The output is the requirement-traceability file specified by SEP-2484: a YAML that maps each normative sentence from the SEP's spec diff to a `check:` ID (testable) or an `excluded:` reason (not testable). The CLI gets the skeleton; you fill in the rows by reading the spec diff. + +## Step 0: Pre-flight checks + +Before doing anything else, verify GitHub CLI authentication: + +```bash +gh auth status 2>&1 +``` + +If this fails, stop immediately and tell the user: + +> GitHub authentication is required for this skill. Please run `gh auth login` first, then re-run. + +Verify you're running inside the conformance repo: + +```bash +test -f package.json && jq -r '.name' package.json +``` + +The name should be `@modelcontextprotocol/conformance`. If not, stop and ask the user to `cd` into the conformance repo first. + +## Step 1: Parse arguments + +Extract from the user's input: + +- **sep-number** (required): the SEP number, e.g. `2164`. This is also the PR number in `modelcontextprotocol/modelcontextprotocol` by convention. + +## Step 2: Generate the skeleton + +Run the CLI: + +```bash +npm run --silent build +node dist/index.js new-sep +``` + +(For development against a non-built source tree: `npx tsx src/index.ts new-sep ...`.) + +The CLI writes `src/seps/sep-.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`. + +If the CLI errors with "does not change any docs/specification/draft/\*.mdx", the SEP's spec changes landed in a separate PR — ask the user for the spec file path and rerun with `--spec-path docs/specification/draft/`. Do not guess. + +## Step 3: Fetch the spec diff + +`AGENTS.md` (lines 64–72) is explicit that severity must come from the spec text itself, not the SEP markdown or the conformance PR description: + +```bash +gh api "repos/modelcontextprotocol/modelcontextprotocol/pulls//files" \ + --jq '.[] | select(.filename | test("^docs/specification/draft/.*\\.mdx$")) | {filename, patch}' +``` + +For each file, pull the added (`+`-prefixed) lines from `patch`. If `patch` is truncated for a large file, fall back to fetching the whole file at the PR's head ref: + +```bash +gh api "repos/modelcontextprotocol/modelcontextprotocol/contents/?ref=" \ + --jq '.content' | base64 -d +``` + +## Step 4: Extract RFC 2119 requirements + +Walk the added lines and identify sentences containing the keywords: **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **MAY**, **OPTIONAL**. + +**Quote the whole sentence**, not just the matched line. The matched word may sit inside a bullet point whose lead-in sentence supplies the keyword by inheritance — e.g.: + +> Servers SHOULD return standard JSON-RPC errors for common failure cases: +> +> - Resource not found: -32602 (Invalid Params) + +The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/seps/sep-2164.yaml` for the canonical example. + +**Regex alone is insufficient** (this is called out in Issue #243). Read for context: pronouns, "the server", and "such cases" all refer back to the lead-in. + +## Step 5: Map severity → check vs. excluded + +From `AGENTS.md:50-56`: + +| Keyword | Severity | YAML field | +| ---------------------------------------------- | -------- | -------------------------- | +| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE | `check: sep--` | +| SHOULD / SHOULD NOT | WARNING | `check: sep--` | +| MAY / OPTIONAL | — | _no row — skip entirely_ | + +MAY / OPTIONAL sentences are noted in Step 4 only so you consciously skip them — they never produce a yaml row. + +A row is `excluded:` when a MUST/SHOULD requirement can't be protocol-observed by the harness. Do **not** write any `excluded:` row on your own authority — every exclusion goes through Step 6. + +While classifying, sort each MUST/SHOULD row into one of three buckets: + +- **`check:`** — observably testable on the wire. +- **clearly-excluded** — you're confident it can't be observed (e.g. "clients SHOULD also accept -32002" when the harness only drives servers). +- **borderline** — you'd default to `check:` but observability is questionable. Markers: + - _Internal state_ — verbs like _record_, _store_, _associate_, _track_, _cache_. The harness sees wire traffic, not memory; usually only observable via a downstream row already in your list. + - _UI / human-facing_ — _display_, _show_, _render_, _prompt the user_. + - _Precondition phrasing_ — "Before doing X, the implementation MUST Y" where X is itself another row. + +Slug convention: lowercase-kebab, derived from the verb phrase. Examples from `sep-2164.yaml`: `no-empty-contents`, `error-code`. Same `id` is used for SUCCESS and FAILURE (`AGENTS.md:52`). + +## Step 6: Confirm exclusions with the user + +Nothing becomes `excluded:` without sign-off. Two rounds: + +**Round 1 — clearly-excluded, single batch question.** One `AskUserQuestion` listing all clearly-excluded rows in the question body (slug + one-line reason each). Options: + +- `Exclude all as listed (Recommended)` +- `Flip all to check:` +- `Let me adjust per-row` — if chosen, append these rows to round 2. + +Skip this round if the bucket is empty. + +**Round 2 — borderline, one question per row.** One `AskUserQuestion` call with a question per borderline row (loop in batches of 4 if needed). For each: + +- header: the proposed slug +- question: quote the requirement sentence + your one-line observability concern +- options (list `check:` first — it's the default for borderline): + - `check:` — keep as a testable check + - `excluded: ` — drop to excluded with your stated reason + - `merge into ` — offer when the row is a precondition for another row already in the list + +Apply the answers before writing. For any `excluded:` outcome, write the reason verbatim into the yaml and add an `issue:` URL if the user supplies one. A `merge` outcome means: drop this row, and append its `text:` to the surviving row's `text:` separated by `/` so the traceability isn't lost. + +## Step 7: Rewrite the YAML + +Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/seps/sep-2164.yaml`). + +**Key order within each row** — for `check:` rows the **`check:` key comes first**, then `text:`, then any optional `url:`. Scanning the left margin should reveal every check ID without reading the quoted sentences. For `excluded:` rows the order is **`text:` first**, then `excluded:`, then optional `issue:` — there's no ID to scan for, so lead with the requirement. + +**Row order in the file** — all `check:` rows first (in spec-diff order), then **all `excluded:` rows grouped at the bottom**, separated from the checks by **one blank line**. Do not interleave. + +```yaml +requirements: + - check: sep-NNNN-first-slug + text: '...' + - check: sep-NNNN-second-slug + text: '...' + + - text: '...' + excluded: 'reason' + issue: https://github.com/modelcontextprotocol/conformance/issues/ +``` + +If a requirement is ambiguous or you're not confident, leave it as a `TODO:` row rather than guessing — humans review this yaml before scenarios get written. + +Also fix the `spec_url`: the CLI emits the page URL with no anchor. If the requirements you extracted live under a specific spec subsection (e.g. `#error-handling`), append it. + +If a requirement comes from a **different spec page** than `spec_url` (the SEP touched multiple `.mdx` files — the CLI prints these as "PR also changes N other spec file(s)"), give that row a full `url:` override: + +```yaml +- check: sep-NNNN-slug + text: '...' + url: https://modelcontextprotocol.io/specification/draft/other/page#anchor +``` + +A row's effective spec reference is `row.url ?? file.spec_url`. + +Write the result back to `$YAML`. + +## Step 8: Suggest a host scenario + +`AGENTS.md` prefers **fewer scenarios with more checks** over one-scenario-per-check. Before telling the user to write a new scenario, look for an existing one the new checks could be folded into. + +Determine the suite directory from the requirement subjects ("MCP clients MUST…" → `client/`, "Servers MUST…" → `server/`, "authorization servers MUST…" → `authorization-server/`; a SEP may map to more than one). Then search that directory for scenarios touching the same spec area: + +```bash +rg -l -i '|' src/scenarios// --type ts +``` + +Pick 2–3 domain terms from the SEP's subject matter (for a discovery SEP: `metadata`, `well-known`; for an auth-response SEP: `redirect`, `callback`, `pkce`). For each hit, pull the scenario's `name`/`description` to confirm relevance: + +```bash +rg -A1 'name:|description:' +``` + +If you find a plausible host, recommend it by path. If nothing fits, say so explicitly — a new scenario file is then the right call. + +## Step 9: Hand-off + +Report to the user, in this order: + +1. Path to the generated yaml. +2. Row counts: "`N check:` rows, `M excluded:` rows" — and note which exclusions the user signed off in Step 6. +3. Any requirements you left as `TODO:` and why. +4. **Host-scenario recommendation** from Step 8 — either "consider adding these checks to `src/scenarios//.ts` (it already exercises _X_)" or "no existing scenario covers this area; a new file is appropriate". +5. Remaining next steps the user owns: + - add the checks to the host scenario (or create one) under `src/scenarios/{client,server,authorization-server}/`, + - register any new scenario in `src/scenarios/index.ts` (`AGENTS.md:48`), + - add a passing example to the everything-client/server and a negative test, per `AGENTS.md:74-81`. + +Do **not** generate or edit scenario `.ts` files or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml plus the recommendation. diff --git a/AGENTS.md b/AGENTS.md index 1d8aef1e..fc864ba5 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -71,6 +71,16 @@ Verify requirement levels against the SEP's **spec diff** — the change to `doc gh api "repos/modelcontextprotocol/modelcontextprotocol/contents/docs/specification/draft/?ref=" --jq '.content' | base64 -d ``` +### Adding a new SEP + +Scaffold the requirement-traceability YAML with: + +```sh +npx @modelcontextprotocol/conformance new-sep +``` + +The command looks up PR #`` in `modelcontextprotocol/modelcontextprotocol` (SEP numbers are PR numbers), derives `spec_url` from the `docs/specification/draft/*.mdx` file it changes, and writes `src/seps/sep-.yaml` with TODO `requirements[]` rows. Use `--spec-path` or `--spec-url` to skip the lookup. The `new-sep` Claude Code skill drives the same flow end-to-end, parses the spec diff, and fills in the requirement rows. + ## Examples: prove it passes and fails A new scenario should come with: diff --git a/src/index.ts b/src/index.ts index 64d1eaaa..013f44c4 100644 --- a/src/index.ts +++ b/src/index.ts @@ -45,6 +45,7 @@ import { printBaselineResults } from './expected-failures'; import { createTierCheckCommand } from './tier-check'; +import { createNewSepCommand } from './new-sep'; import packageJson from '../package.json'; // Note on naming: `command` refers to which CLI command is calling this. @@ -540,6 +541,9 @@ program // Tier check command program.addCommand(createTierCheckCommand()); +// New SEP scaffolding command +program.addCommand(createNewSepCommand()); + // List scenarios command program .command('list') diff --git a/src/new-sep/index.test.ts b/src/new-sep/index.test.ts new file mode 100644 index 00000000..acc01bb7 --- /dev/null +++ b/src/new-sep/index.test.ts @@ -0,0 +1,115 @@ +import { describe, it, expect } from 'vitest'; +import { specPathToUrl, renderYaml } from './index'; + +describe('specPathToUrl', () => { + it('strips the docs/specification/draft/ prefix and .mdx suffix', () => { + expect(specPathToUrl('docs/specification/draft/server/resources.mdx')).toBe( + 'https://modelcontextprotocol.io/specification/draft/server/resources' + ); + }); + + it('handles nested paths', () => { + expect(specPathToUrl('docs/specification/draft/basic/lifecycle.mdx')).toBe( + 'https://modelcontextprotocol.io/specification/draft/basic/lifecycle' + ); + }); + + it('rejects paths outside docs/specification/draft/', () => { + expect(() => + specPathToUrl('docs/specification/2025-11-25/server/x.mdx') + ).toThrow(/must start with/); + }); +}); + +describe('renderYaml', () => { + it('emits placeholder yaml in the sep-2164.yaml style', () => { + const out = renderYaml({ + sep: 9999, + specUrl: + 'https://modelcontextprotocol.io/specification/draft/server/resources' + }); + expect(out).toBe( + `sep: 9999 +spec_url: https://modelcontextprotocol.io/specification/draft/server/resources +requirements: + - check: sep-9999-todo + text: 'TODO: quote the normative sentence from the spec diff' + + - text: 'TODO: requirement that cannot be tested' + excluded: 'TODO: reason' + issue: https://github.com/modelcontextprotocol/conformance/issues/ +` + ); + }); + + it('matches the byte-shape of the real sep-2164.yaml when given its rows', () => { + const out = renderYaml({ + sep: 2164, + specUrl: + 'https://modelcontextprotocol.io/specification/draft/server/resources#error-handling', + requirements: [ + { + text: 'Servers MUST NOT return an empty contents array for a non-existent resource', + check: 'sep-2164-no-empty-contents' + }, + { + text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)', + check: 'sep-2164-error-code' + }, + { + text: 'clients SHOULD also accept -32002 as a resource not found error', + excluded: + 'Client-side error handling is implementation-defined; not protocol-observable' + } + ] + }); + expect(out).toBe( + `sep: 2164 +spec_url: https://modelcontextprotocol.io/specification/draft/server/resources#error-handling +requirements: + - check: sep-2164-no-empty-contents + text: 'Servers MUST NOT return an empty contents array for a non-existent resource' + - check: sep-2164-error-code + text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)' + + - text: 'clients SHOULD also accept -32002 as a resource not found error' + excluded: 'Client-side error handling is implementation-defined; not protocol-observable' +` + ); + }); + + it('emits per-row url: overrides', () => { + const out = renderYaml({ + sep: 1234, + specUrl: 'https://modelcontextprotocol.io/specification/draft/server/a', + requirements: [ + { text: 'from primary file', check: 'sep-1234-a' }, + { + text: 'from secondary file', + check: 'sep-1234-b', + url: 'https://modelcontextprotocol.io/specification/draft/server/b#x' + } + ] + }); + expect(out).toBe( + `sep: 1234 +spec_url: https://modelcontextprotocol.io/specification/draft/server/a +requirements: + - check: sep-1234-a + text: 'from primary file' + - check: sep-1234-b + text: 'from secondary file' + url: https://modelcontextprotocol.io/specification/draft/server/b#x +` + ); + }); + + it('escapes single quotes by doubling them', () => { + const out = renderYaml({ + sep: 1, + specUrl: 'https://example.com/x', + requirements: [{ text: "can't happen", check: 'sep-1-x' }] + }); + expect(out).toContain("text: 'can''t happen'"); + }); +}); diff --git a/src/new-sep/index.ts b/src/new-sep/index.ts new file mode 100644 index 00000000..aa50a3a6 --- /dev/null +++ b/src/new-sep/index.ts @@ -0,0 +1,256 @@ +import { Command } from 'commander'; +import { Octokit } from '@octokit/rest'; +import { promises as fs } from 'fs'; +import path from 'path'; + +export interface RequirementRow { + text: string; + check?: string; + excluded?: string; + issue?: string; + /** Full spec URL for this requirement; overrides the file-level spec_url. */ + url?: string; +} + +const OUT_DIR = 'src/seps'; +const SPEC_PATH_PREFIX = 'docs/specification/draft/'; +const DEFAULT_SPEC_REPO = 'modelcontextprotocol/modelcontextprotocol'; + +export function specPathToUrl(specPath: string): string { + if (!specPath.startsWith(SPEC_PATH_PREFIX)) { + throw new Error( + `spec path must start with "${SPEC_PATH_PREFIX}"; got: ${specPath}` + ); + } + const rest = specPath.slice(SPEC_PATH_PREFIX.length).replace(/\.mdx$/, ''); + return `https://modelcontextprotocol.io/specification/draft/${rest}`; +} + +function escapeSingleQuoted(s: string): string { + return s.replace(/'/g, "''"); +} + +function defaultPlaceholderRequirements(sep: number): RequirementRow[] { + return [ + { + text: 'TODO: quote the normative sentence from the spec diff', + check: `sep-${sep}-todo` + }, + { + text: 'TODO: requirement that cannot be tested', + excluded: 'TODO: reason', + issue: 'https://github.com/modelcontextprotocol/conformance/issues/' + } + ]; +} + +export function renderYaml(input: { + sep: number; + specUrl: string; + requirements?: RequirementRow[]; +}): string { + const reqs = input.requirements ?? defaultPlaceholderRequirements(input.sep); + const checkRows = reqs.filter((r) => r.check); + const excludedRows = reqs.filter((r) => !r.check); + + const lines: string[] = []; + lines.push(`sep: ${input.sep}`); + lines.push(`spec_url: ${input.specUrl}`); + lines.push('requirements:'); + + for (const r of checkRows) { + lines.push(` - check: ${r.check}`); + lines.push(` text: '${escapeSingleQuoted(r.text)}'`); + if (r.url) lines.push(` url: ${r.url}`); + } + + if (checkRows.length > 0 && excludedRows.length > 0) { + lines.push(''); + } + + for (const r of excludedRows) { + lines.push(` - text: '${escapeSingleQuoted(r.text)}'`); + if (r.excluded) { + lines.push(` excluded: '${escapeSingleQuoted(r.excluded)}'`); + } + if (r.issue) lines.push(` issue: ${r.issue}`); + } + + return lines.join('\n') + '\n'; +} + +async function resolveToken(explicit?: string): Promise { + let token = explicit || process.env.GITHUB_TOKEN; + if (!token) { + try { + const { execSync } = await import('child_process'); + token = execSync('gh auth token', { encoding: 'utf-8' }).trim(); + } catch { + // gh not installed or not authenticated + } + } + return token; +} + +interface SpecCandidate { + filename: string; + additions: number; +} + +async function lookupSpecPath(args: { + sep: number; + repo: string; + token: string; +}): Promise { + const [owner, repoName] = args.repo.split('/'); + if (!owner || !repoName) { + throw new Error(`Invalid --repo: ${args.repo} (expected owner/repo)`); + } + const octokit = new Octokit({ auth: args.token }); + + // SEP numbers are PR numbers in the spec repo by convention. + const prNumber = args.sep; + + const files = await octokit.paginate(octokit.pulls.listFiles, { + owner, + repo: repoName, + pull_number: prNumber, + per_page: 100 + }); + + const candidates: SpecCandidate[] = files + .filter( + (f) => + f.filename.startsWith(SPEC_PATH_PREFIX) && f.filename.endsWith('.mdx') + ) + .map((f) => ({ filename: f.filename, additions: f.additions })); + + if (candidates.length === 0) { + throw new Error( + `PR #${prNumber} in ${args.repo} does not change any ` + + `${SPEC_PATH_PREFIX}*.mdx file. Pass --spec-path to override.` + ); + } + candidates.sort((a, b) => b.additions - a.additions); + return candidates.map((c) => c.filename); +} + +export function createNewSepCommand(): Command { + return new Command('new-sep') + .description( + 'Scaffold a sep-NNNN.yaml requirement-traceability file for a new SEP' + ) + .argument('', 'SEP number, e.g. 2164') + .option( + '--spec-url ', + 'Use this spec URL verbatim (skips GitHub lookup)' + ) + .option( + '--spec-path ', + `${SPEC_PATH_PREFIX}... path to derive spec_url from (skips GitHub lookup)` + ) + .option( + '--repo ', + 'Spec repo to query for the SEP PR', + DEFAULT_SPEC_REPO + ) + .option( + '--token ', + 'GitHub token (defaults to GITHUB_TOKEN env or `gh auth token`)' + ) + .option('--force', 'Overwrite existing sep-NNNN.yaml') + .action(async (sepArg: string, options) => { + const sep = parseInt(sepArg, 10); + if (!Number.isFinite(sep) || sep <= 0 || String(sep) !== sepArg.trim()) { + console.error(`Invalid SEP number: ${sepArg}`); + process.exit(1); + } + + let specUrl: string | undefined = options.specUrl; + let specPath: string | undefined = options.specPath; + let otherSpecPaths: string[] = []; + + if (!specUrl && !specPath) { + const token = await resolveToken(options.token); + if (!token) { + console.error( + 'GitHub token required to look up the SEP PR. Either:\n' + + ' gh auth login\n' + + ' export GITHUB_TOKEN=$(gh auth token)\n' + + ' or pass --token \n' + + ' or pass --spec-url / --spec-path to skip the lookup' + ); + process.exit(1); + } + try { + const specPaths = await lookupSpecPath({ + sep, + repo: options.repo, + token + }); + specPath = specPaths[0]; + otherSpecPaths = specPaths.slice(1); + console.error(`Resolved spec path: ${specPath}`); + } catch (err) { + console.error((err as Error).message); + process.exit(1); + } + } + + if (specPath && !specUrl) { + try { + specUrl = specPathToUrl(specPath); + } catch (err) { + console.error((err as Error).message); + process.exit(1); + } + } + if (!specUrl) { + console.error('Could not resolve spec_url. Internal error.'); + process.exit(1); + } + + const outPath = path.join(OUT_DIR, `sep-${sep}.yaml`); + + await fs.mkdir(OUT_DIR, { recursive: true }); + + if (!options.force) { + try { + await fs.access(outPath); + console.error( + `${outPath} already exists. Pass --force to overwrite.` + ); + process.exit(1); + } catch { + // does not exist, OK + } + } + + const yaml = renderYaml({ sep, specUrl }); + await fs.writeFile(outPath, yaml, 'utf-8'); + + console.error(`Wrote ${outPath}`); + if (otherSpecPaths.length > 0) { + console.error( + `Note: PR also changes ${otherSpecPaths.length} other spec file(s):` + ); + for (const p of otherSpecPaths) { + console.error(` ${specPathToUrl(p)}`); + } + console.error( + ` Use a per-row "url:" for requirements from those files.` + ); + } + console.error('Next steps:'); + console.error( + ' 1. Edit the file to quote real normative sentences from the spec diff' + ); + console.error( + ' (and add a "#anchor" to spec_url if the requirement lives in a subsection)' + ); + console.error(' 2. Implement the TypeScript scenario'); + console.error( + ' 3. Register it in the appropriate suite list in src/scenarios/index.ts' + ); + }); +} diff --git a/src/scenarios/server/sep-2164.yaml b/src/seps/sep-2164.yaml similarity index 65% rename from src/scenarios/server/sep-2164.yaml rename to src/seps/sep-2164.yaml index 051253d4..83c11ce9 100644 --- a/src/scenarios/server/sep-2164.yaml +++ b/src/seps/sep-2164.yaml @@ -1,9 +1,10 @@ sep: 2164 spec_url: https://modelcontextprotocol.io/specification/draft/server/resources#error-handling requirements: - - text: 'Servers MUST NOT return an empty contents array for a non-existent resource' - check: sep-2164-no-empty-contents - - text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)' - check: sep-2164-error-code + - check: sep-2164-no-empty-contents + text: 'Servers MUST NOT return an empty contents array for a non-existent resource' + - check: sep-2164-error-code + text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)' + - text: 'clients SHOULD also accept -32002 as a resource not found error' excluded: 'Client-side error handling is implementation-defined; not protocol-observable'