Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
201 changes: 201 additions & 0 deletions .claude/skills/new-sep/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
---
name: new-sep
description: >-
Scaffold a sep-NNNN.yaml requirement-traceability file for the MCP
conformance repo from a SEP PR's spec diff. Runs the new-sep CLI, then
parses the modelcontextprotocol/modelcontextprotocol spec diff to populate
`requirements[]` with the RFC 2119 sentences and proposed check IDs.
argument-hint: '<sep-number>'
---

# new-sep: SEP traceability YAML scaffolding

You are bootstrapping a `sep-NNNN.yaml` file for a new SEP in the MCP conformance repo. The output is the requirement-traceability file specified by SEP-2484: a YAML that maps each normative sentence from the SEP's spec diff to a `check:` ID (testable) or an `excluded:` reason (not testable). The CLI gets the skeleton; you fill in the rows by reading the spec diff.

## Step 0: Pre-flight checks

Before doing anything else, verify GitHub CLI authentication:

```bash
gh auth status 2>&1
```

If this fails, stop immediately and tell the user:

> GitHub authentication is required for this skill. Please run `gh auth login` first, then re-run.

Verify you're running inside the conformance repo:

```bash
test -f package.json && jq -r '.name' package.json
```

The name should be `@modelcontextprotocol/conformance`. If not, stop and ask the user to `cd` into the conformance repo first.

## Step 1: Parse arguments

Extract from the user's input:

- **sep-number** (required): the SEP number, e.g. `2164`. This is also the PR number in `modelcontextprotocol/modelcontextprotocol` by convention.

## Step 2: Generate the skeleton

Run the CLI:

```bash
npm run --silent build
node dist/index.js new-sep <NNNN>
```

(For development against a non-built source tree: `npx tsx src/index.ts new-sep ...`.)

The CLI writes `src/seps/sep-<NNNN>.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`.

If the CLI errors with "does not change any docs/specification/draft/\*.mdx", the SEP's spec changes landed in a separate PR — ask the user for the spec file path and rerun with `--spec-path docs/specification/draft/<path>`. Do not guess.

## Step 3: Fetch the spec diff

`AGENTS.md` (lines 64–72) is explicit that severity must come from the spec text itself, not the SEP markdown or the conformance PR description:

```bash
gh api "repos/modelcontextprotocol/modelcontextprotocol/pulls/<NNNN>/files" \
--jq '.[] | select(.filename | test("^docs/specification/draft/.*\\.mdx$")) | {filename, patch}'
```

For each file, pull the added (`+`-prefixed) lines from `patch`. If `patch` is truncated for a large file, fall back to fetching the whole file at the PR's head ref:

```bash
gh api "repos/modelcontextprotocol/modelcontextprotocol/contents/<path>?ref=<sep-branch>" \
--jq '.content' | base64 -d
```

## Step 4: Extract RFC 2119 requirements

Walk the added lines and identify sentences containing the keywords: **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **MAY**, **OPTIONAL**.

**Quote the whole sentence**, not just the matched line. The matched word may sit inside a bullet point whose lead-in sentence supplies the keyword by inheritance — e.g.:

> Servers SHOULD return standard JSON-RPC errors for common failure cases:
>
> - Resource not found: -32602 (Invalid Params)

The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/seps/sep-2164.yaml` for the canonical example.

**Regex alone is insufficient** (this is called out in Issue #243). Read for context: pronouns, "the server", and "such cases" all refer back to the lead-in.

## Step 5: Map severity → check vs. excluded

From `AGENTS.md:50-56`:

| Keyword | Severity | YAML field |
| ---------------------------------------------- | -------- | -------------------------- |
| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE | `check: sep-<NNNN>-<slug>` |
| SHOULD / SHOULD NOT | WARNING | `check: sep-<NNNN>-<slug>` |
| MAY / OPTIONAL | — | _no row — skip entirely_ |

MAY / OPTIONAL sentences are noted in Step 4 only so you consciously skip them — they never produce a yaml row.

A row is `excluded:` when a MUST/SHOULD requirement can't be protocol-observed by the harness. Do **not** write any `excluded:` row on your own authority — every exclusion goes through Step 6.

While classifying, sort each MUST/SHOULD row into one of three buckets:

- **`check:`** — observably testable on the wire.
- **clearly-excluded** — you're confident it can't be observed (e.g. "clients SHOULD also accept -32002" when the harness only drives servers).
- **borderline** — you'd default to `check:` but observability is questionable. Markers:
- _Internal state_ — verbs like _record_, _store_, _associate_, _track_, _cache_. The harness sees wire traffic, not memory; usually only observable via a downstream row already in your list.
- _UI / human-facing_ — _display_, _show_, _render_, _prompt the user_.
- _Precondition phrasing_ — "Before doing X, the implementation MUST Y" where X is itself another row.

Slug convention: lowercase-kebab, derived from the verb phrase. Examples from `sep-2164.yaml`: `no-empty-contents`, `error-code`. Same `id` is used for SUCCESS and FAILURE (`AGENTS.md:52`).

## Step 6: Confirm exclusions with the user

Nothing becomes `excluded:` without sign-off. Two rounds:

**Round 1 — clearly-excluded, single batch question.** One `AskUserQuestion` listing all clearly-excluded rows in the question body (slug + one-line reason each). Options:

- `Exclude all as listed (Recommended)`
- `Flip all to check:`
- `Let me adjust per-row` — if chosen, append these rows to round 2.

Skip this round if the bucket is empty.

**Round 2 — borderline, one question per row.** One `AskUserQuestion` call with a question per borderline row (loop in batches of 4 if needed). For each:

- header: the proposed slug
- question: quote the requirement sentence + your one-line observability concern
- options (list `check:` first — it's the default for borderline):
- `check:` — keep as a testable check
- `excluded: <reason>` — drop to excluded with your stated reason
- `merge into <other-slug>` — offer when the row is a precondition for another row already in the list

Apply the answers before writing. For any `excluded:` outcome, write the reason verbatim into the yaml and add an `issue:` URL if the user supplies one. A `merge` outcome means: drop this row, and append its `text:` to the surviving row's `text:` separated by `/` so the traceability isn't lost.

## Step 7: Rewrite the YAML

Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/seps/sep-2164.yaml`).

**Key order within each row** — for `check:` rows the **`check:` key comes first**, then `text:`, then any optional `url:`. Scanning the left margin should reveal every check ID without reading the quoted sentences. For `excluded:` rows the order is **`text:` first**, then `excluded:`, then optional `issue:` — there's no ID to scan for, so lead with the requirement.

**Row order in the file** — all `check:` rows first (in spec-diff order), then **all `excluded:` rows grouped at the bottom**, separated from the checks by **one blank line**. Do not interleave.

```yaml
requirements:
- check: sep-NNNN-first-slug
text: '...'
- check: sep-NNNN-second-slug
text: '...'

- text: '...'
excluded: 'reason'
issue: https://github.com/modelcontextprotocol/conformance/issues/<NNNN>
```

If a requirement is ambiguous or you're not confident, leave it as a `TODO:` row rather than guessing — humans review this yaml before scenarios get written.

Also fix the `spec_url`: the CLI emits the page URL with no anchor. If the requirements you extracted live under a specific spec subsection (e.g. `#error-handling`), append it.

If a requirement comes from a **different spec page** than `spec_url` (the SEP touched multiple `.mdx` files — the CLI prints these as "PR also changes N other spec file(s)"), give that row a full `url:` override:

```yaml
- check: sep-NNNN-slug
text: '...'
url: https://modelcontextprotocol.io/specification/draft/other/page#anchor
```

A row's effective spec reference is `row.url ?? file.spec_url`.

Write the result back to `$YAML`.

## Step 8: Suggest a host scenario

`AGENTS.md` prefers **fewer scenarios with more checks** over one-scenario-per-check. Before telling the user to write a new scenario, look for an existing one the new checks could be folded into.

Determine the suite directory from the requirement subjects ("MCP clients MUST…" → `client/`, "Servers MUST…" → `server/`, "authorization servers MUST…" → `authorization-server/`; a SEP may map to more than one). Then search that directory for scenarios touching the same spec area:

```bash
rg -l -i '<domain-term>|<domain-term-2>' src/scenarios/<suite>/ --type ts
```

Pick 2–3 domain terms from the SEP's subject matter (for a discovery SEP: `metadata`, `well-known`; for an auth-response SEP: `redirect`, `callback`, `pkce`). For each hit, pull the scenario's `name`/`description` to confirm relevance:

```bash
rg -A1 'name:|description:' <hit.ts>
```

If you find a plausible host, recommend it by path. If nothing fits, say so explicitly — a new scenario file is then the right call.

## Step 9: Hand-off

Report to the user, in this order:

1. Path to the generated yaml.
2. Row counts: "`N check:` rows, `M excluded:` rows" — and note which exclusions the user signed off in Step 6.
3. Any requirements you left as `TODO:` and why.
4. **Host-scenario recommendation** from Step 8 — either "consider adding these checks to `src/scenarios/<suite>/<file>.ts` (it already exercises _X_)" or "no existing scenario covers this area; a new file is appropriate".
5. Remaining next steps the user owns:
- add the checks to the host scenario (or create one) under `src/scenarios/{client,server,authorization-server}/`,
- register any new scenario in `src/scenarios/index.ts` (`AGENTS.md:48`),
- add a passing example to the everything-client/server and a negative test, per `AGENTS.md:74-81`.

Do **not** generate or edit scenario `.ts` files or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml plus the recommendation.
10 changes: 10 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,16 @@ Verify requirement levels against the SEP's **spec diff** — the change to `doc
gh api "repos/modelcontextprotocol/modelcontextprotocol/contents/docs/specification/draft/<path>?ref=<sep-branch>" --jq '.content' | base64 -d
```

### Adding a new SEP

Scaffold the requirement-traceability YAML with:

```sh
npx @modelcontextprotocol/conformance new-sep <NNNN>
```

The command looks up PR #`<NNNN>` in `modelcontextprotocol/modelcontextprotocol` (SEP numbers are PR numbers), derives `spec_url` from the `docs/specification/draft/*.mdx` file it changes, and writes `src/seps/sep-<NNNN>.yaml` with TODO `requirements[]` rows. Use `--spec-path` or `--spec-url` to skip the lookup. The `new-sep` Claude Code skill drives the same flow end-to-end, parses the spec diff, and fills in the requirement rows.

## Examples: prove it passes and fails

A new scenario should come with:
Expand Down
4 changes: 4 additions & 0 deletions src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ import {
printBaselineResults
} from './expected-failures';
import { createTierCheckCommand } from './tier-check';
import { createNewSepCommand } from './new-sep';
import packageJson from '../package.json';

// Note on naming: `command` refers to which CLI command is calling this.
Expand Down Expand Up @@ -540,6 +541,9 @@ program
// Tier check command
program.addCommand(createTierCheckCommand());

// New SEP scaffolding command
program.addCommand(createNewSepCommand());

// List scenarios command
program
.command('list')
Expand Down
115 changes: 115 additions & 0 deletions src/new-sep/index.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
import { describe, it, expect } from 'vitest';
import { specPathToUrl, renderYaml } from './index';

describe('specPathToUrl', () => {
it('strips the docs/specification/draft/ prefix and .mdx suffix', () => {
expect(specPathToUrl('docs/specification/draft/server/resources.mdx')).toBe(
'https://modelcontextprotocol.io/specification/draft/server/resources'
);
});

it('handles nested paths', () => {
expect(specPathToUrl('docs/specification/draft/basic/lifecycle.mdx')).toBe(
'https://modelcontextprotocol.io/specification/draft/basic/lifecycle'
);
});

it('rejects paths outside docs/specification/draft/', () => {
expect(() =>
specPathToUrl('docs/specification/2025-11-25/server/x.mdx')
).toThrow(/must start with/);
});
});

describe('renderYaml', () => {
it('emits placeholder yaml in the sep-2164.yaml style', () => {
const out = renderYaml({
sep: 9999,
specUrl:
'https://modelcontextprotocol.io/specification/draft/server/resources'
});
expect(out).toBe(
`sep: 9999
spec_url: https://modelcontextprotocol.io/specification/draft/server/resources
requirements:
- check: sep-9999-todo
text: 'TODO: quote the normative sentence from the spec diff'

- text: 'TODO: requirement that cannot be tested'
excluded: 'TODO: reason'
issue: https://github.com/modelcontextprotocol/conformance/issues/<NNNN>
`
);
});

it('matches the byte-shape of the real sep-2164.yaml when given its rows', () => {
const out = renderYaml({
sep: 2164,
specUrl:
'https://modelcontextprotocol.io/specification/draft/server/resources#error-handling',
requirements: [
{
text: 'Servers MUST NOT return an empty contents array for a non-existent resource',
check: 'sep-2164-no-empty-contents'
},
{
text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)',
check: 'sep-2164-error-code'
},
{
text: 'clients SHOULD also accept -32002 as a resource not found error',
excluded:
'Client-side error handling is implementation-defined; not protocol-observable'
}
]
});
expect(out).toBe(
`sep: 2164
spec_url: https://modelcontextprotocol.io/specification/draft/server/resources#error-handling
requirements:
- check: sep-2164-no-empty-contents
text: 'Servers MUST NOT return an empty contents array for a non-existent resource'
- check: sep-2164-error-code
text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'

- text: 'clients SHOULD also accept -32002 as a resource not found error'
excluded: 'Client-side error handling is implementation-defined; not protocol-observable'
`
);
});

it('emits per-row url: overrides', () => {
const out = renderYaml({
sep: 1234,
specUrl: 'https://modelcontextprotocol.io/specification/draft/server/a',
requirements: [
{ text: 'from primary file', check: 'sep-1234-a' },
{
text: 'from secondary file',
check: 'sep-1234-b',
url: 'https://modelcontextprotocol.io/specification/draft/server/b#x'
}
]
});
expect(out).toBe(
`sep: 1234
spec_url: https://modelcontextprotocol.io/specification/draft/server/a
requirements:
- check: sep-1234-a
text: 'from primary file'
- check: sep-1234-b
text: 'from secondary file'
url: https://modelcontextprotocol.io/specification/draft/server/b#x
`
);
});

it('escapes single quotes by doubling them', () => {
const out = renderYaml({
sep: 1,
specUrl: 'https://example.com/x',
requirements: [{ text: "can't happen", check: 'sep-1-x' }]
});
expect(out).toContain("text: 'can''t happen'");
});
});
Loading
Loading