[demo] missing_docs drift-watch: resolve oz CLI drift (GA docs + gated deferral) #279
Merged
hongyi-chen merged 2 commits on Jun 30, 2026
…, oz provider) Demonstrates the missing_docs drift-watch loop end to end on real drift. The audit flagged 15 undocumented `oz` subcommands; this resolves all of them according to each command's GA rollout status: - GA (NamedAgents): document the `oz agent` named-agent management group (list/get/create/update/delete + `oz agent skills`) and fix the existing `oz agent list` entry, which incorrectly described skill listing. - GA (ConversationApi): document `oz run conversation get` and the `oz run message` inbox commands (list/read/watch/send/mark-delivered). - Non-GA (ProviderCommand = dogfood): defer the whole `oz provider` group via `gated:ProviderCommand` so it auto-surfaces for docs when it goes GA. All command flags drafted from crates/warp_cli source (agent.rs, task.rs, provider.rs). CLI audit now reports 0 gaps; cli_commands gated_non_ga = 14. Co-Authored-By: Oz <oz-agent@warp.dev>
I'm starting a first review of this pull request. I cancelled the in-progress review run because this pull request was closed. Powered by Oz
hongyi-chen added a commit that referenced this pull request on Jul 2, 2026
…e detection (#201) * Overhaul missing_docs skill: fix audit blind spots, add change detection - audit_docs.py: fail loud (exit 2 + audits_skipped) when repos are missing; fix repo auto-detection to siblings of the docs repo root - GA detection via the app/src/features.rs cargo-feature bridge plus RELEASE_FLAGS/PREVIEW_FLAGS/DOGFOOD_FLAGS instead of snake_case guessing - New audits: public API routes (router/handlers/public_api gin groups vs OpenAPI spec), CLI subcommands (clap enum tree, hidden skipped), slash commands (static registry), surface-map hygiene (dead entries) - Staleness: strip code spans, word-boundary matching, skip historical changelog pages (73 -> 29 findings; 'oz agent' CLI noise eliminated) - Change detection: references/surface_snapshot.json + --diff mode reporting added/removed/promoted surfaces and changelog items since last run; --update-snapshot regenerates the baseline - Seed feature_surface_map.md: map handoff/orchestration/queueing/BYOK/ billing/etc. flags, prune 45+ dead entries, add slash-command section and API internal sentinels - SKILL.md: document the exit-code contract, diff workflow, drift-watch mode for the recurring agent (with copyable scheduled-agent prompt), and make surface-map + snapshot updates explicit drafting steps Co-Authored-By: Oz <oz-agent@warp.dev> * Expand missing_docs audit: settings, web app, tools, skills, structure, stale refs - Settings audit: parse the define_setting! toml_path registry (~200 settings, flag-status aware, object-typed settings handled) and check coverage in the all-settings reference; reverse check catches documented settings that were renamed/removed in code (e.g. 
agents.oz.* -> agents.warp_agent.*) - Stale doc references: validate documented keybinding actions (scope:action) still exist anywhere in warp-internal source - Docs structure audit: flag pages missing from src/sidebar.ts (with an allowlist section in the surface map) - CLI: recursive subcommand parsing (oz run message send, oz environment image list, ...) plus per-module --flag tracking in the snapshot - API: positional RouterGroup argument resolution at Register* call sites (fixes oauth route prefixes) and param-name-insensitive OpenAPI matching - Snapshot v2: settings, web app routes (AgentsApp.tsx), server-side agent tools (ToolName consts + Create*NativeTool registrations), bundled + channel-gated skills, CLI flags; graceful one-time note when diffing against a v1 snapshot - Changelog cross-check now also tracks 'Oz updates' bullets - Extraction sanity guards: implausibly low parse counts (broken parser after a code-layout change) skip dependent audits and exit 2 instead of silently under-reporting; map hygiene and reverse checks gated on healthy extraction - Feature flag enum parsing is brace-safe (survives future struct variants) - SKILL.md documents the 9 coverage audits, snapshot-only surfaces, and adjacent-skill ownership (validate_ui_refs, sync-error-docs, style_lint, weekly-404-monitor) Co-Authored-By: Oz <oz-agent@warp.dev> * Add completeness accounting and map integrity checks; reclassify GA flags Triple-check that every feature is encapsulated in the mapping: - Built-in completeness accounting on every full run: partitions every extracted surface item (277 flags, 74 CLI commands, 71 API routes, 47 slash commands, 201 settings) into exactly one accountability bucket (mapped / ignored / doc-covered / visible finding / snapshot-tracked) and exits 2 with integrity:accounting if anything escapes — an unaccounted item can only mean the audit logic regressed - Map integrity checks in hygiene: entries in both the mapping and the ignore list (ignore silently 
wins) and duplicate keys within a section are now medium findings - Ignore-list review against computed statuses found ~20 flags filed under 'Non-GA' that have since GA'd: reclassified 15 user-facing ones to real mappings (session sharing trio, AgentHarness, SshRemoteServer, ArtifactCommand, OzIdentityFederation, image-context pair, OzPlatformSkills, WorkflowAliases, ShellSelector, KittyImages, UndoClosedPanes, RevertDiffHunk), surfaced FullScreenZenMode as a visible undocumented-feature finding, and retitled the section so placement no longer asserts rollout status - Re-baselined the snapshot after live drift the diff caught on today's checkouts (SuperGrok dogfood->ga, new /rename-conversation slash command — both now standing coverage findings) - SKILL.md: documented the accounting contract and the end-to-end 'how every change path is caught' chain (new/promoted/removed surfaces, no-code-change launches via changelog net, parser rot via extraction guards, map rot via hygiene) Co-Authored-By: Oz <oz-agent@warp.dev> * Target the public warp repo instead of warp-internal Warp's client code is open source at warpdotdev/warp — the audit now treats the public repo as the primary source: - Repo auto-detection prefers a sibling checkout named 'warp' and falls back to 'warp-internal' for transitional environments - New --warp flag is the primary CLI option; --warp-internal remains as a deprecated alias (same destination) so existing invocations keep working - All docstrings, stderr messages, skip reasons, section headers, and SKILL.md guidance (requirements, audit descriptions, drift-watch command, scheduled-agent prompt) now reference the public warp client repo Validated: auto-detect fallback, preferred 'warp' sibling resolution, explicit --warp, and the deprecated alias all run the full audit with clean completeness accounting. 
Co-Authored-By: Oz <oz-agent@warp.dev> * First drift-watch burn-down: settings, keybindings, slash commands, SuperGrok docs (#203) * First drift-watch burn-down: settings, keybindings, slash commands, SuperGrok Dogfoods the missing_docs drift-watch workflow on the standing findings backlog (41 findings resolved): - all-settings.mdx: documented 17 missing settings (prompt submission mode, orchestration message display, auto handoff on sleep, agent attribution, handoff kill-switches, OSC 52 clipboard access, async find, directory tab colors, vertical-tabs panel options, hidden files in project explorer, line number mode, force X11, Ctrl+Enter submit for CLI agents, input focus on block selection) with types/defaults/options extracted from the settings registry; moved git_operations_autogen_enabled into [agents.warp_agent.active_ai] reflecting the agents.oz -> agents.warp_agent rename and removed the stale remnant section; added an Experimental section - keyboard-shortcuts.mdx: fixed all 14 dead action names (10 renames like workspace:open_new_tab -> workspace:new_tab, editor_view:cmd_i -> editor_view:inspect_command, terminal:trigger_subshell_bootstrap -> terminal:warpify_subshell; 4 removed actions blanked) - slash-commands.mdx: added /environment, /harness, /host (cloud agent session selectors) and /rename-conversation - bring-your-own-api-key.mdx: documented connecting a SuperGrok subscription instead of an xAI API key (newly GA SuperGrok flag) + map entry - Surface map: allowlisted guides/agent-workflows/warp-vs-claude-code as intentionally unlisted (per its frontmatter note) - Re-applied the GA-flag reclassification from the previous session (15 mappings + FullScreenZenMode surfaced) which had been silently reverted by a 'git checkout' during integrity testing before it was committed (caught by comparing accounting bucket counts run-over-run) - Snapshot re-baselined; audit now reports 0 stale doc references, 0 undocumented settings/slash commands/unlisted
pages; remaining backlog: 29 CLI subcommand docs, 24 OpenAPI spec gaps (update-open-api-spec), 29 terminology pages (style_lint), FullScreenZenMode + GroupedTabs Co-Authored-By: Oz <oz-agent@warp.dev> * docs: second drift-watch pass — settings, feature flags, map hygiene Re-ran the missing_docs drift-watch audit against current code surfaces and burned down the newly-found in-scope drift (features 5→0, settings 6→0, map hygiene 2→0). Settings (all-settings.mdx): - Document appearance.icon.show_dock_icon (macOS Dock / Cmd-Tab visibility) - Document agents.warp_agent.other.long_running_command_submission_mode - Document code.editor.format_on_save - Document cloud_platform.third_party_api_keys.gemini_enterprise_credentials_enabled - Document warpify.ssh.reuse_existing_control_master - Map warpify.ssh.ssh_tmux_deprecation_notice_pending -> internal (one-time migration banner state, not user-configurable) Feature flags (feature_surface_map.md): - Map CodexPlugin -> cli-agents/codex.md - Map FullScreenZenMode, AsyncFind -> all-settings.mdx (surfaces are documented settings) - Map CustomModelRouters -> inference/model-choice.mdx (new "Custom routers" section) - Ignore GroupedTabs (macOS-only Preview; docs pending GA promotion) Map hygiene: - Prune stale ignore-list flags FreeUserNoAi and WelcomeTab (no longer in code) Co-Authored-By: Oz <oz-agent@warp.dev> * docs(missing_docs): codify finding-resolution patterns in SKILL.md Add a "Resolution patterns" subsection capturing the per-type decision rules applied during the second drift-watch pass, so recurring runs resolve findings consistently: - user-facing setting -> document in all-settings - internal/state-only setting -> map `section.key -> internal` - feature flag with a dedicated page -> map to it - feature flag whose only surface is a documented setting -> map to that page - preview/pre-launch feature with no docs -> ignore-list with a comment - stale map entry/doc reference -> prune after confirming removal in code 
Co-Authored-By: Oz <oz-agent@warp.dev> * Update .agents/skills/missing_docs/references/feature_surface_map.md Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com> --------- Co-authored-by: Oz <oz-agent@warp.dev> Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com> * docs(missing_docs): add reviewer routing from code ownership Add scripts/suggest_reviewers.py, which maps the source surface behind each drift-watch finding to its owning engineer using the CODEOWNERS-format ownership files already maintained in the code repos: - warp client: .github/STAKEHOLDERS - warp-server: .github/STAKEHOLDERS (advisory) + .github/CODEOWNERS (enforced) It resolves with standard last-match-wins precedence, dedupes owners into users/teams, and prints a ready-to-run `gh pr edit --add-reviewer` command. Unresolved paths are non-fatal so a scheduled run is never blocked. Wire it into SKILL.md: a new "Reviewer routing" section (per-category source-file hints), a drift-watch "Route reviewers" step before opening the PR, an updated scheduled-agent prompt, and a References entry. Ownership stays sourced from STAKEHOLDERS/CODEOWNERS (kept fresh by sync-stakeholders) — never hardcoded. Co-Authored-By: Oz <oz-agent@warp.dev> * docs(missing_docs): address review comments - SKILL.md: Phase 3 step 7 now points to src/sidebar.ts (the real sidebar source the structure audit checks); astro.config.mjs only for a new top-level topic. - all-settings.mdx: correct code.editor.show_hidden_files default to `true` (matches the client setting definition). - surface_snapshot.json: regenerate so it includes the newly documented settings (code.editor.format_on_save, appearance.icon.show_dock_icon, agents.warp_agent.other.long_running_command_submission_mode, warpify.ssh.reuse_existing_control_master, cloud_platform.third_party_api_keys.gemini_enterprise_credentials_enabled) and current flag/CLI/API state. 
Audit: exit 0, audits_skipped none, unaccounted none. Co-Authored-By: Oz <oz-agent@warp.dev> * demo(missing_docs): CLI drift burn-down sample (api-key / schedule / secret) (#278) Sample output of one missing_docs drift-watch pass over the CLI backlog, to evaluate the skill's efficacy. Resolves 14 of 31 undocumented CLI commands: - Draft: document the `oz api-key list/create/expire` subcommands in the CLI reference (reference/cli/api-keys.mdx), with flags/args extracted from crates/warp_cli/src/api_key.rs. - Map (no duplication): `oz schedule *` and `oz secret *` subcommands are already documented in their feature pages, so add surface-map entries pointing there instead of re-drafting. Reviewer routing (suggest_reviewers.py): crates/warp_cli ownership -> @bnavetta, @ianhodge. Audit after: CLI findings 31 -> 17, exit 0, unaccounted none. Co-authored-by: Oz <oz-agent@warp.dev> * test(missing_docs): add stdlib test suite for the skill scripts - test_suggest_reviewers.py: 15 unit tests for reviewer resolution (CODEOWNERS matching incl. anchored dir-prefix / exact-file / glob / default rule, last-match-wins precedence, user vs team split, dedup, unresolved paths, warp-internal alias, stdin) — all via temp ownership files. - test_audit_docs.py: 6 behavioral integration tests that run audit_docs.py against the sibling code repos — clean exit + completeness accounting (unaccounted empty), --category scoping, --severity filtering, fail-loud (exit 2) on a missing repo, committed-snapshot currency, and that --update-snapshot honors --snapshot without mutating the committed snapshot. Skips gracefully when the code repos aren't checked out. Both suites use only the Python stdlib (unittest) — no third-party deps. 21/21 pass. Documented under a new "## Tests" section in SKILL.md. Co-Authored-By: Oz <oz-agent@warp.dev> * docs(missing_docs): encode public vs. private surface boundary Make explicit that the skill must only document publicly released surfaces: - New "Public vs. 
private surfaces" section in SKILL.md: the OSS warp client repo is public; warp-server is a PRIVATE repo whose only public surface is the released Oz Agent API already in the OpenAPI spec. Two gates (source/exposure + GA rollout); never document private or unreleased surfaces. - Woven into the API audit description, Phase 3 API-gap research, and Resolution patterns: warp-server endpoints not in the released spec are not auto- documentable — route released ones via sync-openapi-spec, else `-> internal`/defer. Apply the rule to detected drift: Agent Memory is research preview, so its `oz memory*` / `oz memory-store*` CLI and `/memory_stores/*` REST API are mapped `-> internal` with comments. Added a public/private POLICY note to the surface map's API section. Audit after: CLI 17->15, API 29->18, audits_skipped none, unaccounted none. Co-Authored-By: Oz <oz-agent@warp.dev> * test(missing_docs): cover public/private boundary + run skill tests in CI - test_audit_docs.py: add test_research_preview_surfaces_are_deferred, a regression guard asserting Agent Memory's CLI (`oz memory*`) and REST API (`/memory_stores/*`) are never flagged for documentation (public/private boundary). 22 tests total, all green. - ci.yml: run the skill's stdlib test suites on every PR. The reviewer-resolver unit tests run fully; the audit integration tests skip gracefully since the warp/warp-server code repos aren't checked out in docs CI. - SKILL.md: note the new boundary test in the Tests section. Co-Authored-By: Oz <oz-agent@warp.dev> * feat(missing_docs): rollout-gate CLI/API surfaces via gated:<Flag> Fixes the limitation that the CLI and API audits flagged commands/routes regardless of whether their feature had shipped (so non-GA surfaces like Agent Memory needed a permanent `-> internal`). audit_docs.py: - New `gated:<Flag>` surface-map target for CLI commands and API routes. 
The audit resolves the gating flag's rollout status (the same machinery used for feature flags + settings): * non-GA (preview/dogfood/other) -> deferred (new `gated_non_ga` accounting bucket), not a finding; * GA -> falls through to normal coverage so it auto-surfaces as a finding; * unknown flag -> conservative (still a finding) + map-hygiene error so the annotation can't silently rot. - audit_cli / audit_api now take flag_statuses; main computes it for API runs. - Completeness accounting keeps totality (gated_non_ga counted; unaccounted none). feature_surface_map.md: migrate Agent Memory's `oz memory*` / `oz memory-store*` CLI and `/memory_stores/*` API from `-> internal` to `-> gated:AIMemories`, so they auto-surface for docs when AIMemories goes GA. Documented the `gated:` sentinel in the header. SKILL.md: document `gated:<Flag>` in Public vs. private surfaces + Resolution patterns. tests: add TestGatedLogic (helper, non-GA deferral, GA auto-surface, unknown-flag conservatism, map-hygiene validation). 27 tests pass; audit exit 0, unaccounted none. Co-Authored-By: Oz <oz-agent@warp.dev> * docs(missing_docs): sync SKILL.md accounting buckets with gated_non_ga The gated:<Flag> work added a `gated_non_ga` bucket to the CLI and API completeness accounting, but SKILL.md's bucket list still omitted it. Document `gated_non_ga` for CLI commands and API routes, and note the `gated:<Flag>` sentinel alongside `internal` in the References section so the documented accounting matches what the audit actually emits. Co-Authored-By: Oz <oz-agent@warp.dev> * docs(cli): resolve missing_docs CLI drift (oz agent, oz run messaging, oz provider) (#279) Demonstrates the missing_docs drift-watch loop end to end on real drift. 
The audit flagged 15 undocumented `oz` subcommands; this resolves all of them according to each command's GA rollout status: - GA (NamedAgents): document the `oz agent` named-agent management group (list/get/create/update/delete + `oz agent skills`) and fix the existing `oz agent list` entry, which incorrectly described skill listing. - GA (ConversationApi): document `oz run conversation get` and the `oz run message` inbox commands (list/read/watch/send/mark-delivered). - Non-GA (ProviderCommand = dogfood): defer the whole `oz provider` group via `gated:ProviderCommand` so it auto-surfaces for docs when it goes GA. All command flags drafted from crates/warp_cli source (agent.rs, task.rs, provider.rs). CLI audit now reports 0 gaps; cli_commands gated_non_ga = 14. Co-authored-by: Oz <oz-agent@warp.dev> * docs(cli): cross-link run-cloud --agent to named-agent management The new "Managing named agents" section says agents are run with `oz agent run-cloud --agent <UID>`, but that flag was absent from the run-cloud key-flags list. Document `--agent <UID>` (from RunCloudArgs in crates/warp_cli/src/agent.rs) and cross-link the two sections so readers can get from creating a named agent to running one. Co-Authored-By: Oz <oz-agent@warp.dev> --------- Co-authored-by: Oz <oz-agent@warp.dev> Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
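The reviewer-routing step in the commit message above says owners are resolved from CODEOWNERS-format files with last-match-wins precedence. A minimal sketch of that precedence rule follows. It is not the actual `suggest_reviewers.py`: the function names, the simplified pattern matching (directory prefixes plus `fnmatch` globs), and the owner handles are all assumptions made for illustration.

```python
from fnmatch import fnmatch


def parse_rules(text):
    """Parse CODEOWNERS-style lines into (pattern, owners) pairs, in file order."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def matches(pattern, path):
    """Simplified matching (assumption): not the full CODEOWNERS grammar."""
    pattern = pattern.lstrip("/")
    if pattern.endswith("/"):
        return path.startswith(pattern)  # directory prefix rule
    return fnmatch(path, pattern)        # glob or exact-file rule


def owners_for(path, rules):
    """Return the owners of the LAST matching rule; [] if nothing matches."""
    owners = []
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners  # later rules override earlier ones
    return owners


# Hypothetical ownership file; real handles come from STAKEHOLDERS/CODEOWNERS.
example = parse_rules("* @default-owner\ncrates/warp_cli/ @owner-a @owner-b\n")
print(owners_for("crates/warp_cli/src/agent.rs", example))
```

Returning an empty list for an unmatched path mirrors the commit's note that unresolved paths are non-fatal, so a scheduled run would not be blocked by a missing rule.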
What this is
A live drift-detection demo for the `missing_docs` skill (stacked on #201). It runs the skill's drift-watch loop on real, current drift in the `oz` CLI and resolves every finding according to each command's GA rollout status, including the new `gated:<Flag>` deferral mechanism from #201.

This is the "actual drift-detection test" requested: the audit found 15 undocumented CLI subcommands, and this PR closes all of them.
The drift the audit detected
`python3 .agents/skills/missing_docs/scripts/audit_docs.py --category cli` flagged 15 undocumented `oz` subcommands across three command groups:
- `oz agent get/create/update/delete/skills`
- `oz run conversation` / `conversation get` / `message` (+ `watch/send/list/read/mark-delivered`)
- `oz provider setup/list`

How each group was resolved (by rollout status)
The skill defers non-GA surfaces and documents GA ones. Statuses were confirmed via the audit's own `compute_flag_statuses` against `crates/warp_features`:
- `NamedAgents` = ga → documented the `oz agent` named-agent management group in the CLI reference, drafted flag-by-flag from `crates/warp_cli/src/agent.rs`. Also fixed an existing bug: the old `oz agent list` entry incorrectly described listing skills with `--repo` (that behavior is actually `oz agent skills`).
- `ConversationApi` = ga → documented `oz run conversation get` and the `oz run message` inbox commands, drafted from `crates/warp_cli/src/task.rs`.
- `ProviderCommand` = dogfood (non-GA) → deferred the whole `oz provider` group via `gated:ProviderCommand` in the surface map. It is intentionally undocumented while non-GA and will auto-surface as a finding once `ProviderCommand` goes GA. This is the core demonstration of the gated mechanism on real drift.

Validation
- `audit_docs.py --category cli` → 0 gaps (was 15).
- `accounting`: `cli_commands.finding: 0`, `gated_non_ga: 14` (11 Agent Memory + 3 `oz provider`), `map_hygiene: 0`, `unaccounted: {}`.
- Reviewer routing (`suggest_reviewers.py`) for the documented sources (`crates/warp_cli/*`) → @bnavetta, @ianhodge.

The remaining gaps in the full audit (18 API spec gaps, 29 LOW terminology) are pre-existing and unrelated to this change.
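To make the `gated_non_ga` and `finding` counts above concrete, here is a minimal sketch of how a `gated:<Flag>` target could place a CLI command into an accounting bucket, following the rule described in this PR (non-GA is deferred, GA falls through to normal coverage, an unknown flag stays a finding). The function names, the `mapped` and `doc_covered` bucket labels, and the sample data are assumptions for illustration; this is not the code in `audit_docs.py`.

```python
from collections import Counter


def classify(command, target, documented, flag_status):
    """Place one CLI command into exactly one (hypothetical) accounting bucket."""
    if target and target.startswith("gated:"):
        flag = target[len("gated:"):]
        status = flag_status.get(flag)
        if status is None:
            return "finding"        # unknown flag: stay conservative
        if status != "ga":
            return "gated_non_ga"   # preview/dogfood/other: deferred, not a finding
        # GA: fall through to the normal coverage check below
    elif target:
        return "mapped"             # explicit surface-map entry (page or internal)
    return "doc_covered" if command in documented else "finding"


def account(commands, surface_map, documented, flag_status):
    """Count commands per bucket; every command lands in exactly one bucket."""
    return Counter(
        classify(cmd, surface_map.get(cmd), documented, flag_status)
        for cmd in commands
    )


# Illustrative data only; real statuses come from the code repos.
commands = ["oz agent get", "oz run message send", "oz provider setup", "oz provider list"]
surface_map = {
    "oz provider setup": "gated:ProviderCommand",
    "oz provider list": "gated:ProviderCommand",
}
documented = {"oz agent get", "oz run message send"}

print(account(commands, surface_map, documented, {"ProviderCommand": "dogfood"}))
print(account(commands, surface_map, documented, {"ProviderCommand": "ga"}))
```

With `ProviderCommand` at `dogfood`, both `oz provider` commands are deferred. Flipping the status to `ga` turns them into findings without any map edit, which is the auto-surfacing behavior the gated mechanism is meant to provide.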
Files changed
- `src/content/docs/reference/cli/index.mdx`: GA command docs + `oz agent list` bug fix.
- `.agents/skills/missing_docs/references/feature_surface_map.md`: `oz provider` group deferred via `gated:ProviderCommand`.

Conversation: https://staging.warp.dev/conversation/c5ce6a1a-81a4-4a04-92d9-8d587693be12
Run: https://oz.staging.warp.dev/runs/019f19e2-f981-725b-b383-bc7ea01adb6a
Plans:
This PR was generated with Oz.