feat(compile): IR-validated step authoring & runtime self-optimization by jamesadevine · Pull Request #1061 · githubnext/ado-aw

jamesadevine · 2026-06-16T13:15:10Z

feat(compile): IR-validated step authoring & runtime self-optimization

Summary

Two new agent-driven capabilities that don't exist in ado-aw (or upstream gh-aw) today:

Flow A — Authoring-time step hoisting: When an authoring agent creates or updates a pipeline, it proactively scaffolds concrete steps: / post-steps: entries for deterministic, non-LLM work (clone, cache restore, CLI install). A new validate_steps MCP tool on the author-facing server gives the agent IR-level feedback before committing.

Flow B — Runtime self-optimization: Opt-in via self-optimization: enabled: true. The Stage-1 runtime agent recognises bash it ran successfully and proposes lifting it into front-matter steps via a new structured safe-output. Stage 2 cross-checks proposals against command history (anti-injection). Stage 3 IR-validates (Curated allow-list: bash + typed-factory tasks only) and either renders a 🎭 build-log preview (staged mode, the default) or opens a PR against the source .md (live mode, with idempotency dedup).

Self-audit prompt: New prompts/audit-ado-agentic-workflow.md teaches a Copilot CLI agent to systematically audit pipeline runs across 5 dimensions (cost, hoist candidates, reliability, safe-output quality, security) and produce actionable reports with concrete front-matter patches.

What shipped (13 commits)

Layer	What
Shared core	`compile::ir::validate_step_block` structural validator + `CURATED_TASK_IDS` allow-list
Flow A	`validate_steps` MCP tool + 3 authoring prompt updates (hoist heuristic)
Front matter	`self-optimization:` section (enabled, staged, max-proposals-per-run, allowed-sections)
Flow B Stage 1	`propose-step-optimization` safe-output + `OPT_IN_GATED_TOOLS` gating
Flow B Stage 2	Detection prompt extension (command-history cross-check)
Flow B Stage 3	Staged preview renderer + live-PR path (with idempotency dedup)
Telemetry	`AgentStats.propose_step_optimization_calls` OTel counter
Self-audit	`prompts/audit-ado-agentic-workflow.md` + dispatcher routing
Docs	8 doc files updated/created (front-matter, mcp-author, extending, safe-outputs, self-optimization, prompts×4, AGENTS.md)
Tests	2046 total tests pass; ~50 new tests across unit, integration, and fixture coverage

Configuration

self-optimization:
  enabled: true              # opt-in (default: false)
  staged: true               # preview-only (default: true)
  max-proposals-per-run: 3   # cap (default: 3, max: 50)
  allowed-sections:          # which sections proposals may target
    - steps                  # (default: [steps, post-steps])
    - post-steps

Security model

Curated allow-list: Only Bash steps and tasks with a typed factory in tasks.rs are accepted (today: ArchiveFiles@2, CopyFiles@2, DockerInstaller@0, DotNetCoreCLI@2, PublishTestResults@2).
Stage 2 cross-check: Detection agent verifies proposed bash against source_command_evidence — ungrounded proposals are flagged as prompt injection.
Staged-by-default: First runs only preview; authors flip staged: false once they trust the agent's judgement.
Section scoping: setup/teardown require explicit opt-in (different job identities).
OPT_IN_GATED_TOOLS: MCP layer strips the tool unless the compiler explicitly enables it.
Validator rejects safe-outputs.propose-step-optimization: — the tool is ONLY activated via the self-optimization: section.

Testing

cargo build        # 1 warning (allows_section unused — trivial)
cargo test         # 2046 passed, 0 failed, 1 ignored
cargo clippy       # clean

…t matter * Add CURATED_TASK_IDS + is_curated_task() in src/compile/ir/tasks.rs. These will gate which TaskStep variants agent-proposed step blocks may reference (used by the IR fragment validator landing next). A new test asserts the const list stays in lock-step with every factory's emitted task identifier. * Add SelfOptimizationConfig + StepSection in src/compile/types.rs and wire as the new self-optimization: front-matter section (opt-in; default off). Defaults: staged=true, max-proposals-per-run=3, allowed-sections=[steps, post-steps]. The sanitize impl clamps max-proposals-per-run at 50. Seven parsing tests lock the schema. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ocks Add src/compile/ir/step_validation.rs exposing validate_step_block, the shared core for two upcoming layers: * Flow A — a validate_steps MCP tool on the author-facing MCP server so the authoring agent can get IR-aligned feedback on proposed steps: blocks without round-tripping through ado-aw compile. * Flow B — the propose-step-optimization safe-output Stage 3 executor must IR-validate any agent-proposed block before applying it to the source .md. The validator operates on serde_yaml::Value (mirroring how the front-matter steps: field is treated today — opaque YAML passed to ADO) rather than lowering to a typed Vec<Step> first; the IR has no public Value -> Step parser. It enforces: * Top-level must be a sequence; each entry a mapping with exactly one of bash/task/checkout/download/publish. * Unknown step-level keys are rejected (the most common shape- injection vector in untrusted YAML). * env: values must be string scalars (nested maps/sequences could smuggle ADO macros or template expressions). * bash bodies are capped at 10 KB. * task identifiers must match Name@Version. Curated mode also restricts tasks to CURATED_TASK_IDS (the tasks.rs typed factories). * All errors collected, not short-circuited. 17 unit tests cover happy paths, every failure class, and the task identifier parser. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The new tool exposes compile::ir::validate_step_block over MCP so an authoring agent (Copilot Chat / Claude / Codex) can get IR-aligned feedback on a proposed front-matter steps: block without round-tripping through the full ado-aw compile flow. Input: a JSON array of step entries (same shape ADO accepts in YAML) plus an optional allow_list mode ("full" default, "curated" to additionally restrict tasks to the tasks.rs typed- factory set). The MCP transport speaks JSON; the tool round-trips through a YAML text representation before handing to the validator (serde_yaml::Value is a strict superset of serde_json::Value, so the conversion is lossless for JSON-shaped inputs). Output: structured response { ok, kinds } or { ok, errors }. Errors are collected, not short-circuited, so the agent gets the full picture in one round. Adds Serialize derives on StepKind + StepValidationError to allow the structured MCP response and the future propose-step-optimization Stage 3 audit emission. Three new tests cover the registration, the curated-mode rejection of AzureCLI@2, and the invalid_params error on a bad allow_list. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…teps * create-ado-agentic-workflow.md: Step 13 (Inline Steps) now leads with a 'hoist candidates' heuristic the authoring agent uses to decide what work belongs in steps:/post-steps: vs the prompt body, with concrete examples and a 'Validate before committing' subsection pointing at the new validate_steps MCP tool. * update-ado-agentic-workflow.md: mirrors the heuristic for edit flows so the update agent re-examines hoist candidates before modifying prompt bodies. * debug-ado-agentic-workflow.md: adds a slow-build diagnostic that surfaces hoist candidates from the audit command history. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs/front-matter.md: new 'Self-optimization (opt-in)' section documenting the self-optimization: block (enabled, staged, max-proposals-per-run, allowed-sections), with the canonical YAML example slotted in alongside execution-context. * docs/mcp-author.md: new validate_steps entry covering input schema, full vs curated allow-list modes, and the structured success/error response shapes. * docs/extending.md: pointer to compile::ir::validate_step_block as the shared structural validator for any component accepting an untrusted step block (always use StepKindAllow::Curated for agent-proposed input). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The Stage-1 surface of Flow B for runtime self-optimization. When the agent's front matter sets self-optimization.enabled: true, the Stage-1 agent gets access to a structured safe-output that lets it propose lifting deterministic bash work (clone, install, cache restore, artifact download) out of its prompt body and into the front-matter steps:/post-steps: section. Params (validated at Stage 1): - section: which front-matter section to propose into (steps, post-steps, setup, teardown — kebab-case wire format mirroring StepSection in compile::types). - rationale: <= 2 KB, non-empty. - estimated_token_savings: optional hint. - steps: JSON array of ADO step entries (deep structural validation runs at Stage 3 via compile::ir::validate_step_block in Curated mode — bash + tasks.rs typed factories only). - source_command_evidence: bash commands the agent actually ran, capped at 64 entries / 10 KB each. Stage 2 cross-checks this against the audit command history; any bash in steps without a matching evidence entry is a prompt-injection signal. Stage 3 placeholder: this commit records the proposal in safe_outputs.ndjson for audit visibility but emits no preview or PR. The staged-preview renderer and live-mode PR opener land in subsequent commits. Gating (parallel mechanism to ado-aw-debug's DEBUG_ONLY_TOOLS): - New OPT_IN_GATED_TOOLS const in src/safeoutputs/mod.rs listing propose-step-optimization. - MCP layer in src/mcp.rs strips the route unless explicitly listed in --enabled-tools (the same path used for debug-only tools, generalised from is_debug_only -> is_gated). - New self_optimization_enabled(fm) predicate in compile/common.rs drives whether generate_enabled_tools_args adds the tool to the --enabled-tools list. - New validate_self_optimization_config validator (wired in agentic_pipeline.rs) rejects safe-outputs.propose-step-optimization with a clear "use self-optimization: instead" message, and rejects an empty allowed-sections list when the feature is enabled. Sanitisation: rationale runs through sanitize_config; steps and source_command_evidence pass through unchanged because the Stage 3 IR validator enforces structure and mangling them would break the Stage 2 command-history cross-check. Tests: 10 unit tests in propose_step_optimization (round-trip, section enum, validation rejection paths, dry-run summary, sanitization invariants); 12 new compile-side tests (self_optimization_enabled, generate_enabled_tools_args inclusion/ exclusion, validate_self_optimization_config rejection paths). All 2040 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…tion test * src/data/threat-analysis.md: extend the threat-detection prompt with a fourth analysis item (Unsupported Step-Optimization Proposal). The detection agent now cross-checks every propose-step-optimization entry's params.steps against params.source_command_evidence and the agent's actual command history (from the audit MCP-call trail). Any bash in the proposal that does not appear in the agent's observed behaviour is flagged under prompt_injection: true. Setup/teardown proposals get extra scrutiny because they run as separate jobs with potentially different identities. * src/agent_stats.rs: add propose_step_optimization_calls counter to AgentStats so authors can see, at a glance, how often the agent is finding hoist candidates. Defensive substring matching on the OTel span name (the Copilot CLI uses several shapes for MCP-tool spans across versions). All existing AgentStats constructor call sites updated; new test exercises the counter. * tests/fixtures/self-optimization-agent.md + tests/compiler_tests.rs: end-to-end integration test confirming that a self-optimization- enabled fixture compiles cleanly AND emits the --enabled-tools propose-step-optimization flag. Counterpart to the unit tests in compile::common::tests::test_generate_enabled_tools_args_self_optimization_*. All 2042 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The first end-to-end milestone for Flow B: when an agent calls propose-step-optimization and self-optimization.staged is true (the default), Stage 3 now: 1. Reads the self-optimization config from tool_configs (injected by main.rs from the front-matter self-optimization: block). 2. Validates the proposed section against allowed-sections; rejects proposals targeting sections the author hasn't opted in to. 3. IR-validates the proposed step block via validate_step_block with StepKindAllow::Curated (bash + typed-factory tasks only). IR-invalid proposals are rejected with structured error output. 4. Renders a formatted staged preview to the Stage 3 step log, showing section, rationale, token-savings estimate, and the proposed YAML in a copy-paste-ready format. Authors can now opt in to self-optimization, watch the agent's proposals accumulate in their build logs over a few runs, then flip staged: false when they trust the proposals. The live-PR path (stage3-live-pr-path todo) remains unimplemented — calling with staged: false returns a clear failure message pointing at the upcoming release. Wiring changes: - SelfOptimizationConfig gains Serialize (needed for serde_json round-trip into tool_configs). - main.rs::build_execution_context injects the config into tool_configs["propose-step-optimization"] when self-optimization is enabled — parallel mechanism to the ado-aw-debug.create-issue config injection. - execute.rs: registers ProposeStepOptimizationResult budget and dispatch route (new dispatch_opt_in_tools alongside dispatch_debug_tools). All 2042 tests pass (unit + integration). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ication * tests/bash_lint_tests.rs: add self-optimization-agent.md to the shellcheck fixture list. * docs/safe-outputs.md: new Self-modification subsection documenting propose-step-optimization (opt-in activation, Stage 2 cross-check, Stage 3 staged/live behaviour). * src/audit/analyzers/safe_outputs.rs: verified that existing proposed_count logic generically counts propose-step-optimization entries (no exclusion or special-casing needed — all NDJSON entries contribute to proposed_count regardless of tool type). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When self-optimization.staged is false, the executor now: 1. Reads the source .md from ctx.source_directory + the newly-added source_file_relative_path field on ExecutionContext (set in main.rs::run_execute from the --source CLI arg). 2. Patches the front matter via patch_front_matter(): finds the target section (steps/post-steps/setup/teardown), parses the YAML, appends the proposed entries, re-serializes. 3. Pushes a single-file edit commit to a new branch via ADO REST Pushes API (same endpoint and auth pattern as create-pull-request). 4. Opens a PR against the default branch via ADO REST Pull Requests API, with a structured body explaining the rationale, section, token savings, and that the proposal passed IR validation + Stage 2. Branch naming: ado-aw/self-opt-<section>-<random-hex>. Commit message: chore(ado-aw): self-optimize `<section>` steps. Also adds: - ExecutionContext.source_file_relative_path field (Option<String>) populated in main.rs from strip_prefix(source_directory) on the --source path. - patch_front_matter() helper with 3 unit tests (insert into existing section, create absent section, reject missing fences). - Fixes to two test files that needed the new field initialised. All 2044 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…e variants * Idempotency: before opening a new PR in live mode, the executor now queries ADO for existing open PRs whose sourceRefName starts with `refs/heads/ado-aw/self-opt-<section>-`. If one exists, it returns success with a message pointing the author at the existing PR instead of opening a duplicate. * New integration test (test_self_optimization_staged_false_compiles_identically) confirming that both staged: true and staged: false front-matter variants compile to the same pipeline YAML (staging is a Stage 3 runtime decision, not a compile-time fork). Uses a new fixture (self-optimization-live-agent.md) with staged: false and extended allowed-sections: [steps, post-steps, setup]. All 2046 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs/self-optimization.md covers the opt-in self-optimization feature end-to-end: configuration, three-stage flow, staged/live mode, allowed step kinds, security model, hoist heuristic, troubleshooting. AGENTS.md: add the new page to the Documentation Index. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-16T13:25:17Z

🔍 Rust PR Review

Summary: Has a few real bugs and one security gap worth fixing before merge.

Findings

🐛 Bugs / Logic Issues

1. checkout, download, publish accepted in Curated mode — contradicts stated security model

src/compile/ir/step_validation.rs — classify_and_validate_step

The PR description and docs/self-optimization.md explicitly state the curated allow-list is "bash + typed-factory tasks only", but checkout, download, and publish steps pass StepKindAllow::Curated validation without restriction. The test accepts_multi_entry_mixed_step_block confirms this by calling validate_step_block(..., StepKindAllow::Curated) with a publish: entry and asserting success.

checkout: self with persistCredentials: true is particularly concerning in this mode — an agent could propose it to leave credentials available to subsequent steps. publish: could exfiltrate data as a build artifact.

Fix: add a StepKindAllow::Curated arm to classify_and_validate_step that rejects checkout, download, and publish with a clear error message pointing authors to StepKindAllow::Full (the authoring-time tool).

2. max_proposals_per_run is configured but never enforced at Stage 3

src/safeoutputs/propose_step_optimization.rs — execute_impl

SelfOptimizationConfig::max_proposals_per_run is validated, clamped at 50 in sanitize_config_fields, and documented in the front-matter grammar. But execute_impl never reads this field. Stage 3 processes every NDJSON entry independently, so a misbehaving agent could emit 100+ proposals regardless of the configured cap.

Stage 3 needs to count how many times this tool has already executed in the current run (e.g., by recording processed proposals in a sidecar file or checking the NDJSON archive) and return ExecutionResult::failure when the cap is exceeded.

3. patch_front_matter can split incorrectly on multiline YAML strings

src/safeoutputs/propose_step_optimization.rs:644

let Some(fence_end) = after_first_fence.find("\n---") else { ... };

find("\n---") returns the first occurrence of \n--- in the post-opening-fence content. If the front matter contains a multiline string value that includes --- on its own line — e.g.:

description: |
  ---
  some details here

...the split will land inside the YAML block, producing a truncated yaml_str and a corrupted body_with_fence. The resulting re-serialized file will be silently malformed. Change the search to "\n---\n" (or "\n---\r\n" for CRLF), which matches only a bare fence separator and won't fire on inline --- content.

🔒 Security Concerns

4. env: values in proposed steps not checked for ADO macro expansion syntax

src/compile/ir/step_validation.rs — check_env_string_values

The validator correctly rejects non-string env values, but it does not inspect whether string values contain ADO macro syntax ($(...)) or template expressions ($[...]). Stage 2 cross-checks bash bodies against source_command_evidence, but env values are outside that check. A proposal like:

- bash: ls -la   # agent did run this → evidence matches
  env:
    ADO_TOKEN: $(System.AccessToken)

would pass the IR validator, pass Stage 2 (bash body is in evidence), and — if merged — would expand $(System.AccessToken) in a future build and make the token available inside $ADO_TOKEN during the step. Consider rejecting values that match \$\( or \$\[ at the validator level as a defence-in-depth measure, consistent with the stated security model.

⚠️ Suggestions

5. Section comparison uses serde_yaml::to_string(s).trim() — fragile round-trip

src/safeoutputs/propose_step_optimization.rs:243–248

let section_allowed = config.allowed_sections.iter().any(|s| {
    serde_yaml::to_string(s)
        .unwrap_or_default()
        .trim()
        == section_wire
});

Comparing via a YAML serialization round-trip when a direct string conversion is already available elsewhere in this file (ProposalSection::as_wire_str) is unnecessarily fragile. Add as_wire_str() -> &'static str to StepSection (same approach as ProposalSection) and use that for the comparison. This eliminates the allocation and the implicit dependency on serde_yaml's enum formatting.

6. patch_front_matter silently strips YAML comments on re-serialisation

src/safeoutputs/propose_step_optimization.rs:651–679

Parsing the front matter into a serde_yaml::Value and re-serializing with serde_yaml::to_string will silently discard any # comments in the YAML block and may reorder keys. The live-mode PR will therefore contain spurious formatting changes beyond the intended step additions, making review noisier and potentially surprising authors who maintain documentation comments in their front matter. Worth calling out in the live-mode PR body ("Note: YAML comments in the front-matter block are not preserved across this rewrite") or in the docs.

✅ What Looks Good

The OPT_IN_GATED_TOOLS gating mechanism + validate_self_optimization_config rejecting safe-outputs.propose-step-optimization is a clean defence-in-depth layer.
Staged-by-default is the right choice — authors opt into real mutations explicitly.
validate_step_block collects all errors rather than short-circuiting; the comprehensive error message UX is good.
is_valid_task_identifier is simple, regex-free, and tests the edge cases well.
The source_command_evidence cross-check design for Stage 2 is sound.
SelfOptimizationConfig::sanitize_config_fields clamping max_proposals_per_run at 50 is the right call.

Generated by Rust PR Reviewer for issue #1061 · 367.8 AIC · ⌖ 12.7 AIC · ⊞ 33.7K · ◷

New prompt file teaching a Copilot CLI agent how to systematically audit pipeline runs using the audit_build, logs, status, and inspect_workflow MCP tools. Covers five analysis dimensions: 1. Cost & token efficiency (token trends, model choice, turns/output) 2. Hoist candidates (self-optimization opportunities from tool_usage) 3. Reliability & failure patterns (errors, timeouts, network blocks) 4. Safe-output quality (acceptance rate, noops, detection rejections) 5. Security posture (detection flags, firewall anomalies, MCP health) Produces a structured markdown report with prioritized findings, concrete front-matter change recommendations (including enabling self-optimization), and optionally applies fixes via validate_steps + the update workflow. Also updates: - AGENTS.md: architecture tree + documentation index entry - src/data/init-agent.md: dispatcher routing for "audit my pipeline" requests + use-case examples Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-16T13:54:17Z

🔍 Rust PR Review

Summary: Two findings need attention before merge — one security bug and one enforcement gap. The overall architecture and test coverage are strong.

Findings

🔒 Security Concerns

propose_step_optimization.rs:990-1011 — ##vso[ pipeline-command injection via staged preview stdout

In staged mode, proposed_yaml is printed directly to stdout without calling neutralize_pipeline_commands:
```
// propose_step_optimization.rs ~line 990
let proposed_yaml = serde_yaml::to_string(&yaml_value)
    .unwrap_or_else(|_| "<failed to render YAML>".to_string());
// ...
println!("{preview}");   // ← preview embeds proposed_yaml verbatim
```
The SanitizeContent impl explicitly skips steps (justified for Stage 2 cross-check integrity), and validate_step_block checks structural shape and byte limits but does not strip ##vso[task.setvariable variable=X]Y sequences. Since Stage 3 runs as a regular ADO task, the ADO build agent processes ##vso[...] written to stdout as pipeline commands. A compromised agent could embed ##vso[task.setvariable variable=EXPLOIT]$(System.AccessToken) in a bash body (valid string ≤ 10KB), pass IR validation, and use Stage 3's staged preview to inject pipeline commands.

The other println!/stdout paths in execute.rs correctly route through log_and_print_entry_result which calls neutralize_pipeline_commands(&result.message), but this println!("{preview}") call inside execute_impl bypasses that wrapper entirely.

Fix: apply neutralize_pipeline_commands to proposed_yaml before embedding it in the preview string:
```
use crate::sanitize::neutralize_pipeline_commands;
let proposed_yaml_safe = neutralize_pipeline_commands(&proposed_yaml);
// use proposed_yaml_safe in preview format string
```

🐛 Bugs / Logic Issues

propose_step_optimization.rs:920-1027 — max_proposals_per_run declared but never enforced in Stage 3

execute_impl reads config.max_proposals_per_run from the tool config but never uses it:
```
let config: crate::compile::types::SelfOptimizationConfig =
    ctx.get_tool_config("propose-step-optimization");

// config.staged is used; config.allowed_sections is used;
// config.max_proposals_per_run is ... never checked.
```
execute_safe_outputs in execute.rs processes every NDJSON entry sequentially, dispatching each propose-step-optimization entry into a fresh execute_impl call with no shared counter. In staged mode this means unlimited previews. In live mode, the per-section idempotency check caps to one PR per section (max 4), but that bound is an incidental side-effect of idempotency, not enforcement of the documented max_proposals_per_run cap.

The config promise in the docs and front-matter type should be enforced. One pattern used elsewhere: track counts in ExecutionContext via an Arc<Mutex<u32>> counter keyed on the tool name (similar to uploaded_pipeline_artifact_keys).

⚠️ Suggestions

propose_step_optimization.rs:925-930 — allows_section comparison via serde_yaml::to_string roundtrip
```
let section_allowed = config.allowed_sections.iter().any(|s| {
    serde_yaml::to_string(s)
        .unwrap_or_default()
        .trim()
        == section_wire
});
```
This serializes each StepSection enum variant to YAML and string-compares. It works correctly today, but it's fragile — it relies on serde_yaml's enum-unit-variant format staying stable and matching the kebab-case wire strings that ProposalSection::as_wire_str() returns. StepSection already has #[serde(rename_all = "kebab-case")] so the values align, but the indirection is unnecessary. Adding fn as_wire_str(self) -> &'static str to StepSection (mirroring the identical method on ProposalSection) would let this be a direct == comparison and remove any serialization dependency from a security-critical allow-list check.
ir/mod.rs — stale #[allow(unused_imports)] comment

The #[allow(unused_imports)] block says the downstream consumers "land in subsequent commits," but both consumers (validate_steps in mcp_author/mod.rs and propose-step-optimization Stage 3 in safeoutputs/propose_step_optimization.rs) are in this PR. The attribute is harmless since the items are used, but the comment is misleading. The #[allow] can be removed.

✅ What Looks Good

The curated task allow-list design (CURATED_TASK_IDS + is_curated_task + the lockstep test curated_task_ids_match_factories) is exactly the right pattern — the allow-list is the source of truth and the test enforces it stays in sync with the factories.
validate_step_block collecting all errors rather than short-circuiting is the right UX call for a validator that surfaces feedback to agents.
The OPT_IN_GATED_TOOLS / DEBUG_ONLY_TOOLS distinction in the MCP filter is a clean security boundary that makes the gating explicit and extensible.
Per-section idempotency check in live mode (prefix branch search before creating a new branch) is a thoughtful operator-friendliness touch.
Test coverage is comprehensive — the structural validator tests, section roundtrip tests, and patch_front_matter tests all cover the key failure modes.

Generated by Rust PR Reviewer for issue #1061 · 447.2 AIC · ⌖ 12.6 AIC · ⊞ 33.8K · ◷

jamesadevine and others added 12 commits June 15, 2026 23:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(compile): IR-validated step authoring & runtime self-optimization#1061

feat(compile): IR-validated step authoring & runtime self-optimization#1061
jamesadevine wants to merge 13 commits into
mainfrom
jamesadevine/self-optimization-and-step-validation

jamesadevine commented Jun 16, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jamesadevine commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What shipped (13 commits)

Configuration

Security model

Testing

Uh oh!

github-actions Bot commented Jun 16, 2026

🔍 Rust PR Review

Findings

🐛 Bugs / Logic Issues

🔒 Security Concerns

⚠️ Suggestions

✅ What Looks Good

Uh oh!

github-actions Bot commented Jun 16, 2026

🔍 Rust PR Review

Findings

🔒 Security Concerns

🐛 Bugs / Logic Issues

⚠️ Suggestions

✅ What Looks Good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jamesadevine commented Jun 16, 2026 •

edited

Loading