Switch pod-scheduler configs from JSON to YAML; add JSON Schema by LoopedBard3 · Pull Request #3 · LoopedBard3/Benchmarks

LoopedBard3 · 2026-05-07T20:16:12Z

Why

Follow-up to aspnet#2167. The pod-scheduler configs introduced there are JSON, which makes them awkward to read and edit — there are no comments, lots of quote/comma noise, and type: 3 is a magic integer you have to grep models.py to understand. This is also rough for LLM-driven edits, which tend to revert to literal JSON and don't have anywhere to anchor docs.

This PR switches the configs to YAML, ships a JSON Schema, and makes scenario.type a self-documenting string.

What

Format

Move build/benchmarks_ci{,_azure,_cobalt}_pods.json → .yml. Each opens with:
```
# yaml-language-server: $schema=../scripts/pod-scheduler/pod-config.schema.json
```
so VS Code / Cursor / the Red Hat YAML extension provide autocomplete, hover docs, and inline validation while editing.
Inline comments now document things that used to live only in PR review threads — e.g., that gold-lin and gold-win share gold-db, or that the -28 cobalt-hosted pods reuse the same physical machines as their non-28 siblings.

Self-documenting `type`

type: 1|2|3 → type: single|dual|triple (case-insensitive). Integer form is still accepted so any local .json copies keep loading; bools (the YAML yes/no trap) and unknown strings raise ConfigError.
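
A minimal sketch of the parsing rule described above, for readers who want to see the bool check in context. The function name, the error messages, and the choice to return the canonical string are assumptions for illustration; the real `_parse_scenario_type` may be shaped differently.

```python
class ConfigError(ValueError):
    """Stand-in for the loader's ConfigError (hypothetical definition)."""


# Canonical string aliases and their legacy integer forms, per the PR text.
_INT_BY_NAME = {"single": 1, "dual": 2, "triple": 3}
_NAME_BY_INT = {number: name for name, number in _INT_BY_NAME.items()}


def parse_scenario_type(value):
    """Normalize a scenario type to 'single', 'dual', or 'triple'."""
    # bool is a subclass of int in Python, so it must be rejected first;
    # otherwise True would be accepted as the legacy integer 1.
    if isinstance(value, bool):
        raise ConfigError(f"scenario.type must not be a bool: {value!r}")
    if isinstance(value, int):
        if value in _NAME_BY_INT:
            return _NAME_BY_INT[value]
        raise ConfigError(f"unknown scenario.type integer: {value!r}")
    if isinstance(value, str):
        name = value.lower()  # string aliases are case-insensitive
        if name in _INT_BY_NAME:
            return name
        raise ConfigError(f"unknown scenario.type string: {value!r}")
    raise ConfigError(f"unsupported scenario.type value: {value!r}")
```

With this shape, `parse_scenario_type("Dual")` and `parse_scenario_type(2)` both yield `"dual"`, while `True`, `"quad"`, and `0` all raise `ConfigError`.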

Schema

New scripts/pod-scheduler/pod-config.schema.json (Draft 2020-12). Schema descriptions double as grounding for LLM agents — they should hallucinate fewer keys and wrong type values.
Schema and configs validated locally with the jsonschema package.

Loader

config_loader.py dispatches on file extension: .yml/.yaml via PyYAML, .json via stdlib (back-compat), unknown extensions → ConfigError.
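
The dispatch can be sketched roughly as follows. This is not the code from config_loader.py: the function name `load_config`, the lowercase suffix comparison, and the lazy PyYAML import are assumptions made to keep the example self-contained.

```python
import json
from pathlib import Path


class ConfigError(ValueError):
    """Stand-in for the loader's ConfigError (hypothetical definition)."""


def load_config(path):
    """Load a pod config, choosing the parser from the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in (".yml", ".yaml"):
        # PyYAML is imported lazily so the JSON path needs no extra package.
        import yaml

        with path.open(encoding="utf-8") as handle:
            return yaml.safe_load(handle)
    if suffix == ".json":
        # Back-compat path: the standard-library parser is sufficient.
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    # Fail loudly rather than guessing a format for unknown extensions.
    raise ConfigError(f"unsupported config extension {suffix!r}: {path}")
```

Raising on an unknown extension, instead of falling back to one parser, keeps a mistyped filename from being read in the wrong format.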

Generated pipelines

benchmarks-ci-01.yml, -02.yml, -azure.yml, -cobalt.yml regenerated. Only the embedded regen-command header line changed (.json → .yml); the snapshot test confirms scheduling output is byte-identical.

Tests

49 unit tests pass. 5 new cases:
- YAML happy path (default fixture writer is now YAML, so every existing test exercises the YAML path).
- All three string aliases for type with case variations.
- Integer back-compat for type.
- Bool and unknown-string type rejection.
- .json back-compat happy path.
- Unknown-extension rejection.

Docs

Repo README.md, build/README.md, and scripts/pod-scheduler/README.md updated with YAML examples and a section on the schema directive.

Verification

cd scripts/pod-scheduler
python -m unittest discover tests
# Ran 49 tests in 0.344s — OK

The snapshot test enforces that regenerating from the new YAML configs produces byte-identical pipeline YAML, so this is a config/format refactor with no behavioral change.

Notes for reviewers

- I kept the _format_source_path repo-relative URL behavior; it now expects .yml paths in the test, which matches what main.py is invoked with from CI.
- _parse_scenario_type explicitly rejects bool because yaml.safe_load happily turns yes/no/true/false into Python bools, which would otherwise quietly resolve to 1/0 via the integer alias. Better to fail loudly.
- Schema asserts additionalProperties: false on pods/scenarios so a misspelled key (e.g. mahcines: instead of machines:) is caught at edit-time by the YAML LSP.
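
To illustrate what additionalProperties: false buys, here is a small standard-library stand-in for the check a schema validator performs. It is not the jsonschema package or the YAML LSP, and the schema fragment is hypothetical: only the `machines` key comes from the text above, and `name` is assumed for the example.

```python
# Hypothetical fragment in the spirit of pod-config.schema.json.
POD_SCHEMA_FRAGMENT = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},      # assumed key, for illustration
        "machines": {"type": "array"},   # key named in the PR text
    },
    "additionalProperties": False,
}


def unexpected_keys(instance, schema):
    """Return the keys a validator would reject as additional properties."""
    # JSON Schema allows extra keys unless additionalProperties is false.
    if schema.get("additionalProperties", True) is not False:
        return []
    allowed = set(schema.get("properties", {}))
    return sorted(key for key in instance if key not in allowed)


# A pod entry with the misspelling from the example above.
typo_pod = {"name": "gold-lin", "mahcines": []}
print(unexpected_keys(typo_pod, POD_SCHEMA_FRAGMENT))
```

Without additionalProperties: false the misspelled key would validate cleanly and the pod would simply end up with no machines.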

The cobalt cloud machines were moved to a new Azure region. Updated all VNet IP addresses in build/azure.profile.yml:
- cobalt-server-lin: 10.2.2.15 -> 10.0.4.17
- cobalt-client-lin: 10.2.2.13 -> 10.0.4.18
- cobalt-db-lin: 10.2.2.14 -> 10.0.4.19
- cobalt-server-lin-azure3: 10.2.2.16 -> 10.0.4.20

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Merge cobalt cloud machines into benchmarks_ci_azure.json with machine_group support
- Regenerate benchmarks-ci-azure.yml using crank-scheduler (14 groups, handles machine conflicts)
- Update benchmarks.template.liquid header with scheduler instructions
- Remove separate eastus2 pipeline files (benchmarks-ci-azure-eastus2.yml, benchmarks.matrix.azure.eastus2.yml, benchmarks_ci_azure_eastus2.json)
- Remove benchmarks.matrix.azure.yml (replaced by JSON + scheduler approach)
- Remove cobaltcloud service bus queue (cobalt jobs now use azure/azurearm64 queues)
- Remove EAST US 2 MACHINES header from azure.profile.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…pnet#2167)

* Add pod-based crank scheduler prototype

Simplified alternative to PR aspnet#2106's full crank-scheduler. Uses a pod model where machines are fixed groups (SUT + load + DB) instead of individual machines with capability scoring and preferred partners.

Key simplifications:
- Pods define fixed machine groupings (no role priority/scoring)
- Shared machines between pods handled via collision detection
- Same greedy longest-job-first bin-packing algorithm
- Same Liquid template YAML generation
- ~570 lines vs ~2000 lines in the full scheduler

Includes:
- scripts/pod-scheduler/ (5 Python files + README)
- build/benchmarks_ci_pods.json (pod-based config for CI benchmarks)

* Add azure, azure-eastus2, and cobalt pod configs

Pod-based configurations for all three additional CI environments:
- benchmarks_ci_azure_pods.json: 6 pods, 14 runs (matches main)
- benchmarks_ci_azure_eastus2_pods.json: 2 pods, 12 runs (matches main)
- benchmarks_ci_cobalt_pods.json: 4 pods, 44 runs (matches main)

Notable pod patterns:
- Azure IDNA pods cross-use each other as load machines
- Cobalt hosted has 28-core variant pods sharing physical machines with full-core pods (handled by collision detection)
- Azure eastus2 pods share load/db, serialized automatically

Also fixes unicode bar chars for Windows compatibility.

* Update azure pod config: merge eastus2, keep IDNA on linux loads

Reflects main branch changes from PR aspnet#2166:
- Merged cobalt-cloud-lin pods (eastus2) into azure config
- Removed separate benchmarks_ci_azure_eastus2_pods.json
- Kept IDNA pod load profiles on linux machines (load jobs require linux), reverting the main branch profile change
- Added cobalt-cloud-lin-azl3-dual pod for type-2 scenarios (uses cobalt-cloud-lin-db as load instead of client)
- Total runs: 26 (matches main azure pipeline)

* Regenerate pipeline YAMLs from pod-scheduler configs

Generated via:
python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build
python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_azure_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-azure
python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_cobalt_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-cobalt

* Cap timeoutInMinutes at 240 (max 2x old 120 default)

Formula is now max(120, min(240, 2 * estimated_runtime)). This prevents scenarios with long runtimes (e.g. Proxies at 150min) from setting unreasonably high timeouts compared to previous values.

Resulting timeouts: 120 (default), 140 (Grpc), 180 (PGO/Containers), 240 (Proxies)

* Address review feedback

- Fix 4 incorrect template filenames in benchmarks_ci_pods.json: crossgen-scenarios -> crossgen2-scenarios, custom-proxies-scenarios -> proxies-custom-scenarios, single-file-scenarios -> singlefile-scenarios, websockets-scenarios -> websocket-scenarios
- Fix machine utilization calculation bug (was inflating totals for machines not in current stage)
- Remove unused imports (sys, Any, Dict, json, Pod)
- Remove dead render_with_liquid function and --template CLI arg
- Add guard against empty queues (ZeroDivisionError)
- Update README and docstrings to reflect removed template arg

Code:
- Validate cron schedules at load time and raise on unsupported hour fields instead of silently no-op'ing the offset for split YAMLs
- Add optional 'timeout' override per scenario; fall back to the runtime-derived formula when absent
- Move pipeline plumbing (pool, service-bus connection/namespace) into JSON metadata.pipeline with the previous hardcoded values as defaults
- Strict validation of duplicate pods, duplicate scenario.pods entries, empty queues; default scheduler to fail-fast on unknown/invalid pod references with a --lenient opt-out
- Stricter job-id sanitization (handles '.', '/', parens, leading digits, unicode) and explicit duplicate detection in generated YAML
- Replace id(stage) bookkeeping in split_schedule with explicit indices; add stable name tie-breaker to create_schedule for deterministic output
- Use Run.job_name in the generator instead of duplicating the regex
- Drop stale '--template' arg from generated YAML headers and README

Tests:
- 41 unit + snapshot tests covering models, config loader, scheduler, generator, and YAML parity with the committed *_pods.json configs

Cleanup:
- Revert benchmarks.template.liquid and benchmarks_ci_azure.json to main; the deleted crank-scheduler does not consume them
- Regenerate all four pipeline YAMLs against the new generator

* Remove unused benchmarks.template.liquid

The Liquid template was only consumed by the deleted crank-scheduler. The pod-scheduler renders pipeline YAML directly via Python, and grep confirms no other script, pipeline, or build step reads this file.

* Remove orphaned benchmarks.yml and benchmarks.matrix.0[12].yml

These were artifacts of the old hand-driven matrix.yml -> json -> Liquid template -> benchmarks.yml workflow. Their only inbound references were stale documentation comments cross-pointing between each other; nothing in the repo (no script, no pipeline) consumed them.

* Document pod-scheduler flow across READMEs and YAML headers

- Generated YAML headers now embed the exact regen command (with the source config and base name) and a pointer to scripts/pod-scheduler/README.md, so each file documents how to reproduce itself
- New build/README.md maps each *_pods.json config to the YAML it produces, lists the hand-maintained scenario templates, and explains the typical edit/regenerate workflow
- Top-level README.md gains a 'Continuous benchmarking pipelines' section linking to the pod-scheduler and build/ docs
- pod-scheduler README's Quick Start now uses repo-root-relative commands and points at the snapshot tests for verification
- Tests cover the new _format_source_path helper and the snapshot test passes the source config so headers stay verified

* Remove orphaned crank-scheduler JSON configs

benchmarks_ci.json, benchmarks_ci_azure.json, and benchmarks_ci_cobalt.json used the old 'machines + capabilities' format consumed by the deleted crank-scheduler. Their replacements (benchmarks_ci_pods.json, benchmarks_ci_azure_pods.json, benchmarks_ci_cobalt_pods.json) drive the pod-scheduler. grep finds zero inbound references for any of the three across scripts, pipelines, docs, and tests.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Parker Bibus <parker.bibus@microsoft.com>
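
As a worked example of the timeout cap in the commit message above, the formula max(120, min(240, 2 * estimated_runtime)) can be written as a one-line helper. The function name is hypothetical, and apart from the 150-minute Proxies figure the runtimes below are inputs chosen to reproduce the listed timeouts, not values taken from the configs.

```python
def timeout_in_minutes(estimated_runtime):
    """Double the estimated runtime, clamped to the range [120, 240]."""
    return max(120, min(240, 2 * estimated_runtime))


# Proxies at 150 min would ask for 300 but is capped at 240.
print(timeout_in_minutes(150))  # 240
# A 70-minute estimate doubles to 140, inside the allowed range.
print(timeout_in_minutes(70))   # 140
# A 90-minute estimate doubles to 180.
print(timeout_in_minutes(90))   # 180
# Short scenarios never drop below the old 120-minute default.
print(timeout_in_minutes(30))   # 120
```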

…#2168)

After the move to the VMR (dotnet/dotnet), the .NET runtime and ASP.NET runtime are built from dotnet/dotnet, and the commit hashes Crank captures are dotnet/dotnet commits. The Crank agent still hard-codes the old component-repo URLs (dotnet/aspnetcore, dotnet/runtime) for the synthetic Microsoft.AspNetCore.App and Microsoft.NETCore.App framework dependencies, so the rendered compare URLs in regression issues 404 (e.g. issue dotnet/aspnetcore#66568).

Rewrite the URL at template-render time inside the Changes block of all five issue-body templates (rps, published-size, start-time in regressions.config.yml; download-size, first-ui in regressions.blazor.config.yml). The rewrite triggers when the dependency name is one of the two synthetic framework names, or when the stored RepositoryUrl is one of the legacy component-repo URLs. Other dependencies (e.g. application-owned assemblies like Antiforgery.dll, or already-correct dotnet/dotnet entries from post-VMR assembly metadata) are left untouched.

Verified by parsing both YAML configs through Fluid and rendering each template against synthetic regressions covering all five diff cases: 0 leftover dotnet/aspnetcore or dotnet/runtime URLs, preserved aspnet/Benchmarks URL on application assemblies, framework deps correctly attributed to dotnet/dotnet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
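
The trigger condition in the commit message above can be sketched in Python for clarity. The actual change lives in the Liquid issue-body templates rendered by Fluid, so this is only an illustration of the rule; the function name and the full https://github.com/... URL forms are assumptions.

```python
VMR_URL = "https://github.com/dotnet/dotnet"

# The two synthetic framework dependency names named in the commit message.
FRAMEWORK_NAMES = {"Microsoft.AspNetCore.App", "Microsoft.NETCore.App"}

# Legacy component repositories; the full URL form is assumed here.
LEGACY_URLS = {
    "https://github.com/dotnet/aspnetcore",
    "https://github.com/dotnet/runtime",
}


def effective_repository_url(name, repository_url):
    """Return the repository URL to use when building a compare link."""
    normalized = repository_url.rstrip("/")
    # Rewrite when either the name or the stored URL marks a framework dep.
    if name in FRAMEWORK_NAMES or normalized in LEGACY_URLS:
        return VMR_URL
    # Application-owned assemblies and already-correct entries pass through.
    return repository_url
```

Checking both the name and the stored URL covers dependencies that carry a legacy URL under a different assembly name, while leaving entries such as aspnet/Benchmarks untouched.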

The three pod configs in build/ were JSON, which made them hard to read and didn't allow inline comments. This converts them to YAML, ships a JSON Schema describing the shape, and changes scenario.type from a magic 1/2/3 integer to a self-documenting single/dual/triple string.

* build/benchmarks_ci_pods.yml, benchmarks_ci_azure_pods.yml, and benchmarks_ci_cobalt_pods.yml replace their .json predecessors. Each opens with a # yaml-language-server: $schema=... directive so VS Code / Cursor / the YAML LSP provide autocomplete, hover docs, and inline validation while editing.
* scripts/pod-scheduler/pod-config.schema.json (new) is the schema. The schema descriptions also serve as reliable grounding for LLM-driven edits.
* config_loader.py now dispatches on file extension and accepts both YAML and JSON. scenario.type accepts single|dual|triple (case-insensitive) plus the legacy integer 1|2|3 for back-compat; bools (which YAML parsers happily produce from yes/no) are rejected explicitly. Unknown extensions and unknown type strings raise ConfigError so typos can't silently drop scenarios.
* Generated benchmarks-ci-*.yml files only change the embedded regen command in the file header (.json -> .yml). Schedule data is byte-identical, verified by the existing snapshot test.
* Tests: 5 new cases covering YAML loading, type-string aliases, type back-compat, bool/invalid rejection, and unknown-extension rejection. Default fixture writer now produces YAML so the YAML path is exercised on every test.
* Docs: README.md, build/README.md, and scripts/pod-scheduler/README.md are updated with YAML examples and the schema-directive convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

LoopedBard3 · 2026-05-07T20:17:17Z

Re-opening against aspnet/Benchmarks (correct upstream target).

LoopedBard3 and others added 5 commits April 21, 2026 09:38

LoopedBard3 closed this May 7, 2026

LoopedBard3 wants to merge 5 commits into main from loopedbard3/pod-config-format-options
