Switch pod-scheduler configs from JSON to YAML; add JSON Schema #3
Closed
LoopedBard3 wants to merge 5 commits into
The cobalt cloud machines were moved to a new Azure region. Updated all VNet IP addresses in build/azure.profile.yml:
- cobalt-server-lin: 10.2.2.15 -> 10.0.4.17
- cobalt-client-lin: 10.2.2.13 -> 10.0.4.18
- cobalt-db-lin: 10.2.2.14 -> 10.0.4.19
- cobalt-server-lin-azure3: 10.2.2.16 -> 10.0.4.20
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Merge cobalt cloud machines into benchmarks_ci_azure.json with machine_group support
- Regenerate benchmarks-ci-azure.yml using crank-scheduler (14 groups, handles machine conflicts)
- Update benchmarks.template.liquid header with scheduler instructions
- Remove separate eastus2 pipeline files (benchmarks-ci-azure-eastus2.yml, benchmarks.matrix.azure.eastus2.yml, benchmarks_ci_azure_eastus2.json)
- Remove benchmarks.matrix.azure.yml (replaced by JSON + scheduler approach)
- Remove cobaltcloud service bus queue (cobalt jobs now use azure/azurearm64 queues)
- Remove EAST US 2 MACHINES header from azure.profile.yml
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pnet#2167)
* Add pod-based crank scheduler prototype
Simplified alternative to PR aspnet#2106's full crank-scheduler. Uses a pod model where machines are fixed groups (SUT + load + DB) instead of individual machines with capability scoring and preferred partners.
Key simplifications:
- Pods define fixed machine groupings (no role priority/scoring)
- Shared machines between pods handled via collision detection
- Same greedy longest-job-first bin-packing algorithm
- Same Liquid template YAML generation
- ~570 lines vs ~2000 lines in the full scheduler
Includes:
- scripts/pod-scheduler/ (5 Python files + README)
- build/benchmarks_ci_pods.json (pod-based config for CI benchmarks)
* Add azure, azure-eastus2, and cobalt pod configs
Pod-based configurations for all three additional CI environments:
- benchmarks_ci_azure_pods.json: 6 pods, 14 runs (matches main)
- benchmarks_ci_azure_eastus2_pods.json: 2 pods, 12 runs (matches main)
- benchmarks_ci_cobalt_pods.json: 4 pods, 44 runs (matches main)
Notable pod patterns:
- Azure IDNA pods cross-use each other as load machines
- Cobalt hosted has 28-core variant pods sharing physical machines with full-core pods (handled by collision detection)
- Azure eastus2 pods share load/db, serialized automatically
Also fixes Unicode bar characters for Windows compatibility.
* Update azure pod config: merge eastus2, keep IDNA on linux loads
Reflects main branch changes from PR aspnet#2166:
- Merged cobalt-cloud-lin pods (eastus2) into azure config
- Removed separate benchmarks_ci_azure_eastus2_pods.json
- Kept IDNA pod load profiles on linux machines (load jobs require linux), reverting the main branch profile change
- Added cobalt-cloud-lin-azl3-dual pod for type-2 scenarios (uses cobalt-cloud-lin-db as load instead of client)
- Total runs: 26 (matches main azure pipeline)
* Regenerate pipeline YAMLs from pod-scheduler configs
Generated via:
    python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build
    python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_azure_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-azure
    python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_cobalt_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-cobalt
* Cap timeoutInMinutes at 240 (max 2x old 120 default)
Formula is now max(120, min(240, 2 * estimated_runtime)). This prevents scenarios with long runtimes (e.g. Proxies at 150 min) from setting unreasonably high timeouts compared to previous values.
Resulting timeouts: 120 (default), 140 (Grpc), 180 (PGO/Containers), 240 (Proxies)
* Address review feedback
- Fix 4 incorrect template filenames in benchmarks_ci_pods.json: crossgen-scenarios -> crossgen2-scenarios, custom-proxies-scenarios -> proxies-custom-scenarios, single-file-scenarios -> singlefile-scenarios, websockets-scenarios -> websocket-scenarios
- Fix machine utilization calculation bug (was inflating totals for machines not in current stage)
- Remove unused imports (sys, Any, Dict, json, Pod)
- Remove dead render_with_liquid function and --template CLI arg
- Add guard against empty queues (ZeroDivisionError)
- Update README and docstrings to reflect removed template arg
Code:
- Validate cron schedules at load time and raise on unsupported hour fields instead of silently no-op'ing the offset for split YAMLs
- Add optional 'timeout' override per scenario; fall back to the runtime-derived formula when absent
- Move pipeline plumbing (pool, service-bus connection/namespace) into JSON metadata.pipeline with the previous hardcoded values as defaults
- Strict validation of duplicate pods, duplicate scenario.pods entries, and empty queues; default the scheduler to fail fast on unknown/invalid pod references, with a --lenient opt-out
- Stricter job-id sanitization (handles '.', '/', parens, leading digits, unicode) and explicit duplicate detection in generated YAML
- Replace id(stage) bookkeeping in split_schedule with explicit indices; add stable name tie-breaker to create_schedule for deterministic output
- Use Run.job_name in the generator instead of duplicating the regex
- Drop stale '--template' arg from generated YAML headers and README
Tests:
- 41 unit + snapshot tests covering models, config loader, scheduler, generator, and YAML parity with the committed *_pods.json configs
Cleanup:
- Revert benchmarks.template.liquid and benchmarks_ci_azure.json to main; the deleted crank-scheduler does not consume them
- Regenerate all four pipeline YAMLs against the new generator
* Remove unused benchmarks.template.liquid
The Liquid template was only consumed by the deleted crank-scheduler. The pod-scheduler renders pipeline YAML directly via Python, and grep confirms no other script, pipeline, or build step reads this file.
* Remove orphaned benchmarks.yml and benchmarks.matrix.0[12].yml
These were artifacts of the old hand-driven matrix.yml -> json -> Liquid template -> benchmarks.yml workflow. Their only inbound references were stale documentation comments pointing at each other; nothing in the repo (no script, no pipeline) consumed them.
* Document pod-scheduler flow across READMEs and YAML headers
- Generated YAML headers now embed the exact regen command (with the source config and base name) and a pointer to scripts/pod-scheduler/README.md, so each file documents how to reproduce itself
- New build/README.md maps each *_pods.json config to the YAML it produces, lists the hand-maintained scenario templates, and explains the typical edit/regenerate workflow
- Top-level README.md gains a 'Continuous benchmarking pipelines' section linking to the pod-scheduler and build/ docs
- pod-scheduler README's Quick Start now uses repo-root-relative commands and points at the snapshot tests for verification
- Tests cover the new _format_source_path helper, and the snapshot test passes the source config so headers stay verified
* Remove orphaned crank-scheduler JSON configs
benchmarks_ci.json, benchmarks_ci_azure.json, and benchmarks_ci_cobalt.json used the old 'machines + capabilities' format consumed by the deleted crank-scheduler. Their replacements (benchmarks_ci_pods.json, benchmarks_ci_azure_pods.json, benchmarks_ci_cobalt_pods.json) drive the pod-scheduler. grep finds zero inbound references for any of the three across scripts, pipelines, docs, and tests.
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Parker Bibus <parker.bibus@microsoft.com>
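Two pieces of the scheduler described in this commit are simple enough to sketch: the greedy longest-job-first packing (the classic longest-processing-time heuristic) with a stable name tie-breaker, and the capped timeout formula. This is a minimal illustration under stated assumptions, not the pod-scheduler's code: `Run`, `Queue`, `greedy_lpt` and `timeout_minutes` are hypothetical names, runtimes are assumed to be in minutes, queues are treated as interchangeable (pod collision detection for shared machines is not modelled), and the per-scenario 'timeout' override is assumed to replace the formula outright rather than being clamped.

```python
from dataclasses import dataclass, field

DEFAULT_TIMEOUT = 120  # old hardcoded timeoutInMinutes
MAX_TIMEOUT = 240      # cap: at most 2x the old default


@dataclass
class Run:
    """Hypothetical stand-in for one benchmark run to schedule."""
    name: str
    minutes: int  # estimated runtime in minutes


@dataclass
class Queue:
    name: str
    runs: list = field(default_factory=list)
    load: int = 0  # estimated minutes assigned so far


def greedy_lpt(runs, queue_names):
    """Assign runs longest-first, each to the currently least-loaded queue."""
    if not queue_names:
        raise ValueError("at least one queue is required")
    queues = [Queue(name) for name in queue_names]
    # Longest job first; the run name breaks ties so output is deterministic.
    for run in sorted(runs, key=lambda r: (-r.minutes, r.name)):
        # min() returns the first minimal element, so load ties go to the lowest index.
        target = min(queues, key=lambda q: q.load)
        target.runs.append(run.name)
        target.load += run.minutes
    return queues


def timeout_minutes(estimated_minutes, override=None):
    """max(120, min(240, 2 * estimate)) unless an explicit override is given."""
    if override is not None:
        return override
    return max(DEFAULT_TIMEOUT, min(MAX_TIMEOUT, 2 * estimated_minutes))
```

With runs of 60, 30, 30 and 20 minutes on two queues, the 60-minute run and then the 20-minute run land on the first queue and both 30-minute runs on the second. On the timeout side, a 150-minute estimate (Proxies) hits the 240 cap, while anything at or below 60 minutes stays at the 120 default.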
…#2168)
After the move to the VMR (dotnet/dotnet), the .NET runtime and ASP.NET runtime are built from dotnet/dotnet, and the commit hashes Crank captures are dotnet/dotnet commits. The Crank agent still hard-codes the old component-repo URLs (dotnet/aspnetcore, dotnet/runtime) for the synthetic Microsoft.AspNetCore.App and Microsoft.NETCore.App framework dependencies, so the rendered compare URLs in regression issues 404 (e.g. issue dotnet/aspnetcore#66568).
Rewrite the URL at template-render time inside the Changes block of all five issue-body templates (rps, published-size, start-time in regressions.config.yml; download-size, first-ui in regressions.blazor.config.yml). The rewrite triggers when the dependency name is one of the two synthetic framework names, or when the stored RepositoryUrl is one of the legacy component-repo URLs. Other dependencies (e.g. application-owned assemblies like Antiforgery.dll, or already-correct dotnet/dotnet entries from post-VMR assembly metadata) are left untouched.
Verified by parsing both YAML configs through Fluid and rendering each template against synthetic regressions covering all five diff cases: 0 leftover dotnet/aspnetcore or dotnet/runtime URLs, preserved aspnet/Benchmarks URL on application assemblies, framework deps correctly attributed to dotnet/dotnet.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
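The rewrite condition in this commit lives in the Liquid issue templates; the same decision logic is sketched below in Python purely for illustration. The function names, the exact URL spellings being compared (including the trailing-slash normalization), and the `/compare/base...head` link shape are assumptions; the last is GitHub's standard compare-view URL form, not something confirmed from the templates.

```python
SYNTHETIC_FRAMEWORKS = {"Microsoft.AspNetCore.App", "Microsoft.NETCore.App"}
LEGACY_REPOS = {
    "https://github.com/dotnet/aspnetcore",
    "https://github.com/dotnet/runtime",
}
VMR_REPO = "https://github.com/dotnet/dotnet"


def effective_repo_url(name, repository_url):
    """Return the repo a dependency's commits actually belong to (hypothetical helper)."""
    normalized = (repository_url or "").rstrip("/")
    # Rewrite if the dependency is a synthetic framework OR still carries a legacy URL.
    if name in SYNTHETIC_FRAMEWORKS or normalized in LEGACY_REPOS:
        return VMR_REPO
    # Everything else (application assemblies, correct dotnet/dotnet entries) is untouched.
    return repository_url


def compare_url(name, repository_url, base_commit, head_commit):
    """Build a GitHub compare link between two captured commits."""
    repo = effective_repo_url(name, repository_url).rstrip("/")
    return f"{repo}/compare/{base_commit}...{head_commit}"
```

Keying on either the name or the stored URL means framework entries are fixed even when agents record an empty or stale RepositoryUrl, while application-owned assemblies keep their original links.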
The three pod configs in build/ were JSON, which made them hard to read and didn't allow inline comments. This converts them to YAML, ships a JSON Schema describing the shape, and changes scenario.type from a magic 1/2/3 integer to a self-documenting single/dual/triple string.
* build/benchmarks_ci_pods.yml, benchmarks_ci_azure_pods.yml, and benchmarks_ci_cobalt_pods.yml replace their .json predecessors. Each opens with a '# yaml-language-server: $schema=...' directive so VS Code / Cursor / the YAML LSP provide autocomplete, hover docs, and inline validation while editing.
* scripts/pod-scheduler/pod-config.schema.json (new) is the schema. The schema descriptions also serve as grounding for LLM-driven edits.
* config_loader.py now dispatches on file extension and accepts both YAML and JSON. scenario.type accepts single|dual|triple (case-insensitive) plus the legacy integer 1|2|3 for back-compat; bools (which YAML parsers happily produce from yes/no) are rejected explicitly. Unknown extensions and unknown type strings raise ConfigError so typos can't silently drop scenarios.
* Generated benchmarks-ci-*.yml files only change the embedded regen command in the file header (.json -> .yml). Schedule data is byte-identical, verified by the existing snapshot test.
* Tests: 5 new cases covering YAML loading, type-string aliases, type back-compat, bool/invalid rejection, and unknown-extension rejection. The default fixture writer now produces YAML, so the YAML path is exercised on every test.
* Docs: README.md, build/README.md, and scripts/pod-scheduler/README.md are updated with YAML examples and the schema-directive convention.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
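The extension dispatch described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `config_loader.py`: the function name `load_raw_config` and the top-level-mapping check are assumptions, and PyYAML's `yaml.safe_load` is imported lazily so JSON-only callers don't need the package installed.

```python
import json
from pathlib import Path


class ConfigError(Exception):
    """Raised for unreadable or unsupported pod configs."""


def load_raw_config(path):
    """Parse a pod config into a dict, choosing the parser by file extension."""
    p = Path(path)
    ext = p.suffix
    # Reject unknown extensions before touching the file so typos fail fast.
    if ext not in (".yml", ".yaml", ".json"):
        raise ConfigError(f"unsupported config extension {ext!r}: {p}")
    text = p.read_text(encoding="utf-8")
    if ext == ".json":
        data = json.loads(text)  # legacy .json configs keep loading
    else:
        import yaml  # PyYAML; imported lazily so JSON-only use doesn't need it
        data = yaml.safe_load(text)
    if not isinstance(data, dict):
        raise ConfigError(f"{p}: top-level config must be a mapping")
    return data
```

Raising on an unknown suffix, rather than guessing a parser, matches the commit's goal that a typo can never silently drop scenarios.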
Owner
Author
Re-opening against aspnet/Benchmarks (the correct upstream target).
Why
Follow-up to aspnet#2167. The pod-scheduler configs introduced there are JSON, which makes them awkward to read and edit: there are no comments, there is a lot of quote/comma noise, and `type: 3` is a magic integer you have to grep `models.py` to understand. JSON is also rough for LLM-driven edits, which tend to revert to literal JSON and have nowhere to anchor documentation.
This PR switches the configs to YAML, ships a JSON Schema, and makes `scenario.type` a self-documenting string.
What
Format
`build/benchmarks_ci{,_azure,_cobalt}_pods.json` → `.yml`. Each opens with:
`# yaml-language-server: $schema=../scripts/pod-scheduler/pod-config.schema.json`
… `gold-lin` and `gold-win` share `gold-db`, or that the `-28` cobalt-hosted pods reuse the same physical machines as their non-28 siblings.
Self-documenting `type`
`type: 1|2|3` → `type: single|dual|triple` (case-insensitive). The integer form is still accepted so any local `.json` copies keep loading; bools (the YAML `yes`/`no` trap) and unknown strings raise `ConfigError`.
Schema
`scripts/pod-scheduler/pod-config.schema.json` (Draft 2020-12). Schema descriptions double as grounding for LLM agents, so they should hallucinate fewer keys and fewer wrong `type` values. … `jsonschema` package.
Loader
`config_loader.py` dispatches on file extension: `.yml`/`.yaml` via PyYAML, `.json` via the stdlib (back-compat), and unknown extensions → `ConfigError`.
Generated pipelines
`benchmarks-ci-01.yml`, `-02.yml`, `-azure.yml`, and `-cobalt.yml` regenerated. Only the embedded regen-command header line changed (`.json` → `.yml`); the snapshot test confirms the scheduling output is byte-identical.
Tests
- `type` with case variations.
- … `type`.
- … `type` rejection.
- `.json` back-compat happy path.
Docs
`README.md`, `build/README.md`, and `scripts/pod-scheduler/README.md` updated with YAML examples and a section on the schema directive.
Verification
The snapshot test enforces that regenerating from the new YAML configs produces byte-identical pipeline YAML, so this is a config/format refactor with no behavioral change.
Notes for reviewers
- … `_format_source_path` repo-relative URL behavior; it now expects `.yml` paths in the test, which matches what `main.py` is invoked with from CI.
- `_parse_scenario_type` explicitly rejects `bool` because `yaml.safe_load` happily turns `yes`/`no`/`true`/`false` into Python bools, which would otherwise quietly resolve to `1`/`0` via the integer alias. Better to fail loudly.
- … `additionalProperties: false` on pods/scenarios so a misspelled key (e.g. `mahcines:` instead of `machines:`) is caught at edit time by the YAML LSP.
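A sketch of the `_parse_scenario_type` behavior described in these notes, under the assumption that every accepted form is normalized to the legacy integer 1/2/3 internally (the real function's name, return type and messages may differ; `parse_scenario_type` here is hypothetical). The important detail is the order of checks: `bool` is a subclass of `int` in Python, so it has to be rejected before the integer branch.

```python
class ConfigError(Exception):
    """Raised when a pod config contains an invalid value."""


# Accepted string spellings mapped to the legacy integer form (assumed mapping).
_TYPE_NAMES = {"single": 1, "dual": 2, "triple": 3}


def parse_scenario_type(value):
    """Normalize scenario.type to 1, 2 or 3, or raise ConfigError."""
    # isinstance(True, int) is True, so bools must be caught first;
    # otherwise YAML's yes/no/true/false would slip through as 1/0.
    if isinstance(value, bool):
        raise ConfigError(f"scenario.type must be single|dual|triple, got boolean {value!r}")
    if isinstance(value, int):
        if value in (1, 2, 3):
            return value  # legacy integer form, kept for back-compat
        raise ConfigError(f"scenario.type integer must be 1, 2 or 3, got {value!r}")
    if isinstance(value, str):
        key = value.lower()  # case-insensitive: "Dual" and "DUAL" both work
        if key in _TYPE_NAMES:
            return _TYPE_NAMES[key]
    raise ConfigError(f"unknown scenario.type {value!r}")
```

For example, `parse_scenario_type("Dual")` and `parse_scenario_type(2)` both yield 2, while `True`, `0`, `4` and `"quad"` all raise `ConfigError`.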