Remove EOL'd citrine/perf relays + VMs from pod scheduler and profiles#2

Closed
LoopedBard3 wants to merge 5 commits into main from loopedbard3/update-eol-relays

@LoopedBard3
Owner

The following Service Bus relays (and their underlying VMs) were end-of-lifed and removed from the aspnetrelay connection list in Azure:

  • perflin (asp-perf-lin, pod intel-perflin)
  • perfwin (asp-perf-win)
  • citrinewin (asp-citrine-win, pod intel-win)
  • citrinelin (asp-citrine-lin, pod intel-lin)
  • citrineamd2 (asp-citrine-amd2, pod amd-lin2)

Changes

build/benchmarks_ci_pods.json — drop the four pods whose SUT is EOL'd (intel-lin, intel-win, intel-perflin, amd-lin2). Every scenario still has at least one valid pod (gold-lin / gold-win), so no scenarios are removed; their pods arrays are pruned accordingly.
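
The pruning described above can be sketched roughly as follows. This is an illustration only: the `pods` / `scenarios` layout and the `name` keys are assumptions about the config shape, not the actual schema of build/benchmarks_ci_pods.json, and `prune_eol_pods` is a hypothetical helper.

```python
import json

# Pod names taken from the PR description.
EOL_PODS = {"intel-lin", "intel-win", "intel-perflin", "amd-lin2"}


def prune_eol_pods(config, eol_pods):
    """Return a copy of config without eol_pods.

    Raises ValueError if a scenario would be left with no pod at all,
    which is the condition the PR checks before keeping every scenario.
    """
    pods = [p for p in config["pods"] if p["name"] not in eol_pods]
    scenarios = []
    for scenario in config["scenarios"]:
        remaining = [name for name in scenario["pods"] if name not in eol_pods]
        if not remaining:
            raise ValueError(f"scenario {scenario['name']!r} has no valid pod left")
        scenarios.append({**scenario, "pods": remaining})
    return {**config, "pods": pods, "scenarios": scenarios}


# Hypothetical miniature config (assumed layout, not the real file).
sample = {
    "pods": [{"name": "gold-lin"}, {"name": "gold-win"},
             {"name": "intel-lin"}, {"name": "amd-lin2"}],
    "scenarios": [
        {"name": "Json", "pods": ["gold-lin", "intel-lin", "amd-lin2"]},
        {"name": "Plaintext", "pods": ["gold-win", "intel-lin"]},
    ],
}
pruned = prune_eol_pods(sample, EOL_PODS)
print(json.dumps(pruned["scenarios"]))
```

Raising instead of silently dropping a scenario keeps the "no scenarios are removed" claim checkable.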

build/ci.profile.yml — remove EOL'd profile defs (intel-lin-*, intel-win-*, intel-perflin-app, amd-lin2-*).

build/benchmarks-ci-01.yml, build/benchmarks-ci-02.yml — regenerated via:

python scripts/pod-scheduler/main.py --config build/benchmarks_ci_pods.json --base-name benchmarks-ci --yaml-output build

scenarios/aspnet.profiles.yml, scenarios/aspnet.profiles.standard.yml

  • Remove profiles whose APP endpoint is EOL'd: aspnet-citrine-{lin,win,amd2}[-relay], aspnet-citrine-amd-relay, aspnet-perf-{lin,win}[-relay].

  • Repoint load/db endpoints in profiles whose SUT still works but whose load/db lived on citrineamd2/asp-citrine-amd2:

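    | Profile                                   | Was                          | Now                                |
    |-------------------------------------------|------------------------------|------------------------------------|
    | aspnet-citrine-arm-lin[-relay]            | load = citrineamd2           | citrineload                        |
    | aspnet-citrine-arm-win[-relay] (yml only) | db = citrineamd2             | citrinedb                          |
    | aspnet-siryn-arm-lin[-relay]              | load = citrineamd2           | citrineload                        |
    | aspnet-citrine-ampere (yml only)          | db & load = asp-citrine-amd2 | asp-citrine-db / asp-citrine-load  |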

scenarios/proxy.benchmarks.yml, scenarios/proxy.grpc.benchmarks.yml, src/Benchmarks/json.benchmarks.yml, src/BenchmarksApps/BuildPerformance/buildperformance.yml — remove inline profile defs whose APP endpoint is EOL'd (aspnet-citrine-{lin,win,amd}, aspnet-perf-{lin,win}), plus the misplaced aspnet-citrine-lin entry under scenarios: in json.benchmarks.yml.

scenarios/signalr.benchmarks.yml — example comment updated from --profile asp-perf-lin to --profile aspnet-gold-lin so docs reference a living machine.

Validation

  • All 11 touched JSON/YAML files parse cleanly.
  • python -m unittest discover in scripts/pod-scheduler/tests → 43/43 pass.
  • The regenerated benchmarks-ci-0{1,2}.yml contain only gold-lin / gold-win runs.
  • Net diff: +176 / −1797.

LoopedBard3 and others added 5 commits April 21, 2026 09:38
The cobalt cloud machines were moved to a new Azure region. Updated all
VNet IP addresses in build/azure.profile.yml:

- cobalt-server-lin: 10.2.2.15 -> 10.0.4.17
- cobalt-client-lin: 10.2.2.13 -> 10.0.4.18
- cobalt-db-lin: 10.2.2.14 -> 10.0.4.19
- cobalt-server-lin-azure3: 10.2.2.16 -> 10.0.4.20

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Merge cobalt cloud machines into benchmarks_ci_azure.json with machine_group support
- Regenerate benchmarks-ci-azure.yml using crank-scheduler (14 groups, handles machine conflicts)
- Update benchmarks.template.liquid header with scheduler instructions
- Remove separate eastus2 pipeline files (benchmarks-ci-azure-eastus2.yml, benchmarks.matrix.azure.eastus2.yml, benchmarks_ci_azure_eastus2.json)
- Remove benchmarks.matrix.azure.yml (replaced by JSON + scheduler approach)
- Remove cobaltcloud service bus queue (cobalt jobs now use azure/azurearm64 queues)
- Remove EAST US 2 MACHINES header from azure.profile.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pnet#2167)

* Add pod-based crank scheduler prototype

Simplified alternative to PR aspnet#2106's full crank-scheduler. Uses a pod
model where machines are fixed groups (SUT + load + DB) instead of
individual machines with capability scoring and preferred partners.

Key simplifications:
- Pods define fixed machine groupings (no role priority/scoring)
- Shared machines between pods handled via collision detection
- Same greedy longest-job-first bin-packing algorithm
- Same Liquid template YAML generation
- ~570 lines vs ~2000 lines in the full scheduler

Includes:
- scripts/pod-scheduler/ (5 Python files + README)
- build/benchmarks_ci_pods.json (pod-based config for CI benchmarks)
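
As a rough illustration of greedy longest-job-first placement with collision detection, here is a simplified sketch. It is not the pod-scheduler's actual code: the `Job` and `Stage` types, the one-job-per-pod-per-stage model, and all machine names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Job:
    name: str
    pod: str
    runtime: int  # estimated minutes


@dataclass
class Stage:
    jobs: list = field(default_factory=list)
    busy: set = field(default_factory=set)  # machines already claimed in this stage


def schedule(jobs, pod_machines):
    """Place the longest jobs first into the first stage with no machine collision."""
    stages = []
    # Name is a tie-breaker so the output is deterministic.
    for job in sorted(jobs, key=lambda j: (-j.runtime, j.name)):
        machines = pod_machines[job.pod]
        for stage in stages:
            if stage.busy.isdisjoint(machines):
                break
        else:
            # Every existing stage collides with this pod: open a new one.
            stage = Stage()
            stages.append(stage)
        stage.jobs.append(job)
        stage.busy |= machines
    return stages


# Hypothetical pods: gold-lin and gold-win share a load and a db machine.
pod_machines = {
    "gold-lin": {"gold-lin-app", "gold-load", "gold-db"},
    "gold-win": {"gold-win-app", "gold-load", "gold-db"},
    "arm-lin": {"arm-app", "arm-load"},
}
jobs = [
    Job("Plaintext", "gold-lin", 30),
    Job("Proxies", "gold-lin", 150),
    Job("Grpc", "gold-win", 70),
    Job("Json", "arm-lin", 30),
]
stages = schedule(jobs, pod_machines)
for index, stage in enumerate(stages):
    print(index, [job.name for job in stage.jobs])
```

In this toy input the two gold pods never share a stage because they collide on the shared load and db machines, while the independent arm pod runs alongside the longest job.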

* Add azure, azure-eastus2, and cobalt pod configs

Pod-based configurations for all three additional CI environments:
- benchmarks_ci_azure_pods.json: 6 pods, 14 runs (matches main)
- benchmarks_ci_azure_eastus2_pods.json: 2 pods, 12 runs (matches main)
- benchmarks_ci_cobalt_pods.json: 4 pods, 44 runs (matches main)

Notable pod patterns:
- Azure IDNA pods cross-use each other as load machines
- Cobalt hosted has 28-core variant pods sharing physical machines
  with full-core pods (handled by collision detection)
- Azure eastus2 pods share load/db, serialized automatically

Also fixes unicode bar chars for Windows compatibility.

* Update azure pod config: merge eastus2, keep IDNA on linux loads

Reflects main branch changes from PR aspnet#2166:
- Merged cobalt-cloud-lin pods (eastus2) into azure config
- Removed separate benchmarks_ci_azure_eastus2_pods.json
- Kept IDNA pod load profiles on linux machines (load jobs
  require linux), reverting the main branch profile change
- Added cobalt-cloud-lin-azl3-dual pod for type-2 scenarios
  (uses cobalt-cloud-lin-db as load instead of client)
- Total runs: 26 (matches main azure pipeline)

* Regenerate pipeline YAMLs from pod-scheduler configs

Generated via:
  python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build
  python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_azure_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-azure
  python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_cobalt_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-cobalt

* Cap timeoutInMinutes at 240 (max 2x old 120 default)

Formula is now max(120, min(240, 2 * estimated_runtime)).
This prevents scenarios with long runtimes (e.g. Proxies at 150min)
from setting unreasonably high timeouts compared to previous values.

Resulting timeouts: 120 (default), 140 (Grpc), 180 (PGO/Containers), 240 (Proxies)
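
A minimal sketch of the formula, for checking the numbers above. The function name and the `floor` / `cap` parameters are illustrative, not the scheduler's real API; the 30, 70 and 90 minute runtimes are assumptions back-computed from the listed timeouts, and only the 150 minute Proxies figure comes from this commit message.

```python
def timeout_minutes(estimated_runtime, floor=120, cap=240):
    """max(floor, min(cap, 2 * estimated_runtime)), all values in minutes."""
    return max(floor, min(cap, 2 * estimated_runtime))


# Runtimes other than 150 are assumed values chosen to reproduce the list above.
for label, runtime in [("default", 30), ("Grpc", 70), ("PGO/Containers", 90), ("Proxies", 150)]:
    print(label, timeout_minutes(runtime))
```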

* Address review feedback
- Fix 4 incorrect template filenames in benchmarks_ci_pods.json:
  crossgen-scenarios -> crossgen2-scenarios,
  custom-proxies-scenarios -> proxies-custom-scenarios,
  single-file-scenarios -> singlefile-scenarios,
  websockets-scenarios -> websocket-scenarios
- Fix machine utilization calculation bug (was inflating totals for
  machines not in current stage)
- Remove unused imports (sys, Any, Dict, json, Pod)
- Remove dead render_with_liquid function and --template CLI arg
- Add guard against empty queues (ZeroDivisionError)
- Update README and docstrings to reflect removed template arg

Code:
- Validate cron schedules at load time and raise on unsupported hour fields instead of silently no-op'ing the offset for split YAMLs
- Add optional 'timeout' override per scenario; fall back to the runtime-derived formula when absent
- Move pipeline plumbing (pool, service-bus connection/namespace) into JSON metadata.pipeline with the previous hardcoded values as defaults
- Strict validation of duplicate pods, duplicate scenario.pods entries, empty queues; default scheduler to fail-fast on unknown/invalid pod references with a --lenient opt-out
- Stricter job-id sanitization (handles '.', '/', parens, leading digits, unicode) and explicit duplicate detection in generated YAML
- Replace id(stage) bookkeeping in split_schedule with explicit indices; add stable name tie-breaker to create_schedule for deterministic output
- Use Run.job_name in the generator instead of duplicating the regex
- Drop stale '--template' arg from generated YAML headers and README
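
One way the sanitization and duplicate detection could look is sketched below. The exact rules the generator applies are not shown in this commit, so the character set, the `job_` prefix, and both function names are assumptions for illustration.

```python
import re
import unicodedata


def sanitize_job_id(raw):
    """Reduce raw to letters, digits and underscores, never starting with a digit."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters ('.', '/', parens, spaces) into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", ascii_text).strip("_")
    if not cleaned or cleaned[0].isdigit():
        cleaned = "job_" + cleaned  # assumed prefix, not taken from the source
    return cleaned


def unique_job_ids(names):
    """Sanitize every name and raise if two names map to the same id."""
    seen = {}
    for name in names:
        job_id = sanitize_job_id(name)
        if job_id in seen:
            raise ValueError(f"{name!r} and {seen[job_id]!r} both map to {job_id!r}")
        seen[job_id] = name
    return list(seen)


print(sanitize_job_id("2x Proxies (gRPC) v1.0/lin"))
print(unique_job_ids(["Json gold-lin", "Json gold-win"]))
```

Because sanitization is lossy (for example `a.b` and `a/b` collapse to the same id), the explicit duplicate check is what keeps generated YAML from containing two jobs with one name.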

Tests:
- 41 unit + snapshot tests covering models, config loader, scheduler, generator, and YAML parity with the committed *_pods.json configs

Cleanup:
- Revert benchmarks.template.liquid and benchmarks_ci_azure.json to main; the deleted crank-scheduler does not consume them
- Regenerate all four pipeline YAMLs against the new generator

* Remove unused benchmarks.template.liquid
The Liquid template was only consumed by the deleted crank-scheduler. The pod-scheduler renders pipeline YAML directly via Python, and grep confirms no other script, pipeline, or build step reads this file.

* Remove orphaned benchmarks.yml and benchmarks.matrix.0[12].yml
These were artifacts of the old hand-driven matrix.yml -> json -> Liquid template -> benchmarks.yml workflow. Their only inbound references were stale documentation comments cross-pointing between each other; nothing in the repo (no script, no pipeline) consumed them.

* Document pod-scheduler flow across READMEs and YAML headers
- Generated YAML headers now embed the exact regen command (with the source config and base name) and a pointer to scripts/pod-scheduler/README.md, so each file documents how to reproduce itself
- New build/README.md maps each *_pods.json config to the YAML it produces, lists the hand-maintained scenario templates, and explains the typical edit/regenerate workflow
- Top-level README.md gains a 'Continuous benchmarking pipelines' section linking to the pod-scheduler and build/ docs
- pod-scheduler README's Quick Start now uses repo-root-relative commands and points at the snapshot tests for verification
- Tests cover the new _format_source_path helper and the snapshot test passes the source config so headers stay verified

* Remove orphaned crank-scheduler JSON configs

benchmarks_ci.json, benchmarks_ci_azure.json, and benchmarks_ci_cobalt.json used the old 'machines + capabilities' format consumed by the deleted crank-scheduler. Their replacements (benchmarks_ci_pods.json, benchmarks_ci_azure_pods.json, benchmarks_ci_cobalt_pods.json) drive the pod-scheduler. grep finds zero inbound references for any of the three across scripts, pipelines, docs, and tests.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Parker Bibus <parker.bibus@microsoft.com>
…#2168)

After the move to the VMR (dotnet/dotnet), the .NET runtime and ASP.NET
runtime are built from dotnet/dotnet, and the commit hashes Crank captures
are dotnet/dotnet commits. The Crank agent still hard-codes the old
component-repo URLs (dotnet/aspnetcore, dotnet/runtime) for the synthetic
Microsoft.AspNetCore.App and Microsoft.NETCore.App framework dependencies,
so the rendered compare URLs in regression issues 404 (e.g. issue
dotnet/aspnetcore#66568).

Rewrite the URL at template-render time inside the Changes block of all
five issue-body templates (rps, published-size, start-time in
regressions.config.yml; download-size, first-ui in regressions.blazor.config.yml).
The rewrite triggers when the dependency name is one of the two synthetic
framework names, or when the stored RepositoryUrl is one of the legacy
component-repo URLs. Other dependencies (e.g. application-owned assemblies
like Antiforgery.dll, or already-correct dotnet/dotnet entries from
post-VMR assembly metadata) are left untouched.
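
The rewrite rule can be expressed compactly as below. The real change lives in the Liquid issue-body templates; this Python version only mirrors the stated condition, and the helper names, the URL normalization, and the exact stored URL forms are assumptions.

```python
VMR_URL = "https://github.com/dotnet/dotnet"
SYNTHETIC_FRAMEWORK_NAMES = {"Microsoft.AspNetCore.App", "Microsoft.NETCore.App"}
LEGACY_REPO_URLS = {
    "https://github.com/dotnet/aspnetcore",
    "https://github.com/dotnet/runtime",
}


def effective_repo_url(name, repository_url):
    """Point framework dependencies at the VMR; leave every other dependency alone."""
    normalized = repository_url.rstrip("/").lower()  # assumed normalization
    if name in SYNTHETIC_FRAMEWORK_NAMES or normalized in LEGACY_REPO_URLS:
        return VMR_URL
    return repository_url


def compare_url(name, repository_url, old_commit, new_commit):
    """Build a GitHub compare link between two commits."""
    base = effective_repo_url(name, repository_url).rstrip("/")
    return f"{base}/compare/{old_commit}...{new_commit}"


print(compare_url("Microsoft.NETCore.App", "https://github.com/dotnet/runtime", "abc123", "def456"))
print(compare_url("Antiforgery.dll", "https://github.com/aspnet/Benchmarks", "abc123", "def456"))
```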

Verified by parsing both YAML configs through Fluid and rendering each
template against synthetic regressions covering all five diff cases:
0 leftover dotnet/aspnetcore or dotnet/runtime URLs, preserved
aspnet/Benchmarks URL on application assemblies, framework deps
correctly attributed to dotnet/dotnet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The following Service Bus relays and their underlying VMs have been
end-of-lifed:
  - perflin   (asp-perf-lin,    intel-perflin pod)
  - perfwin   (asp-perf-win)
  - citrinewin (asp-citrine-win, intel-win pod)
  - citrinelin (asp-citrine-lin, intel-lin pod)
  - citrineamd2 (asp-citrine-amd2, amd-lin2 pod)

Changes:

build/benchmarks_ci_pods.json
  Drop the four pods whose SUT machines are EOL'd (intel-lin,
  intel-win, intel-perflin, amd-lin2). All scenarios still have at
  least one valid pod (gold-lin, gold-win) so no scenarios are
  removed; their pods arrays are pruned accordingly.

build/ci.profile.yml
  Remove EOL'd profile defs that point to dead hostnames:
  intel-lin-app/load, intel-win-app/load, intel-perflin-app,
  amd-lin2-app/load/db.

build/benchmarks-ci-01.yml, build/benchmarks-ci-02.yml
  Regenerated via:
    python scripts/pod-scheduler/main.py --config build/benchmarks_ci_pods.json --base-name benchmarks-ci --yaml-output build

scenarios/aspnet.profiles.yml, scenarios/aspnet.profiles.standard.yml
  - Remove profiles whose APP endpoint is EOL'd (per request, removed
    completely rather than re-pointing): aspnet-citrine-lin[-relay],
    aspnet-citrine-win[-relay], aspnet-citrine-amd2 /
    aspnet-citrine-amd-relay, aspnet-perf-lin[-relay],
    aspnet-perf-win[-relay].
  - Repoint the load/db endpoints in profiles whose SUT still works
    but whose load/db happened to live on an EOL'd machine:
      aspnet-citrine-arm-lin[-relay] secondary: citrineamd2 -> citrineload
      aspnet-citrine-arm-win[-relay] db (yml only): citrineamd2 -> citrinedb
      aspnet-siryn-arm-lin[-relay]   secondary: citrineamd2 -> citrineload
      aspnet-citrine-ampere (yml only) db & load: asp-citrine-amd2 ->
        asp-citrine-db / asp-citrine-load

scenarios/proxy.benchmarks.yml, scenarios/proxy.grpc.benchmarks.yml,
src/Benchmarks/json.benchmarks.yml,
src/BenchmarksApps/BuildPerformance/buildperformance.yml
  Remove inline profile defs whose APP endpoint is EOL'd
  (aspnet-citrine-lin/win/amd, aspnet-perf-lin/win, plus the
  misplaced aspnet-citrine-lin entry under scenarios in
  json.benchmarks.yml).

scenarios/signalr.benchmarks.yml
  Update example-comment profile reference from asp-perf-lin to
  aspnet-gold-lin so docs reflect a still-living machine.

Verified: all touched JSON/YAML files parse, all 43 pod-scheduler unit
tests pass, and the regenerated benchmarks-ci-0{1,2}.yml contain only
gold-lin/gold-win runs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@LoopedBard3
Owner Author

Reopening against aspnet/Benchmarks instead.

@LoopedBard3 LoopedBard3 closed this May 5, 2026