Conversation
Consolidate all performance investigation guidance into a single skill with three workflows:

- PR benchmark validation (EgorBot/MihuBot)
- Regression investigation (CoreRun builds, git bisect)
- JIT diff analysis (MihuBot)

Reference docs cover EgorBot, MihuBot, local benchmarking with CoreRun, and git bisect methodology. Includes 9 evals covering all workflows plus negative cases. The existing performance-benchmark skill is removed (fully superseded). The issue-triage skill's perf-regression-triage.md is slimmed to keep only triage-specific assessment/recommendation criteria, delegating investigation methodology to the new skill. All cross-references updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Why?
Pull request overview
This PR consolidates dotnet/runtime performance benchmarking and regression investigation guidance into a new performance-investigation skill, replacing the previous performance-benchmark skill and trimming issue-triage to focus on triage-specific criteria.
Changes:
- Added a new `performance-investigation` skill with three workflows (PR benchmarking, regression investigation/bisection, JIT diff analysis) plus reference docs and eval coverage.
- Removed the superseded `performance-benchmark` skill and updated cross-skill references to point to `performance-investigation`.
- Slimmed `issue-triage`'s perf regression guide to triage-only guidance and delegated investigation methodology to the new skill.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .github/skills/performance-investigation/SKILL.md | New consolidated performance investigation skill (workflows + guidance). |
| .github/skills/performance-investigation/evals/evals.json | Adds eval scenarios/assertions for the new skill. |
| .github/skills/performance-investigation/references/egorbot-reference.md | Adds EgorBot usage reference for benchmark-as-a-service workflows. |
| .github/skills/performance-investigation/references/mihubot-reference.md | Adds MihuBot usage reference for JIT diffs/benchmarks/fuzzing. |
| .github/skills/performance-investigation/references/local-benchmarking.md | Documents local CoreRun-based benchmarking and comparison workflow. |
| .github/skills/performance-investigation/references/bisection-guide.md | Documents git bisect methodology for perf regressions. |
| .github/skills/performance-benchmark/SKILL.md | Removes the old (superseded) performance-benchmark skill. |
| .github/skills/issue-triage/references/perf-regression-triage.md | Refocuses on triage criteria; points investigation to performance-investigation. |
| .github/skills/issue-triage/SKILL.md | Updates perf-regression row and follow-up skill recommendation. |
| .github/skills/jit-regression-test/SKILL.md | Updates reference to use performance-investigation (not performance-benchmark). |
| .github/skills/api-proposal/SKILL.md | Updates perf guidance to reference performance-investigation. |
| .github/copilot-instructions.md | Updates instruction to use performance-investigation for perf validation. |
…-instructions.md)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
I realized there's overlap between the existing `performance-benchmark` skill and this new one.
Use quadruple-backtick outer fence for nested code blocks and simplify inline backtick formatting. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What overlap? The performance-benchmark skill is entirely about creating a new benchmark for a PR and interacting with EgorBot.
Are you saying we should be keeping separate skills for separate aspects of performance investigations? It might not always be appropriate to invoke EgorBot.
We should have skills dedicated to specific purposes. We can rename the skill if it's too broad (I don't think it is), but I don't see a good reason to combine it.
What do you think about the newly added skill, irrespective of what happens to the existing `performance-benchmark` skill?
- Fix inline code formatting for fenced block marker in egorbot-reference.md
- Remove specific MihuBot option names from SKILL.md (not in reference doc)
- Clarify testhost vs coreclr CoreRun path distinction in local-benchmarking.md
- Expand bisection-guide.md CoreRun path to full testhost path
- Add CoreLib libs.pretest caveat for incremental rebuilds

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
I don't know what that means.
Do you think there should exist a skill for local performance investigations?
I haven't needed a skill for that; it does just fine on its own in my experience, and when I have wanted something custom I've wanted it integrated with the scheme I use locally (where I already have folders set up representing binaries from main and copy into a pr folder the ones under test). But I'd be fine with us adding a general-purpose local investigations skill separately if it adds meaningful value. I don't think it's related to invoking egorbot.
Per PR feedback: keep performance-benchmark as a separate skill for EgorBot/PR benchmarking. Refocus performance-investigation to local-only workflows: building CoreRun, comparing commits with BDN, git bisect.

- Restore performance-benchmark/SKILL.md (unchanged from main)
- Revert cross-reference changes to copilot-instructions.md, api-proposal, jit-regression-test (these should reference performance-benchmark)
- Remove egorbot-reference.md and mihubot-reference.md (bot territory)
- Rewrite SKILL.md to remove Workflow 1 (PR benchmark) and Workflow 3 (JIT diffs), keeping only local investigation
- Update issue-triage Related Skills to list both skills
- Update evals to match local-only scope (6 evals)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression locally (CoreRun builds, bisection) |
| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks via @EgorBot |
The PR description says the performance-benchmark skill is deleted / superseded, but this table still suggests performance-benchmark for validated regressions. Either keep the skill (and update the PR description accordingly) or update the triage guidance to point exclusively at the new consolidated skill.
"skill_name": "performance-investigation",
"evals": [
  {
    "id": 1,
    "name": "perf-regression-autobot",
There was a problem hiding this comment.
The PR description mentions “9 test cases” for the new skill, but this eval file defines 6 eval entries. Please either add the missing evals (to cover all stated workflows/negative cases) or update the PR description to match what’s actually checked in.
description: >
  Investigate performance regressions locally in dotnet/runtime. Use this skill
  when asked to investigate a performance regression, bisect to find a culprit
  commit, validate a regression with local builds, compare performance between
  commits using CoreRun, or benchmark private runtime builds with
  BenchmarkDotNet. Also use when asked about CoreRun, testhost, or local
  benchmarking against private builds. DO NOT USE FOR ad hoc PR benchmarking
  with @EgorBot or @MihuBot (use the performance-benchmark skill instead).
The skill header says not to use this skill for ad hoc PR benchmarking with @EgorBot/@MihuBot and to use performance-benchmark instead, but the PR description + updated triage docs indicate this new skill is meant to consolidate performance investigation guidance (including bot-based workflows) and supersede performance-benchmark. Please align the skill’s scope and cross-skill references (either move bot workflows into this skill, or remove the “superseded” claim / bot references elsewhere).
If the regression is not reproducible locally, check for environment differences
(OS, arch, CPU model) and note this in your report. Consider using
[@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead.
This guide links to egorbot-reference.md, but that file doesn’t exist under .github/skills/performance-investigation/references/ (and no such file exists elsewhere in the repo). Either add the referenced doc or update the link to the correct location (e.g., the relevant skill/doc that documents EgorBot usage).
Suggested change:
- [@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead.
+ @EgorBot to validate on dedicated hardware instead.
--coreRun /tmp/corerun-good/.../CoreRun \
  /tmp/corerun-bad/.../CoreRun
The --coreRun examples use /tmp/corerun-good/.../CoreRun with an ellipsis, but earlier steps copy the whole net{ver}-{os}-Release-{arch} testhost directory to /tmp/corerun-good. It would be clearer to show the concrete relative path from that copied directory to the CoreRun binary so readers can run the command without guesswork.
Suggested change:
- --coreRun /tmp/corerun-good/.../CoreRun \
-   /tmp/corerun-bad/.../CoreRun
+ --coreRun /tmp/corerun-good/shared/Microsoft.NETCore.App/{ver}/CoreRun \
+   /tmp/corerun-bad/shared/Microsoft.NETCore.App/{ver}/CoreRun
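To make the reviewer's point concrete, here is a sketch of how the two copied testhost directories might be wired into a BenchmarkDotNet comparison run. The version directory, filter, and result paths are placeholders (the real values depend on your local dotnet/runtime build), so the script only assembles and prints the command rather than executing it:

```shell
#!/bin/sh
# Sketch only: assembles the comparison command without running it, since the
# concrete paths depend on your local dotnet/runtime build output.
VER=10.0.0              # placeholder: the version directory under shared/
GOOD=/tmp/corerun-good  # copied net{ver}-{os}-Release-{arch} testhost dir
BAD=/tmp/corerun-bad    # same layout, built from the suspect commit

CMD="dotnet run -c Release --filter 'System.Text.Json.*' \
  --coreRun $GOOD/shared/Microsoft.NETCore.App/$VER/CoreRun \
            $BAD/shared/Microsoft.NETCore.App/$VER/CoreRun"
echo "$CMD"
```

Printing the command first is a cheap sanity check that both CoreRun paths resolve the way the copied-directory layout implies before committing to a long benchmark run.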
Use the `performance-investigation` skill (Workflow 2: Regression Investigation)
for the full methodology, which includes:

- Feasibility checks for local vs. bot-based investigation
- Building dotnet/runtime at specific commits and using CoreRun
- Comparing good/bad commits with BenchmarkDotNet
- Git bisect workflow for finding the culprit commit
- Using @EgorBot and @MihuBot for remote validation

**Caveat:** If bisect crosses a commit that changes the build infrastructure
(e.g., SDK version bump in `global.json`), the incremental build may fail.
Use exit code `125` (skip) to handle this gracefully.
This doc says the performance-investigation skill covers “bot usage” / “remote validation” with @EgorBot and @MihuBot, but the new performance-investigation skill text currently says not to use it for bot-based benchmarking and there are no bot reference docs under that skill. Please make the skill boundaries consistent (either move bot guidance into performance-investigation, or remove bot references here / point to the correct skill).
Oh gosh, the PR/comments mentioned EgorBot, it launched & failed, automatically posted that reply & Copilot filed a PR... 🤦♂️ I guess it's time for me to sprinkle some AI on my side to just ignore mentions which clearly do not look like requests for benchmarks.
Sorry about that, I guess it helped us discover a bug.
Note
This PR was AI/Copilot-generated.
Summary
Adds a new `performance-investigation` skill for local performance investigation in dotnet/runtime. This is separate from the existing `performance-benchmark` skill (which handles ad hoc PR benchmarking via @EgorBot).

New skill:

- `.github/skills/performance-investigation/SKILL.md` — Local investigation workflow: building CoreRun, comparing commits with `--coreRun`

Reference docs:

- `local-benchmarking.md` — Building CoreRun, `--coreRun` usage, ResultsComparer, incremental rebuilds
- `bisection-guide.md` — Feasibility check, bisect range identification, git bisect workflow

Evals: 6 test cases covering local investigation workflows + negative cases.
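The ResultsComparer step mentioned for `local-benchmarking.md` compares two sets of BenchmarkDotNet results. The sketch below only prints the invocation, since running it needs a dotnet/performance checkout; the result directories and threshold are placeholders:

```shell
#!/bin/sh
# Sketch only: prints the ResultsComparer invocation rather than running it.
BASE=/tmp/results-good  # BDN full-results artifacts from the good build (placeholder)
DIFF=/tmp/results-bad   # BDN full-results artifacts from the bad build (placeholder)
CMD="dotnet run --project src/tools/ResultsComparer -- \
  --base $BASE --diff $DIFF --threshold 2%"
echo "$CMD"
```

The threshold filters out noise: only benchmarks whose ratio moved more than the given percentage show up as improvements or regressions in the comparison table.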
Changes to existing skills
- `issue-triage` — Slimmed `perf-regression-triage.md` to triage-specific assessment/recommendation criteria; references the new skill for local investigation methodology. Related Skills table lists both `performance-investigation` and `performance-benchmark`.
- `performance-benchmark` — Unchanged (kept as-is for @EgorBot workflows)