
Add performance-investigation skill #125972

Draft
eiriktsarpalis wants to merge 5 commits into main from skill/perf-investigation

Conversation

@eiriktsarpalis
Member

@eiriktsarpalis eiriktsarpalis commented Mar 23, 2026

Note

This PR was AI/Copilot-generated.

Summary

Adds a new performance-investigation skill for local performance investigation in dotnet/runtime. This is separate from the existing performance-benchmark skill (which handles ad hoc PR benchmarking via @EgorBot).

New skill: .github/skills/performance-investigation/

SKILL.md — Local investigation workflow:

  • Building dotnet/runtime at specific commits and obtaining CoreRun
  • Running BenchmarkDotNet against private builds via --coreRun
  • Comparing good/bad commits side-by-side
  • Git bisect methodology for finding culprit commits
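
The steps above can be sketched as shell commands (the repo layout, TFM, version/OS/arch path segments, and benchmark filter are all illustrative; the reference docs below are the authoritative source):

```sh
# Sketch only: assumes dotnet/runtime and dotnet/performance checkouts side
# by side. Path segments like net10.0-linux-Release-x64 are placeholders.

# 1. Build the runtime at the known-good commit and stash the testhost.
git -C runtime checkout <good-commit>
(cd runtime && ./build.sh clr+libs -c Release)
cp -r runtime/artifacts/bin/testhost/net10.0-linux-Release-x64 /tmp/corerun-good

# 2. Repeat at the suspected-bad commit.
git -C runtime checkout <bad-commit>
(cd runtime && ./build.sh clr+libs -c Release)
cp -r runtime/artifacts/bin/testhost/net10.0-linux-Release-x64 /tmp/corerun-bad

# 3. Run the microbenchmarks against both private builds; BenchmarkDotNet
#    reports the two CoreRun toolchains side by side.
cd performance/src/benchmarks/micro
dotnet run -c Release -f net10.0 -- \
  --filter 'System.Text.Json.*' \
  --corerun /tmp/corerun-good/shared/Microsoft.NETCore.App/10.0.0/corerun \
            /tmp/corerun-bad/shared/Microsoft.NETCore.App/10.0.0/corerun
```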

Reference docs:

  • local-benchmarking.md — Building CoreRun, --coreRun usage, ResultsComparer, incremental rebuilds
  • bisection-guide.md — Feasibility check, bisect range identification, git bisect workflow

Evals: 6 test cases covering local investigation workflows + negative cases.

Changes to existing skills

  • issue-triage — Slimmed perf-regression-triage.md to triage-specific assessment/recommendation criteria; references new skill for local investigation methodology. Related Skills table lists both performance-investigation and performance-benchmark.
  • performance-benchmark — Unchanged (kept as-is for @EgorBot workflows)

Consolidate all performance investigation guidance into a single skill with
three workflows:
- PR benchmark validation (EgorBot/MihuBot)
- Regression investigation (CoreRun builds, git bisect)
- JIT diff analysis (MihuBot)

Reference docs cover EgorBot, MihuBot, local benchmarking with CoreRun,
and git bisect methodology. Includes 9 evals covering all workflows plus
negative cases.

The existing performance-benchmark skill is removed (fully superseded).
The issue-triage skill's perf-regression-triage.md is slimmed to keep only
triage-specific assessment/recommendation criteria, delegating investigation
methodology to the new skill. All cross-references updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 23, 2026 16:34
@github-actions github-actions bot added the area-skills Agent Skills label Mar 23, 2026
@stephentoub
Member

> Consolidates all performance investigation guidance into a new performance-investigation skill

Why?

Contributor

Copilot AI left a comment

Pull request overview

This PR consolidates dotnet/runtime performance benchmarking and regression investigation guidance into a new performance-investigation skill, replacing the previous performance-benchmark skill and trimming issue-triage to focus on triage-specific criteria.

Changes:

  • Added a new performance-investigation skill with three workflows (PR benchmarking, regression investigation/bisection, JIT diff analysis) plus reference docs and eval coverage.
  • Removed the superseded performance-benchmark skill and updated cross-skill references to point to performance-investigation.
  • Slimmed issue-triage’s perf regression guide to triage-only guidance and delegated investigation methodology to the new skill.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
.github/skills/performance-investigation/SKILL.md New consolidated performance investigation skill (workflows + guidance).
.github/skills/performance-investigation/evals/evals.json Adds eval scenarios/assertions for the new skill.
.github/skills/performance-investigation/references/egorbot-reference.md Adds EgorBot usage reference for benchmark-as-a-service workflows.
.github/skills/performance-investigation/references/mihubot-reference.md Adds MihuBot usage reference for JIT diffs/benchmarks/fuzzing.
.github/skills/performance-investigation/references/local-benchmarking.md Documents local CoreRun-based benchmarking and comparison workflow.
.github/skills/performance-investigation/references/bisection-guide.md Documents git bisect methodology for perf regressions.
.github/skills/performance-benchmark/SKILL.md Removes the old (superseded) performance-benchmark skill.
.github/skills/issue-triage/references/perf-regression-triage.md Refocuses on triage criteria; points investigation to performance-investigation.
.github/skills/issue-triage/SKILL.md Updates perf-regression row and follow-up skill recommendation.
.github/skills/jit-regression-test/SKILL.md Updates reference to use performance-investigation (not performance-benchmark).
.github/skills/api-proposal/SKILL.md Updates perf guidance to reference performance-investigation.
.github/copilot-instructions.md Updates instruction to use performance-investigation for perf validation.

…-instructions.md)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@eiriktsarpalis
Member Author

> Consolidates all performance investigation guidance into a new performance-investigation skill
>
> Why?

I realized there's overlap between the existing performance-benchmark skill and the instructions in the performance regression reference in issue triage. It made sense to distill these into one skill.

Use quadruple-backtick outer fence for nested code blocks and simplify
inline backtick formatting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 23, 2026 17:05
@stephentoub
Member

> Consolidates all performance investigation guidance into a new performance-investigation skill
>
> Why?
>
> I realized there's overlap between the existing performance-benchmark skill and the instructions in the performance regression reference in issue triage. It made sense to distill these into one skill.

What overlap? The performance-benchmark skill is entirely about creating a new benchmark for a PR and interacting with EgorBot.

@eiriktsarpalis
Member Author

eiriktsarpalis commented Mar 23, 2026

Are you saying we should be keeping separate skills for separate aspects of performance investigations? It might not always be appropriate to invoke EgorBot.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

@stephentoub
Member

stephentoub commented Mar 23, 2026

> Are you saying we should be keeping separate skills for separate aspects of performance investigations? It might not always be appropriate to invoke EgorBot.

We should have skills dedicated to specific purposes. We can rename the skill if it's too broad (I don't think it is), but I don't see a good reason to combine it.

@eiriktsarpalis
Member Author

What do you think about the newly added skill, irrespective of what happens to the existing performance-benchmark?

- Fix inline code formatting for fenced block marker in egorbot-reference.md
- Remove specific MihuBot option names from SKILL.md (not in reference doc)
- Clarify testhost vs coreclr CoreRun path distinction in local-benchmarking.md
- Expand bisection-guide.md CoreRun path to full testhost path
- Add CoreLib libs.pretest caveat for incremental rebuilds

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
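
The CoreLib caveat above exists because rebuilding System.Private.CoreLib alone does not refresh the testhost layout. A plausible incremental sequence (subset names taken from dotnet/runtime's `build.sh`; verify the exact combination against the repo's workflow docs) is:

```sh
# Hedged sketch: run from a dotnet/runtime checkout. Subset names come from
# `build.sh -help`; the required combination may differ per branch.

# Rebuild managed CoreLib plus its crossgen'd native image:
./build.sh clr.corelib+clr.nativecorelib -c Release

# Re-assemble the testhost layout so CoreRun picks up the new CoreLib:
./build.sh libs.pretest -c Release
```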
@stephentoub
Member

> What do you think about the newly added skill, irrespective of what happens to the existing performance-benchmark?

I don't know what that means.

@eiriktsarpalis
Member Author

Do you think there should exist a skill for local performance investigations?

@stephentoub
Member

> Do you think there should exist a skill for local performance investigations?

I haven't needed a skill for that; it does just fine on its own in my experience. When I have wanted something custom, I've wanted it integrated with the scheme I use locally (where I already have folders set up representing binaries from main, and I copy the binaries under test into a pr folder).

But I'd be fine with us adding a general-purpose local investigations skill separately if it adds meaningful value. I don't think it's related to invoking egorbot.

Per PR feedback: keep performance-benchmark as a separate skill for
EgorBot/PR benchmarking. Refocus performance-investigation to local-only
workflows: building CoreRun, comparing commits with BDN, git bisect.

- Restore performance-benchmark/SKILL.md (unchanged from main)
- Revert cross-reference changes to copilot-instructions.md, api-proposal,
  jit-regression-test (these should reference performance-benchmark)
- Remove egorbot-reference.md and mihubot-reference.md (bot territory)
- Rewrite SKILL.md to remove Workflow 1 (PR benchmark) and Workflow 3
  (JIT diffs), keeping only local investigation
- Update issue-triage Related Skills to list both skills
- Update evals to match local-only scope (6 evals)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 24, 2026 10:45
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Comment on lines +524 to +525
| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression locally (CoreRun builds, bisection) |
| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks via @EgorBot |

Copilot AI Mar 24, 2026

The PR description says the performance-benchmark skill is deleted / superseded, but this table still suggests performance-benchmark for validated regressions. Either keep the skill (and update the PR description accordingly) or update the triage guidance to point exclusively at the new consolidated skill.

Comment on lines +2 to +6
"skill_name": "performance-investigation",
"evals": [
{
"id": 1,
"name": "perf-regression-autobot",

Copilot AI Mar 24, 2026

The PR description mentions “9 test cases” for the new skill, but this eval file defines 6 eval entries. Please either add the missing evals (to cover all stated workflows/negative cases) or update the PR description to match what’s actually checked in.

Comment on lines +3 to +10
description: >
Investigate performance regressions locally in dotnet/runtime. Use this skill
when asked to investigate a performance regression, bisect to find a culprit
commit, validate a regression with local builds, compare performance between
commits using CoreRun, or benchmark private runtime builds with
BenchmarkDotNet. Also use when asked about CoreRun, testhost, or local
benchmarking against private builds. DO NOT USE FOR ad hoc PR benchmarking
with @EgorBot or @MihuBot (use the performance-benchmark skill instead).

Copilot AI Mar 24, 2026

The skill header says not to use this skill for ad hoc PR benchmarking with @EgorBot/@MihuBot and to use performance-benchmark instead, but the PR description + updated triage docs indicate this new skill is meant to consolidate performance investigation guidance (including bot-based workflows) and supersede performance-benchmark. Please align the skill’s scope and cross-skill references (either move bot workflows into this skill, or remove the “superseded” claim / bot references elsewhere).

If the regression is not reproducible locally, check for environment differences
(OS, arch, CPU model) and note this in your report. Consider using
[@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead.

Copilot AI Mar 24, 2026

This guide links to egorbot-reference.md, but that file doesn’t exist under .github/skills/performance-investigation/references/ (and no such file exists elsewhere in the repo). Either add the referenced doc or update the link to the correct location (e.g., the relevant skill/doc that documents EgorBot usage).

Suggested change
[@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead.
@EgorBot to validate on dedicated hardware instead.

Comment on lines +96 to +97
--coreRun /tmp/corerun-good/.../CoreRun \
/tmp/corerun-bad/.../CoreRun

Copilot AI Mar 24, 2026

The --coreRun examples use /tmp/corerun-good/.../CoreRun with an ellipsis, but earlier steps copy the whole net{ver}-{os}-Release-{arch} testhost directory to /tmp/corerun-good. It would be clearer to show the concrete relative path from that copied directory to the CoreRun binary so readers can run the command without guesswork.

Suggested change
--coreRun /tmp/corerun-good/.../CoreRun \
/tmp/corerun-bad/.../CoreRun
--coreRun /tmp/corerun-good/shared/Microsoft.NETCore.App/{ver}/CoreRun \
/tmp/corerun-bad/shared/Microsoft.NETCore.App/{ver}/CoreRun

Comment on lines +30 to +37
Use the `performance-investigation` skill (Workflow 2: Regression Investigation)
for the full methodology, which includes:

**Caveat:** If bisect crosses a commit that changes the build infrastructure
(e.g., SDK version bump in `global.json`), the incremental build may fail.
Use exit code `125` (skip) to handle this gracefully.
- Feasibility checks for local vs. bot-based investigation
- Building dotnet/runtime at specific commits and using CoreRun
- Comparing good/bad commits with BenchmarkDotNet
- Git bisect workflow for finding the culprit commit
- Using @EgorBot and @MihuBot for remote validation

Copilot AI Mar 24, 2026

This doc says the performance-investigation skill covers “bot usage” / “remote validation” with @EgorBot and @MihuBot, but the new performance-investigation skill text currently says not to use it for bot-based benchmarking and there are no bot reference docs under that skill. Please make the skill boundaries consistent (either move bot guidance into performance-investigation, or remove bot references here / point to the correct skill).

@EgorBo
Member

EgorBo commented Mar 24, 2026

Oh gosh, the PR/comments mentioned EgorBot: it launched and failed, automatically posted that reply, and Copilot filed a PR... 🤦‍♂️ I guess it's time for me to sprinkle some AI on my side to ignore mentions which clearly do not look like requests for benchmarks.

@eiriktsarpalis
Member Author

Sorry about that, I guess it helped us discover a bug.
