
Add performance-investigation skill #125972

Draft
eiriktsarpalis wants to merge 5 commits into main from skill/perf-investigation

Conversation

@eiriktsarpalis
Member

@eiriktsarpalis eiriktsarpalis commented Mar 23, 2026

Note

This PR was AI/Copilot-generated.

Summary

Adds a new performance-investigation skill for local performance investigation in dotnet/runtime. This is separate from the existing performance-benchmark skill (which handles ad hoc PR benchmarking via @EgorBot).

New skill: .github/skills/performance-investigation/

SKILL.md — Local investigation workflow:

  • Building dotnet/runtime at specific commits and obtaining CoreRun
  • Running BenchmarkDotNet against private builds via --coreRun
  • Comparing good/bad commits side-by-side
  • Git bisect methodology for finding culprit commits
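
The steps above can be sketched as shell commands (the repo layout, TFM, version/OS/arch path segments, and benchmark filter are all illustrative; the reference docs below are the authoritative source):

```sh
# Sketch only: assumes dotnet/runtime and dotnet/performance checkouts side
# by side. Path segments like net10.0-linux-Release-x64 are placeholders.

# 1. Build the runtime at the known-good commit and stash the testhost.
git -C runtime checkout <good-commit>
(cd runtime && ./build.sh clr+libs -c Release)
cp -r runtime/artifacts/bin/testhost/net10.0-linux-Release-x64 /tmp/corerun-good

# 2. Repeat at the suspected-bad commit.
git -C runtime checkout <bad-commit>
(cd runtime && ./build.sh clr+libs -c Release)
cp -r runtime/artifacts/bin/testhost/net10.0-linux-Release-x64 /tmp/corerun-bad

# 3. Run the microbenchmarks against both private builds; BenchmarkDotNet
#    reports the two CoreRun toolchains side by side.
cd performance/src/benchmarks/micro
dotnet run -c Release -f net10.0 -- \
  --filter 'System.Text.Json.*' \
  --corerun /tmp/corerun-good/shared/Microsoft.NETCore.App/10.0.0/corerun \
            /tmp/corerun-bad/shared/Microsoft.NETCore.App/10.0.0/corerun
```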

Reference docs:

  • local-benchmarking.md — Building CoreRun, --coreRun usage, ResultsComparer, incremental rebuilds
  • bisection-guide.md — Feasibility check, bisect range identification, git bisect workflow

Evals: 6 test cases covering local investigation workflows + negative cases.

Changes to existing skills

  • issue-triage — Slimmed perf-regression-triage.md to triage-specific assessment/recommendation criteria; references new skill for local investigation methodology. Related Skills table lists both performance-investigation and performance-benchmark.
  • performance-benchmark — Unchanged (kept as-is for @EgorBot workflows)

Consolidate all performance investigation guidance into a single skill with
three workflows:
- PR benchmark validation (EgorBot/MihuBot)
- Regression investigation (CoreRun builds, git bisect)
- JIT diff analysis (MihuBot)

Reference docs cover EgorBot, MihuBot, local benchmarking with CoreRun,
and git bisect methodology. Includes 9 evals covering all workflows plus
negative cases.

The existing performance-benchmark skill is removed (fully superseded).
The issue-triage skill's perf-regression-triage.md is slimmed to keep only
triage-specific assessment/recommendation criteria, delegating investigation
methodology to the new skill. All cross-references updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 23, 2026 16:34
@github-actions github-actions bot added the area-skills Agent Skills label Mar 23, 2026
@stephentoub
Member

> Consolidates all performance investigation guidance into a new performance-investigation skill

Why?

Contributor

Copilot AI left a comment

Pull request overview

This PR consolidates dotnet/runtime performance benchmarking and regression investigation guidance into a new performance-investigation skill, replacing the previous performance-benchmark skill and trimming issue-triage to focus on triage-specific criteria.

Changes:

  • Added a new performance-investigation skill with three workflows (PR benchmarking, regression investigation/bisection, JIT diff analysis) plus reference docs and eval coverage.
  • Removed the superseded performance-benchmark skill and updated cross-skill references to point to performance-investigation.
  • Slimmed issue-triage’s perf regression guide to triage-only guidance and delegated investigation methodology to the new skill.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
.github/skills/performance-investigation/SKILL.md New consolidated performance investigation skill (workflows + guidance).
.github/skills/performance-investigation/evals/evals.json Adds eval scenarios/assertions for the new skill.
.github/skills/performance-investigation/references/egorbot-reference.md Adds EgorBot usage reference for benchmark-as-a-service workflows.
.github/skills/performance-investigation/references/mihubot-reference.md Adds MihuBot usage reference for JIT diffs/benchmarks/fuzzing.
.github/skills/performance-investigation/references/local-benchmarking.md Documents local CoreRun-based benchmarking and comparison workflow.
.github/skills/performance-investigation/references/bisection-guide.md Documents git bisect methodology for perf regressions.
.github/skills/performance-benchmark/SKILL.md Removes the old (superseded) performance-benchmark skill.
.github/skills/issue-triage/references/perf-regression-triage.md Refocuses on triage criteria; points investigation to performance-investigation.
.github/skills/issue-triage/SKILL.md Updates perf-regression row and follow-up skill recommendation.
.github/skills/jit-regression-test/SKILL.md Updates reference to use performance-investigation (not performance-benchmark).
.github/skills/api-proposal/SKILL.md Updates perf guidance to reference performance-investigation.
.github/copilot-instructions.md Updates instruction to use performance-investigation for perf validation.

…-instructions.md)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@eiriktsarpalis
Member Author

> Consolidates all performance investigation guidance into a new performance-investigation skill
>
> Why?

I realized there's overlap between the existing performance-benchmark skill and the instructions in the performance regression reference in issue triage. It made sense to distill these into one skill.

Use quadruple-backtick outer fence for nested code blocks and simplify
inline backtick formatting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 23, 2026 17:05
@stephentoub
Member

> Consolidates all performance investigation guidance into a new performance-investigation skill
>
> Why?
>
> I realized there's overlap between the existing performance-benchmark skill and the instructions in the performance regression reference in issue triage. It made sense to distill these into one skill.

What overlap? The performance-benchmark skill is entirely about creating a new benchmark for a PR and interacting with EgorBot.

@eiriktsarpalis
Member Author

eiriktsarpalis commented Mar 23, 2026

Are you saying we should be keeping separate skills for separate aspects of performance investigations? It might not always be appropriate to invoke EgorBot.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

@stephentoub
Member

stephentoub commented Mar 23, 2026

> Are you saying we should be keeping separate skills for separate aspects of performance investigations? It might not always be appropriate to invoke EgorBot.

We should have skills dedicated to specific purposes. We can rename the skill if it's too broad (I don't think it is), but I don't see a good reason to combine it.

@eiriktsarpalis
Member Author

What do you think about the newly added skill, irrespective of what happens to the existing performance-benchmark?

- Fix inline code formatting for fenced block marker in egorbot-reference.md
- Remove specific MihuBot option names from SKILL.md (not in reference doc)
- Clarify testhost vs coreclr CoreRun path distinction in local-benchmarking.md
- Expand bisection-guide.md CoreRun path to full testhost path
- Add CoreLib libs.pretest caveat for incremental rebuilds

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
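
The CoreLib caveat above exists because rebuilding System.Private.CoreLib alone does not refresh the testhost layout. A plausible incremental sequence (subset names taken from dotnet/runtime's `build.sh`; verify the exact combination against the repo's workflow docs) is:

```sh
# Hedged sketch: run from a dotnet/runtime checkout. Subset names come from
# `build.sh -help`; the required combination may differ per branch.

# Rebuild managed CoreLib plus its crossgen'd native image:
./build.sh clr.corelib+clr.nativecorelib -c Release

# Re-assemble the testhost layout so CoreRun picks up the new CoreLib:
./build.sh libs.pretest -c Release
```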
@stephentoub
Member

> What do you think about the newly added skill, irrespective of what happens to the existing performance-benchmark?

I don't know what that means.

@eiriktsarpalis
Member Author

Do you think there should exist a skill for local performance investigations?

@stephentoub
Member

> Do you think there should exist a skill for local performance investigations?

I haven't needed a skill for that; it does just fine on its own in my experience. When I have wanted something custom, I've wanted it integrated with the scheme I use locally (where I already have folders set up representing binaries from main, and I copy the binaries under test into a pr folder).

But I'd be fine with us adding a general-purpose local investigations skill separately if it adds meaningful value. I don't think it's related to invoking egorbot.

Per PR feedback: keep performance-benchmark as a separate skill for
EgorBot/PR benchmarking. Refocus performance-investigation to local-only
workflows: building CoreRun, comparing commits with BDN, git bisect.

- Restore performance-benchmark/SKILL.md (unchanged from main)
- Revert cross-reference changes to copilot-instructions.md, api-proposal,
  jit-regression-test (these should reference performance-benchmark)
- Remove egorbot-reference.md and mihubot-reference.md (bot territory)
- Rewrite SKILL.md to remove Workflow 1 (PR benchmark) and Workflow 3
  (JIT diffs), keeping only local investigation
- Update issue-triage Related Skills to list both skills
- Update evals to match local-only scope (6 evals)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 24, 2026 10:45
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Comment on lines +524 to +525
| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression locally (CoreRun builds, bisection) |
| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks via @EgorBot |

Copilot AI Mar 24, 2026

The PR description says the performance-benchmark skill is deleted / superseded, but this table still suggests performance-benchmark for validated regressions. Either keep the skill (and update the PR description accordingly) or update the triage guidance to point exclusively at the new consolidated skill.

Comment on lines +2 to +6
"skill_name": "performance-investigation",
"evals": [
{
"id": 1,
"name": "perf-regression-autobot",

Copilot AI Mar 24, 2026

The PR description mentions “9 test cases” for the new skill, but this eval file defines 6 eval entries. Please either add the missing evals (to cover all stated workflows/negative cases) or update the PR description to match what’s actually checked in.

Comment on lines +3 to +10
description: >
Investigate performance regressions locally in dotnet/runtime. Use this skill
when asked to investigate a performance regression, bisect to find a culprit
commit, validate a regression with local builds, compare performance between
commits using CoreRun, or benchmark private runtime builds with
BenchmarkDotNet. Also use when asked about CoreRun, testhost, or local
benchmarking against private builds. DO NOT USE FOR ad hoc PR benchmarking
with @EgorBot or @MihuBot (use the performance-benchmark skill instead).

Copilot AI Mar 24, 2026

The skill header says not to use this skill for ad hoc PR benchmarking with @EgorBot/@MihuBot and to use performance-benchmark instead, but the PR description + updated triage docs indicate this new skill is meant to consolidate performance investigation guidance (including bot-based workflows) and supersede performance-benchmark. Please align the skill’s scope and cross-skill references (either move bot workflows into this skill, or remove the “superseded” claim / bot references elsewhere).

If the regression is not reproducible locally, check for environment differences
(OS, arch, CPU model) and note this in your report. Consider using
[@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead.

Copilot AI Mar 24, 2026

This guide links to egorbot-reference.md, but that file doesn’t exist under .github/skills/performance-investigation/references/ (and no such file exists elsewhere in the repo). Either add the referenced doc or update the link to the correct location (e.g., the relevant skill/doc that documents EgorBot usage).

Suggested change
[@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead.
@EgorBot to validate on dedicated hardware instead.

Comment on lines +96 to +97
--coreRun /tmp/corerun-good/.../CoreRun \
/tmp/corerun-bad/.../CoreRun

Copilot AI Mar 24, 2026

The --coreRun examples use /tmp/corerun-good/.../CoreRun with an ellipsis, but earlier steps copy the whole net{ver}-{os}-Release-{arch} testhost directory to /tmp/corerun-good. It would be clearer to show the concrete relative path from that copied directory to the CoreRun binary so readers can run the command without guesswork.

Suggested change
--coreRun /tmp/corerun-good/.../CoreRun \
/tmp/corerun-bad/.../CoreRun
--coreRun /tmp/corerun-good/shared/Microsoft.NETCore.App/{ver}/CoreRun \
/tmp/corerun-bad/shared/Microsoft.NETCore.App/{ver}/CoreRun

Comment on lines +30 to +37
Use the `performance-investigation` skill (Workflow 2: Regression Investigation)
for the full methodology, which includes:

**Caveat:** If bisect crosses a commit that changes the build infrastructure
(e.g., SDK version bump in `global.json`), the incremental build may fail.
Use exit code `125` (skip) to handle this gracefully.
- Feasibility checks for local vs. bot-based investigation
- Building dotnet/runtime at specific commits and using CoreRun
- Comparing good/bad commits with BenchmarkDotNet
- Git bisect workflow for finding the culprit commit
- Using @EgorBot and @MihuBot for remote validation

Copilot AI Mar 24, 2026

This doc says the performance-investigation skill covers “bot usage” / “remote validation” with @EgorBot and @MihuBot, but the new performance-investigation skill text currently says not to use it for bot-based benchmarking and there are no bot reference docs under that skill. Please make the skill boundaries consistent (either move bot guidance into performance-investigation, or remove bot references here / point to the correct skill).

@EgorBo
Member

EgorBo commented Mar 24, 2026

Oh gosh, the PR/comments mentioned EgorBot: it launched and failed, automatically posted that reply, and Copilot filed a PR... 🤦‍♂️ I guess it's time for me to sprinkle some AI on my side to ignore mentions which clearly do not look like requests for benchmarks.

@eiriktsarpalis
Member Author

Sorry about that, I guess it helped us discover a bug.
