1 change: 1 addition & 0 deletions assets/contributors.csv
@@ -88,6 +88,7 @@ Geremy Cohen,Arm,geremyCohen,geremyinanutshell,,
Barbara Corriero,Arm,,,,
Nina Drozd,Arm,NinaARM,ninadrozd,,
Jun He,Arm,JunHe77,jun-he-91969822,,
Henry Wang,Arm,MrXinWang,xin-wang-930b4b141,,
Gian Marco Iodice,Arm,,,,
Aude Vuilliomenet,Arm,,,,
Andrew Kilroy,Arm,,,,
@@ -0,0 +1,65 @@
---
title: What is the arm-performix skill?
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## A skill, not a tool

**Arm Performix** is the profiling tool. The **arm-performix skill** is a set of
instructions you add to your AI assistant so that it knows *how* to use Performix
correctly on your behalf: which recipe to pick, how to gather context, how to
read the results, and how to report findings.

Without the skill, an assistant tends to guess at performance problems by reading
your source code. With the skill, it follows a disciplined workflow: measure
first, characterize the bottleneck, change one thing at a time, and prove the win
with before/after data.

## What the skill does for you

When the skill is active, the assistant will:

- Ask for the target, binary path, and workload command before profiling
- Choose the narrowest Performix recipe that answers your question
- Run the recipe (through the `apx` CLI or the Arm MCP Server)
- Return a structured **Analysis Report**: bottleneck summary, key metrics, hot
functions, ranked recommendations, and a single next step

## What the skill will not do

- Profile non-Neoverse Arm cores, such as those in phone-class SoCs
- Guess at bottlenecks from source reading instead of measurement
- Silently switch to another profiler when Performix is unavailable; it asks you
how to proceed instead

## The recipes it can run

Performix exposes five profiling recipes, and the skill orchestrates them as a
workflow: it picks a starting recipe from your question, then follows the
evidence into whichever further recipes are needed to explain and confirm the
bottleneck. Each recipe answers a different question:

| Your question | Recipe | What it shows |
| --- | --- | --- |
| Where is my time spent? | **Code Hotspots** | Hottest functions, call paths, flame graph |
| Why is the pipeline stalling? | **CPU Microarchitecture** | Frontend/backend stalls, bad speculation, retiring |
| Am I using SIMD (Neon/SVE)? | **Instruction Mix** | Scalar vs vector instruction balance |
| Is memory the bottleneck? | **Memory Access** | L1 hit rate, latency, TLB/page-walk pressure |
| What can the hardware do? | **System Characterization** | Memory bandwidth and latency baseline per NUMA node |
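To make the mapping in the table concrete, here is a minimal sketch of how a question could be routed to a starting recipe. It is a hypothetical keyword matcher for illustration only: the real skill reasons over your whole request as described in its `SKILL.md`, and the function name `pick_recipe` and its keyword lists are assumptions, not part of Performix or the skill.

```bash
#!/usr/bin/env bash
# Hypothetical illustration: route a profiling question to a starting recipe.
# Keywords mirror the table above; the real skill does not use this function.
pick_recipe() {
  local q
  # Lowercase the question so matching is case-insensitive (portable to bash 3).
  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$q" in
    *simd*|*neon*|*sve*|*vector*)     echo "Instruction Mix" ;;
    # Check bandwidth/NUMA before generic "memory" so baseline questions win.
    *bandwidth*|*numa*|*baseline*)    echo "System Characterization" ;;
    *memory*|*cache*|*tlb*)           echo "Memory Access" ;;
    *stall*|*pipeline*)               echo "CPU Microarchitecture" ;;
    # Narrowest default: find out where the time goes first.
    *)                                echo "Code Hotspots" ;;
  esac
}

pick_recipe "Am I using SIMD in my hot loop?"   # Instruction Mix
pick_recipe "Where is my time spent?"            # Code Hotspots
```

The default branch reflects the workflow described above: when nothing more specific is asked, start with Code Hotspots and let the evidence decide the next recipe.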

{{% notice Note %}}
**Where the recipes run:**

- **Profiling target** (the machine running your workload). The four
microarchitecture-level recipes (CPU Microarchitecture, Instruction Mix,
Memory Access, and System Characterization) require an **Arm Neoverse** target
on Linux; Memory Access additionally needs the Statistical Profiling Extension
(SPE) enabled. Code Hotspots is broader: it also runs on x86-64 Linux and on
Windows 11 (Arm or x86).
- **Host** (the machine where you run the `apx` CLI or your AI assistant). It can
be macOS, Windows, or Linux on either Arm64 or x86-64, and connects to the
target locally or over SSH.
{{% /notice %}}
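Before asking the skill to run one of the four microarchitecture-level recipes, you can sanity-check a Linux target yourself. This sketch reads the `CPU implementer` and `CPU part` fields of `/proc/cpuinfo`. The part numbers it recognizes are the Linux kernel's IDs for Neoverse N1, V1, N2, and V2, which is not an exhaustive list. It also looks for the `arm_spe_0` PMU under `/sys/bus/event_source/devices`, where Linux usually exposes SPE when the driver is loaded. Treat the result as a hint, not as Performix's own detection logic. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Pre-flight sketch for a Linux target. Not Performix's own detection logic.

# Succeeds if the MIDR part number belongs to a known Neoverse core.
# Partial list only: N1 (0xd0c), V1 (0xd40), N2 (0xd49), V2 (0xd4f).
is_neoverse_part() {
  case "$1" in
    0xd0c|0xd40|0xd49|0xd4f) return 0 ;;
    *) return 1 ;;
  esac
}

check_target() {
  local implementer part
  # On arm64 Linux, /proc/cpuinfo has lines like "CPU part<TAB>: 0xd0c".
  implementer=$(awk -F': ' '/^CPU implementer/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)
  part=$(awk -F': ' '/^CPU part/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)
  # 0x41 is the implementer code for Arm Ltd.
  if [ "$implementer" = "0x41" ] && is_neoverse_part "$part"; then
    echo "Neoverse core detected (CPU part $part)"
  else
    echo "No recognized Neoverse core (implementer '${implementer:-none}', part '${part:-none}')"
  fi
  # Linux typically exposes SPE as the arm_spe_0 PMU when the driver is loaded.
  if [ -d /sys/bus/event_source/devices/arm_spe_0 ]; then
    echo "SPE PMU present"
  else
    echo "SPE PMU not found: the Memory Access recipe may not work"
  fi
}

check_target
```

To check a remote target, you can pipe the script over SSH, for example `ssh me@neoverse-box bash -s < preflight.sh`, where `preflight.sh` is whatever file you saved the sketch as.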
@@ -0,0 +1,69 @@
---
title: Install and enable the skill
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Get the skill files

You can get the skill in either of these ways:

- Clone the skills repository from Gitee:

  ```bash
  git clone https://gitee.com/anolis/anolis-skills.git
  ```

  Use this if you prefer managing the skill source with Git.

- Download the skill package from the SkillHub page:
  https://skillhub.openanolis.cn/skill/arm-performix

  Use this if you prefer downloading a package from the web page. The downloaded
  skill is a `.zip` package, so extract it before placing the folder.

## Place the skill files

The skill is a folder that contains `SKILL.md`, `README.md`, and a
`references/` directory. Put it where your assistant discovers skills. For
GitHub Copilot in VS Code, that is the `.github/skills/` directory of your
workspace:

```text
.github/
  skills/
    arm-performix/
      SKILL.md
      README.md
      references/
        <references>.md
```

`SKILL.md` describes the profiling workflow, `README.md` provides a supporting
overview, and the files under `references/` hold detailed reference material that
the assistant reads on demand.
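If you cloned the repository, one way to place the skill is to locate the `arm-performix` folder in the clone and copy it into your workspace. This sketch assumes the clone contains a directory named `arm-performix` that holds `SKILL.md`. Because its exact location inside the repository is not stated here, the script searches for it instead of hard-coding a path. The helper name `install_skill` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the arm-performix skill from a clone into a
# VS Code workspace, then check the layout described above.
install_skill() {
  local clone_dir="$1" workspace="$2" src dest entry
  # Find the first arm-performix/SKILL.md anywhere in the clone.
  src=$(find "$clone_dir" -type f -path '*/arm-performix/SKILL.md' | head -n 1)
  if [ -z "$src" ]; then
    echo "arm-performix/SKILL.md not found under $clone_dir" >&2
    return 1
  fi
  src=$(dirname "$src")
  dest="$workspace/.github/skills"
  mkdir -p "$dest"
  cp -R "$src" "$dest/"
  # Verify the files and folder the assistant expects are in place.
  for entry in SKILL.md README.md references; do
    if [ ! -e "$dest/arm-performix/$entry" ]; then
      echo "missing: $dest/arm-performix/$entry" >&2
      return 1
    fi
  done
  echo "Installed skill in $dest/arm-performix"
}

# Example, run from your workspace root after cloning:
#   install_skill ./anolis-skills "$PWD"
```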

## Confirm the skill is discovered

Reload VS Code, then ask your assistant a profiling question (see the next page).
A correctly installed skill is picked up automatically when your request matches
its triggers; you do not invoke it with an explicit command.

{{% notice Tip %}}
The skill only *describes* how to use Performix. You still need Performix itself
available: either the `apx` CLI on your `PATH`, or the Arm MCP Server configured.
The skill tells you when neither is reachable rather than guessing.
{{% /notice %}}

## Choose how Performix runs

The skill can drive Performix in two ways. You do not have to pick manually, but
it helps to know which one you have set up:

- **apx CLI**: the full-capability path. Install it on your host and confirm it
with `apx version`. Best for remote SSH targets, automation, and CI.
- **Arm MCP Server**: bundles its own `apx`, so your host needs no CLI install.
Best for fully agent-driven workflows. The skill routes here only when you ask,
or when the CLI is not installed and you confirm MCP.
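To see which of the two paths your host is ready for, you can check whether `apx` is on your `PATH`. This sketch only tests for the CLI. It cannot tell whether an Arm MCP Server is configured in your assistant, so an `mcp` result means "check your MCP configuration", not "MCP is ready". The function name `performix_path` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which Performix path this host can use.
# Prints "cli" if the given command (default: apx) is on PATH, else "mcp".
performix_path() {
  local cli="${1:-apx}"
  if command -v "$cli" >/dev/null 2>&1; then
    echo "cli"
  else
    echo "mcp"
  fi
}

if [ "$(performix_path)" = "cli" ]; then
  apx version   # confirm the CLI works
else
  echo "apx not on PATH: install it, or configure the Arm MCP Server"
fi
```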
@@ -0,0 +1,50 @@
---
title: Trigger the skill with the right context
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Phrase the request so the skill activates

The skill activates on performance and profiling intent. Phrasings that work:

- "Profile this workload on Arm and find the hotspots."
- "Why is this binary slow on my Arm Neoverse server?"
- "Use Performix to check whether my hot loop is vectorized."
- "Investigate cache and TLB stalls on my Neoverse target."

Phrasings that will *not* activate it, because they are migration questions or too vague:

- "Will my code build on Arm?" That is a migration question, not a profiling one.
- "Make my code faster" with no target or binary. That is too vague; add context.

## Provide the context up front

The skill needs the following before it can profile. Supplying them in your first
message avoids a round of back-and-forth:

1. **Target**: a local Arm machine, or `user@host` for a remote SSH target
2. **Binary**: the **absolute path** to the executable on the target
3. **Workload**: the exact command and arguments, ideally repeatable
4. **Goal**: hotspots, SIMD usage, memory locality, or a regression to chase

If it needs anything else, such as the source tree for line-level attribution or
your build flags, the skill is designed to ask for it rather than guess.

A good first prompt looks like this:

```text
Profile /home/me/build/myapp --input bench.dat on my Arm Neoverse target
me@neoverse-box with Performix. I want to know where the time goes.
```

The skill picks **Code Hotspots** first, runs it, and reports back with the
analysis report described on the next page.

{{% notice Note %}}
Always give the **absolute path** to the binary, and use absolute paths for any
output files your workload writes. Performix may launch the process from a
temporary working directory, so relative paths can resolve unexpectedly.
{{% /notice %}}
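If you are not sure of the absolute path, resolve it on the target before you write the prompt. This sketch uses `realpath` (GNU coreutils, also available on recent macOS) and a small hypothetical `is_absolute` check. `me@neoverse-box` and `build/myapp` are placeholder names based on the example prompt above.

```bash
#!/usr/bin/env bash
# Succeeds only if the argument is an absolute path (starts with "/").
is_absolute() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Resolve a relative path on a local target:
#   realpath build/myapp
# Resolve it on a remote SSH target (relative to the remote home directory):
#   ssh me@neoverse-box realpath build/myapp

bin="build/myapp"
if ! is_absolute "$bin"; then
  echo "Relative path; give the skill an absolute one, such as $PWD/$bin"
fi
```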
@@ -0,0 +1,104 @@
---
title: Read the report and drive the optimization loop
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## The analysis report

After every recipe run, the skill returns a structured report instead of raw
data. Expect these sections:

- **Bottleneck Summary**: what dominates, and how confident the skill is
- **Key Metrics**: the three to five most decision-relevant numbers
- **Hot Functions**: ranked, each with a brief root-cause note
- **Recommended Actions**: concrete, prioritized fixes (file, function, line)
- **Ruled Out**: hypotheses the data did *not* support, and why
- **Next Step**: a single actionable instruction to run next, such as the next
recipe to try or one code change to make, then re-profile

This report is the skill's primary deliverable. If the data is noisy or
insufficient, the skill says so plainly and recommends a re-run rather than
guessing.
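If you save reports to files, for example to attach them to a pull request, a quick check can confirm that a report has every section listed above. This is a hypothetical helper, not something the skill ships. It assumes the report uses `###` headings that start with the section names used in the worked example on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list expected report sections missing from a file.
# Returns non-zero if at least one section is missing.
check_report() {
  local file="$1" missing=0 section
  for section in "Bottleneck Summary" "Key Metrics" "Hot Functions" \
                 "Recommended Actions" "Ruled Out" "Next Step"; do
    # Prefix match, so "### Recommended Actions (priority order)" also counts.
    if ! grep -q "^### $section" "$file"; then
      echo "missing section: $section"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_report report.md
```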

## A worked example

For a compute-bound workload, a first Code Hotspots report from the skill looks
like this:

```markdown
## Performix Analysis Report

**Recipe:** Code Hotspots
**Target:** neoverse-box (Arm Neoverse, 64 cores)
**Workload:** /home/me/build/myapp --input bench.dat

### Bottleneck Summary

The escape-check loop in `escape_iterations` dominates CPU time (72% of samples),
driven by an avoidable `sqrt` call inside the tight iteration. High confidence:
a single function, stable across runs.

### Key Metrics

| Metric | Value | Assessment |
|--------|-------|------------|
| escape_iterations total % | 72.3% | Critical: single dominant hotspot |
| sqrt self % | 45.1% | Critical: unnecessary math |
| Total samples | 48,201 | Good: enough for reliable data |

### Hot Functions

| # | Function | Samples incl. callees (%) | Root Cause |
|---|----------|-------------|------------|
| 1 | escape_iterations | 72.3% | sqrt in inner loop |
| 2 | sqrt | 45.1% | magnitude check in escape_iterations |

### Recommended Actions (priority order)

1. **Remove sqrt** in `mandelbrot.c:22`: replace `sqrt(zr2 + zi2) > 2.0` with
`(zr2 + zi2) > 4.0`.

### Ruled Out

- Memory locality is not the issue: the hotspot is purely scalar FP compute.

### Next Step

Rebuild with the sqrt removal, re-run Code Hotspots, and confirm
escape_iterations drops below 30%.
```

Notice the report names a file and line, ranks the cost by measured samples, and
ends with a single action you can take immediately, rather than a wall of raw
counters.
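The recommended fix is safe because squaring preserves order for non-negative values. Reading `zr2` and `zi2` in the example as the squared real and imaginary parts of z (an assumption about the variable names), the two escape tests are equivalent:

```latex
\sqrt{z_r^2 + z_i^2} > 2 \iff z_r^2 + z_i^2 > 4,
\qquad \text{since } z_r^2 + z_i^2 \ge 0 \text{ and } t \mapsto t^2 \text{ is strictly increasing on } [0, \infty).
```

Both tests accept exactly the same points, so the change removes one `sqrt` call per iteration without changing the program's output.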

## Expect a two-pass investigation

The skill does not let a single Code Hotspots run justify "this is as fast as it
gets." Code Hotspots shows *where* time goes, never *why*. Expect the skill to
propose a **second pass** with a characterizing recipe (CPU Microarchitecture,
Instruction Mix, or Memory Access) to explain why the hotspot is hot. Let it
run that pass before deciding a cost is irreducible.

## Drive the optimization loop

Work with the skill one change at a time:

1. It establishes a **baseline** run.
2. You, or the skill, make **one** focused change.
3. It **re-profiles** with the same recipe and workload.
4. It reports a **before/after comparison**, a measurement, not a claim.
5. It looks for the **next bottleneck**, or summarizes the remaining trade-offs.
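A before/after comparison comes down to simple arithmetic on the two measurements. This sketch assumes you have the baseline and post-change wall-clock times in seconds. How you collect them is up to you, and the values in the example call are placeholders. The helper name `compare_runs` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a baseline and a post-change measurement.
# Arguments are times in seconds (lower is better).
compare_runs() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before <= 0 || after <= 0) { print "invalid measurement"; exit 1 }
    # Speedup = before/after; change is relative to the baseline.
    printf "speedup: %.2fx, change: %+.1f%%\n", before / after, (after - before) / before * 100
  }'
}

compare_runs 12.0 8.0   # prints: speedup: 1.50x, change: -33.3%
```

Keep the recipe, workload command, and target identical between the two runs. Otherwise the comparison measures the setup change, not the code change.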

If you want to stop, ask for the remaining opportunities. The skill is designed to
hand you measured options with their trade-offs rather than declare the work
finished on its own.

{{% notice Tip %}}
You can ask the skill to export a run with `apx run export` so you can share it
with a teammate, or re-render its results as JSON with the `--json` flag for
machine-readable output.
{{% /notice %}}
@@ -0,0 +1,78 @@
---
title: Profile and optimize Arm workloads with the arm-performix agent skill
description: Learn how to install and use the arm-performix skill so an AI coding assistant can drive Arm Performix for you to find code hotspots, diagnose pipeline stalls, and propose measured optimizations on Arm Neoverse.

minutes_to_complete: 30

who_is_this_for: This is an introductory topic for developers who use an AI coding assistant (such as GitHub Copilot in VS Code) and want it to drive Arm Performix on their behalf to profile and optimize software performance through the arm-performix skill, without having to memorize the apx CLI themselves.

learning_objectives:
- Install and enable the arm-performix skill in your AI assistant
- Trigger the skill with phrasing that activates the profiling workflow
- Provide the context the skill needs to profile (target, binary, workload)
- Read the analysis report the skill produces and drive the optimization loop

prerequisites:
- An AI assistant that supports skills, such as GitHub Copilot in VS Code
- An Arm Neoverse-based instance reachable from the assistant's environment
- Arm Performix (the `apx` CLI) installed, or the Arm MCP Server configured

author:
- Henry Wang

### Tags
skilllevels: Introductory
subjects: Performance and Architecture
armips:
- Neoverse
tools_software_languages:
- Arm Performix
- GitHub Copilot
- MCP
operatingsystems:
- Linux
- macOS
- Windows

further_reading:
- resource:
title: Arm Performix product page
link: https://developer.arm.com/Tools%20and%20Software/Arm%20Performix
type: website
- resource:
title: Find Code Hotspots with Arm Performix
link: /learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/
type: learning-path
- resource:
title: Optimize application performance using Arm Performix CPU microarchitecture analysis
link: /learning-paths/servers-and-cloud-computing/performix-microarchitecture/
type: learning-path
- resource:
title: Optimize memory access behavior using Arm Performix and the Arm MCP Server
link: /learning-paths/servers-and-cloud-computing/performix-memory-access/
type: learning-path
- resource:
title: Migrate applications to Arm servers using migrate-ease
link: /learning-paths/servers-and-cloud-computing/migrate-ease/
type: learning-path
- resource:
title: Automate x86-to-Arm application migration using Arm MCP Server
link: /learning-paths/servers-and-cloud-computing/arm-mcp-server/
type: learning-path
- resource:
title: Get started with Servers and Cloud Computing
link: /learning-paths/servers-and-cloud-computing/intro/
type: learning-path
- resource:
title: Learn about Arm Neoverse processors
link: https://www.arm.com/products/silicon-ip-cpu/neoverse
type: website



### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # The weight controls the order of the pages. _index.md always has weight 1.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---