GEMM+GEMM and CONV+GEMM support to quickTuningGen and GEMM+GEMM quick tuning list by dorde-antic · Pull Request #2262 · ROCm/rocMLIR

dorde-antic · 2026-03-01T15:33:03Z

Motivation

AIROCMLIR-71 ⌛ (waiting for access to some machines to complete it)
- gfx908 ✅
- gfx1200 ✅
- gfx1100 ✅
- gfx90a ⌛
- gfx942 ⌛
- gfx950 ⌛
AIROCMLIR-72 ✅
AIROCMLIR-198 ✅

Technical Details

Test Plan

Quick tuning locally
tuningRunner and perfRunner in general
CI

Test Result

Quick tuning locally ✅
PR CI
Weekly CI
Nightly CI

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

This PR adds two groups of changes: (1) updates the rocprofv3 profiler invocation in perfRunner.py and tuningRunner.py to use the correct --output-format csv flag (replacing the old -f csv flag), and (2) extends quickTuningGen.py to handle gemm_gemm and conv_gemm operations, and adds the corresponding GEMM+GEMM quick tuning parameter arrays for gfx908 (f16, f32) and gfx1200 (f16) architectures to QuickTuningPerfconfigs.inc.

Changes:

- Updated rocprofv3 flag from -f csv to --output-format csv across performance runner scripts
- Added GEMM+GEMM and CONV+GEMM operation support in the quickTuningGen.py code generator
- Added GEMM+GEMM quick tuning parameter lists for gfx908 and gfx1200 to the .inc file
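
The flag change can be illustrated with a minimal sketch of how a runner script might assemble the profiler command. The helper name `build_rocprofv3_cmd` and the example binary are hypothetical and not taken from the PR. Only the `--output-format csv` flag comes from this PR. The `--` separator before the profiled command is an assumption about how the profiler is invoked.

```python
from typing import List, Sequence


def build_rocprofv3_cmd(app_cmd: Sequence[str], output_format: str = "csv") -> List[str]:
    """Assemble a rocprofv3 command line (hypothetical helper, not the PR's code)."""
    if not app_cmd:
        raise ValueError("app_cmd must name the program to profile")
    # Long-form flag used by this PR; the older short form was `-f csv`.
    profiler_args = ["rocprofv3", "--output-format", output_format]
    # Assumption: `--` separates profiler options from the profiled command.
    return profiler_args + ["--"] + list(app_cmd)


if __name__ == "__main__":
    # "./my_kernel_runner" is a placeholder binary name.
    print(" ".join(build_rocprofv3_cmd(["./my_kernel_runner"])))
```

Keeping the flag in one helper means a future spelling change only needs to be made in one place instead of in every profiler invocation.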

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `mlir/utils/performance/perfRunner.py` | Updates rocprofv3 `--output-format csv` flag in two profiler invocations |
| `mlir/utils/performance/tuningRunner.py` | Same rocprofv3 flag update in verification pipeline |
| `mlir/utils/performance/analysis/quickTuningGen.py` | Adds column definitions and full code-generator support for `gemm_gemm` and `conv_gemm` ops |
| `mlir/include/mlir/Dialect/Rock/Tuning/QuickTuningPerfconfigs.inc` | Adds GEMM+GEMM quick tuning parameter arrays and lookup entries for gfx908 (f16, f32) and gfx1200 (f16) |



dorde-antic · 2026-03-24T14:34:43Z

@umangyadav @mirza-halilcevic
Can we merge this and make a follow-up for:
gfx90a ⌛
gfx942 ⌛
gfx950 ⌛

The reason is to avoid potential conflicts in perfRunner, tuningRunner and quickTuningGen, which were modified in this PR a long time ago.

Or at least move these changes (in the perf scripts) to a separate PR and keep the .inc file in this one.


umangyadav · 2026-03-25T23:24:03Z

+        if op == "gemm_gemm" and arch.startswith("gfx1") and dtype == "f32":
+            return "NonAccel"


Doesn't Navi4x/gfx12 require NonAccel as well?

umangyadav · 2026-03-25T23:26:03Z

+    # Match -t f32 in the test vector (e.g. "-t f32 -transA" or " -t f32 ")
+    return '-t f32' in test_vector


Doesn't this work for GemmGemm problem configs as well?
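
For comparison with the substring check in the snippet above, here is a minimal sketch of a token-based variant. The function name `has_dtype_flag` is hypothetical and this is not the PR's code. It assumes the data type is passed as a `-t <dtype>` pair, as the comment in the snippet suggests.

```python
import shlex


def has_dtype_flag(test_vector: str, dtype: str = "f32") -> bool:
    """Return True if `-t <dtype>` appears as two consecutive tokens.

    Hypothetical alternative to `'-t f32' in test_vector`: it splits the
    test vector into shell-style tokens instead of matching raw text.
    """
    tokens = shlex.split(test_vector)
    return any(flag == "-t" and value == dtype
               for flag, value in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    for vector in ("-t f32 -transA", "-t f16 -transA", "-t   f32"):
        print(repr(vector), has_dtype_flag(vector))
```

The token-based form is insensitive to repeated whitespace between `-t` and its value, which the plain substring test is not. Because it looks only at the flag, it does not depend on which operation the test vector describes.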

mirza-halilcevic · 2026-03-26T12:48:26Z


 {"gfx1152_attention_i8", {PopulateParamsGemmGemm::initParametersI8AttentionGfx1152, PopulateParamsGemmGemm::nInitParametersI8AttentionGfx1152}},

+{"gfx908_gemmelementwisegemm_f16", {PopulateParamsGemmGemm::initParametersF16GemmGemmGfx908, PopulateParamsGemmGemm::nInitParametersF16GemmGemmGfx908}},


Do we have i8 and bf16 gemm+gemm configs?

mirza-halilcevic · 2026-03-26T12:52:56Z

+        if op == "gemm_gemm" and arch.startswith("gfx1") and dtype == "f32":
+            return "NonAccel"


f32 on Navi for "GemmGemm" instruction types is unsupported. Not sure if it's appropriate to return NonAccel here.

mirza-halilcevic · 2026-03-26T12:53:31Z

+{"gfx908_gemmelementwisegemm_f16", {PopulateParamsGemmGemm::initParametersF16GemmGemmGfx908, PopulateParamsGemmGemm::nInitParametersF16GemmGemmGfx908}},
+
+{"gfx908_gemmelementwisegemm_f32", {PopulateParamsGemmGemm::initParametersF32GemmGemmGfx908, PopulateParamsGemmGemm::nInitParametersF32GemmGemmGfx908}},
+
+{"gfx1200_gemmelementwisegemm_f16", {PopulateParamsGemmGemm::initParametersF16GemmGemmGfx1200, PopulateParamsGemmGemm::nInitParametersF16GemmGemmGfx1200}},
+
+{"gfx1100_gemmelementwisegemm_f16", {PopulateParamsGemmGemm::initParametersF16GemmGemmGfx1100, PopulateParamsGemmGemm::nInitParametersF16GemmGemmGfx1100}},
+


Do we have bf16 and i8 gemm+gemm configs?
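
The naming pattern visible in the entries above can be sketched as a small generator. This is an illustration inferred from the entries quoted in this thread, not the actual quickTuningGen.py code. The function name `lookup_entry` and its parameters are hypothetical, and the capitalization rule is only checked against the f16 and i8 examples shown here (other data types such as bf16 may be spelled differently in the C++ identifiers).

```python
def lookup_entry(arch: str, key_op: str, dtype: str, cpp_op: str,
                 cls: str = "PopulateParamsGemmGemm") -> str:
    """Format one lookup-table line in the style of the quoted .inc entries.

    arch   -- e.g. "gfx908"
    key_op -- op name used in the string key, e.g. "gemmelementwisegemm"
    dtype  -- e.g. "f16"
    cpp_op -- op name used in the C++ identifiers, e.g. "GemmGemm"
    """
    key = f"{arch}_{key_op}_{dtype}"
    # Assumption: identifiers are <Dtype><Op><Arch> with the first letter upper-cased.
    suffix = f"{dtype.capitalize()}{cpp_op}{arch.capitalize()}"
    init = f"{cls}::initParameters{suffix}"
    count = f"{cls}::nInitParameters{suffix}"
    return '{"' + key + '", {' + init + ", " + count + "}},"


if __name__ == "__main__":
    print(lookup_entry("gfx908", "gemmelementwisegemm", "f16", "GemmGemm"))
    print(lookup_entry("gfx1152", "attention", "i8", "Attention"))
```

Under these assumptions, adding a data type would produce one more key per architecture together with a matching pair of array and count identifiers.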

mirza-halilcevic · 2026-03-26T12:57:58Z

+    %(prog)s gemmgemm/*.debug --op gemm_gemm --update
+    %(prog)s convgemm/*.debug --op conv_gemm --update


Not necessary to put this in examples.

mirza-halilcevic · 2026-03-26T12:58:51Z

                                  "\n".join(dec_lines))

-        # Add lookup entry
+        # Add lookup entry (key must match C++ ParamLookupTable makeKey: arch_op_dtype)


Let's not mention C++ methods in comments because they are bound to change.

mirza-halilcevic · 2026-03-26T13:01:33Z

+    key_map = {
+        "attention": "attention",
+        "gemm_gemm": "gemmelementwisegemm",
+        "conv_gemm": "convelementwisegemm"
+    }


What about conv and gemm?
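
One possible way to handle ops that are absent from the map is to fall back to the op name itself. The sketch below assumes the `arch_op_dtype` key shape mentioned in the earlier diff comment. The function name `make_lookup_key` and the fallback behavior for plain `conv` and `gemm` are assumptions for illustration, not the PR's implementation.

```python
# Mapping quoted from the diff above.
KEY_MAP = {
    "attention": "attention",
    "gemm_gemm": "gemmelementwisegemm",
    "conv_gemm": "convelementwisegemm",
}


def make_lookup_key(arch: str, op: str, dtype: str) -> str:
    """Build an `arch_op_dtype` key (hypothetical helper).

    Assumption: ops without an explicit mapping keep their own name.
    """
    return f"{arch}_{KEY_MAP.get(op, op)}_{dtype}"


if __name__ == "__main__":
    print(make_lookup_key("gfx908", "gemm_gemm", "f16"))
    print(make_lookup_key("gfx1152", "attention", "i8"))
    print(make_lookup_key("gfx908", "gemm", "f16"))  # fallback path
```

With `dict.get(op, op)` the map only has to list names that differ between the script and the lookup table, at the cost of silently accepting misspelled op names.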

mirza-halilcevic · 2026-03-26T13:08:22Z


+def _is_navi_arch(arch: str) -> bool:
+    """Return True if arch is Navi (gfx11xx or gfx12xx)."""
+    return arch.startswith("gfx11") or arch.startswith("gfx12")


Maybe just arch.startswith("gfx1")
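
The two prefix checks can be compared side by side with a short sketch. The function names below are hypothetical. The architecture list mixes names from this PR with `gfx1030`, which is included only to show where the two checks diverge.

```python
def is_navi_two_prefixes(arch: str) -> bool:
    """Check from the diff, written with a tuple of prefixes."""
    return arch.startswith(("gfx11", "gfx12"))


def is_navi_one_prefix(arch: str) -> bool:
    """Shorter check suggested in the review comment."""
    return arch.startswith("gfx1")


if __name__ == "__main__":
    for arch in ("gfx908", "gfx90a", "gfx942", "gfx1030", "gfx1100", "gfx1200"):
        print(arch, is_navi_two_prefixes(arch), is_navi_one_prefix(arch))
```

Both checks agree on the architectures named in this PR. They differ on gfx10xx names, which the single-prefix form also matches, so the choice depends on whether those targets should be treated the same way.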

mirza-halilcevic · 2026-03-26T13:14:02Z


                state_file.set_running(test_vector)

+                if _should_skip_f32_on_navi(ctx.options.chip, test_vector, ctx.conf_class):


It would be better to filter out these configs in the beginning and print something like "Skipping N unsupported configs". It would simplify the rest of the code.
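
The up-front filtering suggested here could look roughly like the sketch below. All names (`partition_configs`, `is_f32_on_navi`) and the sample test vectors are hypothetical. The predicate simply reuses the checks quoted earlier in this thread and is not the PR's actual skip logic.

```python
from typing import Callable, Iterable, List, Tuple


def partition_configs(configs: Iterable[str],
                      is_unsupported: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Split configs into (supported, skipped) before any tuning starts."""
    supported: List[str] = []
    skipped: List[str] = []
    for config in configs:
        (skipped if is_unsupported(config) else supported).append(config)
    return supported, skipped


def is_f32_on_navi(chip: str, test_vector: str) -> bool:
    """Assumed predicate: f32 test vectors on gfx11xx/gfx12xx chips."""
    return chip.startswith(("gfx11", "gfx12")) and "-t f32" in test_vector


if __name__ == "__main__":
    chip = "gfx1100"
    configs = ["-t f32 -transA", "-t f16 -transA"]  # illustrative vectors only
    supported, skipped = partition_configs(
        configs, lambda tv: is_f32_on_navi(chip, tv))
    if skipped:
        print(f"Skipping {len(skipped)} unsupported configs")
    for test_vector in supported:
        pass  # the tuning loop would run here without any per-config skip check
```

Filtering once at the start keeps the main loop free of skip handling and reports the number of skipped configs in a single message.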

dorde-antic added 4 commits March 1, 2026 10:06

Add GEMM+GEMM and CONV+GEMM support to quickTuningGen.py

902e56b

Add GEMM+GEMM quick tuning configs for gfx908

25e21c1

Add GEMM+GEMM quick tuning configs for gfx1200

b65f493

Fix rocprof verification: use --output-format csv instead of -f csv

64cab37

dorde-antic requested review from Copilot and mirza-halilcevic March 1, 2026 15:33

Copilot started reviewing on behalf of dorde-antic March 1, 2026 15:35

Copilot AI reviewed Mar 1, 2026


dorde-antic added 6 commits March 1, 2026 15:49

Add GEMM+GEMM quick tuning configs for gfx1100

892f6d9

Treat gemm_gemm, conv_gemm, and attention with -t f32 on gfx11xx/gfx12xx as skipped instead of failed

10ba5ef

Edit log message

e009729

Apply yapf

6efb66d

Apply yapf pt 2

5f35955

Apply yapf on perfRunner as well

68a466e

dorde-antic requested a review from umangyadav March 2, 2026 12:00

Merge branch 'develop' into AIROCMLIR-72-gemm-gemm-support-in-quick-tuning-gen

5b0a41e

dorde-antic marked this pull request as ready for review March 11, 2026 15:21

dorde-antic requested a review from causten as a code owner March 11, 2026 15:21

Merge branch 'develop' into AIROCMLIR-72-gemm-gemm-support-in-quick-tuning-gen

4d2a00f

Merge branch 'develop' into AIROCMLIR-72-gemm-gemm-support-in-quick-tuning-gen

6be3f3c

umangyadav reviewed Mar 25, 2026


mirza-halilcevic reviewed Mar 26, 2026


dorde-antic wants to merge 13 commits into develop from AIROCMLIR-72-gemm-gemm-support-in-quick-tuning-gen
