GEMM+GEMM and CONV+GEMM support to quickTuningGen and GEMM+GEMM quick tuning list#2262
GEMM+GEMM and CONV+GEMM support to quickTuningGen and GEMM+GEMM quick tuning list#2262dorde-antic wants to merge 13 commits intodevelopfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds two groups of changes: (1) updates the rocprofv3 profiler invocation in perfRunner.py and tuningRunner.py to use the correct --output-format csv flag (replacing the old -f csv flag), and (2) extends quickTuningGen.py to handle gemm_gemm and conv_gemm operations, and adds the corresponding GEMM+GEMM quick tuning parameter arrays for gfx908 (f16, f32) and gfx1200 (f16) architectures to QuickTuningPerfconfigs.inc.
Changes:
- Updated rocprofv3 flag from
-f csvto--output-format csvacross performance runner scripts - Added GEMM+GEMM and CONV+GEMM operation support in the
quickTuningGen.pycode generator - Added GEMM+GEMM quick tuning parameter lists for
gfx908andgfx1200to the.incfile
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
mlir/utils/performance/perfRunner.py |
Updates rocprofv3 --output-format csv flag in two profiler invocations |
mlir/utils/performance/tuningRunner.py |
Same rocprofv3 flag update in verification pipeline |
mlir/utils/performance/analysis/quickTuningGen.py |
Adds column definitions and full code-generator support for gemm_gemm and conv_gemm ops |
mlir/include/mlir/Dialect/Rock/Tuning/QuickTuningPerfconfigs.inc |
Adds GEMM+GEMM quick tuning parameter arrays and lookup entries for gfx908 (f16, f32) and gfx1200 (f16) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@umangyadav @mirza-halilcevic The reason is to avoid potential conflicts for perfRunner, tuningRunner and quickTuningGen which were modified in this PR long time ago? Or atleast to move these changes (in perf scripts) to separate PR and keep .inc file in this one. |
| if op == "gemm_gemm" and arch.startswith("gfx1") and dtype == "f32": | ||
| return "NonAccel" |
There was a problem hiding this comment.
Navi4x/gfx12, doesn't it require nonaccel as well ?
| # Match -t f32 in the test vector (e.g. "-t f32 -transA" or " -t f32 ") | ||
| return '-t f32' in test_vector |
There was a problem hiding this comment.
Doesn't this work for GemmGemm problem configs as well ?
|
|
||
| {"gfx1152_attention_i8", {PopulateParamsGemmGemm::initParametersI8AttentionGfx1152, PopulateParamsGemmGemm::nInitParametersI8AttentionGfx1152}}, | ||
|
|
||
| {"gfx908_gemmelementwisegemm_f16", {PopulateParamsGemmGemm::initParametersF16GemmGemmGfx908, PopulateParamsGemmGemm::nInitParametersF16GemmGemmGfx908}}, |
There was a problem hiding this comment.
Do we have i8 and bf16 gemm+gemm configs?
| if op == "gemm_gemm" and arch.startswith("gfx1") and dtype == "f32": | ||
| return "NonAccel" |
There was a problem hiding this comment.
f32 on Navi for "GemmGemm" instruction types is unsupported. Not sure if it's appropriate to return NonAccel here.
| {"gfx908_gemmelementwisegemm_f16", {PopulateParamsGemmGemm::initParametersF16GemmGemmGfx908, PopulateParamsGemmGemm::nInitParametersF16GemmGemmGfx908}}, | ||
|
|
||
| {"gfx908_gemmelementwisegemm_f32", {PopulateParamsGemmGemm::initParametersF32GemmGemmGfx908, PopulateParamsGemmGemm::nInitParametersF32GemmGemmGfx908}}, | ||
|
|
||
| {"gfx1200_gemmelementwisegemm_f16", {PopulateParamsGemmGemm::initParametersF16GemmGemmGfx1200, PopulateParamsGemmGemm::nInitParametersF16GemmGemmGfx1200}}, | ||
|
|
||
| {"gfx1100_gemmelementwisegemm_f16", {PopulateParamsGemmGemm::initParametersF16GemmGemmGfx1100, PopulateParamsGemmGemm::nInitParametersF16GemmGemmGfx1100}}, | ||
|
|
There was a problem hiding this comment.
Do we have bf16 and i8 gemm+gemm configs?
| %(prog)s gemmgemm/*.debug --op gemm_gemm --update | ||
| %(prog)s convgemm/*.debug --op conv_gemm --update |
There was a problem hiding this comment.
Not necessary to put this in examples.
| "\n".join(dec_lines)) | ||
|
|
||
| # Add lookup entry | ||
| # Add lookup entry (key must match C++ ParamLookupTable makeKey: arch_op_dtype) |
There was a problem hiding this comment.
Let's not mention C++ methods in comments because they are bound to change.
| key_map = { | ||
| "attention": "attention", | ||
| "gemm_gemm": "gemmelementwisegemm", | ||
| "conv_gemm": "convelementwisegemm" | ||
| } |
There was a problem hiding this comment.
What about conv and gemm?
|
|
||
| def _is_navi_arch(arch: str) -> bool: | ||
| """Return True if arch is Navi (gfx11xx or gfx12xx).""" | ||
| return arch.startswith("gfx11") or arch.startswith("gfx12") |
There was a problem hiding this comment.
Maybe just arch.startwsiwth("gfx1")
|
|
||
| state_file.set_running(test_vector) | ||
|
|
||
| if _should_skip_f32_on_navi(ctx.options.chip, test_vector, ctx.conf_class): |
There was a problem hiding this comment.
It would be better to filter out these configs in the beginning and print something like "Skipping N unsupported configs". It would simplify the rest of the code.
Motivation
Technical Details
Test Plan
Quick tuning locally
tuningRunner and perfRunner in general
CI
Test Result
Quick tuning locally ✅
PR CI
Weekly CI
Nightly CI
Submission Checklist