fix(optim): untie batched constant MatMul for OpenVINO GPU #817
Merged
11 commits
9b42f68 fix(optim): untie batched constant MatMul for OpenVINO GPU
b82c105 Merge remote-tracking branch 'origin/main' into hualxie/fix_ov_gpu
5d04ce0 update
d9f5ca7 use EPName
9dbfb6f sort
1712a3f Merge remote-tracking branch 'origin/main' into hualxie/fix_ov_gpu
79bcc3f fix(optim): address review comments for untie batched constant MatMul
6411f29 Merge remote-tracking branch 'origin/main' into hualxie/fix_ov_gpu
06a04c9 remove needs_context (xieofxie)
486bba8 always set (xieofxie)
17993b0 Merge remote-tracking branch 'origin/main' into hualxie/fix_ov_gpu (xieofxie)
src/winml/modelkit/analyze/core/model_validators/batched_const_matmul_validator.py
123 changes: 123 additions & 0 deletions
```python
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
"""Validator for batched MatMul with a constant operand on OpenVINO GPU.

OpenVINO GPU's oneDNN gemm cannot select an implementation for a batched
(rank >= 3) MatMul where an operand is a compile-time constant. The same gemm
compiles fine when that operand is dynamic, as does a 2D gemm with a constant
operand. Models whose batched MatMul weights fold to constants (e.g.
transformer disentangled-attention position terms) therefore fail to compile
on OpenVINO GPU with:

    [GPU] Failed to select implementation for ... type: gemm

This validator detects that structural pattern and recommends the
``untie-constant-batched-matmul`` surgery, which makes the constant operand
runtime-valued so gemm implementation selection succeeds.
"""

from __future__ import annotations

import logging

from ...models.information import Action, ActionItem, ActionLevel, Information
from ...utils import infer_ihv_from_ep_name
from .base import ModelValidator


logger = logging.getLogger(__name__)

# Surgery capability enabled when the pattern is detected (kebab-case to match
# the capability registry / autoconf normalization).
_SURGERY_FLAG = "untie-constant-batched-matmul"


class BatchedConstMatMulValidator(ModelValidator):
    """Detect batched MatMul with a constant operand (OpenVINO GPU only)."""

    @property
    def validator_name(self) -> str:
        """Name of this validator for logging/reporting."""
        return "BatchedConstMatMulValidator"

    @property
    def pattern_id(self) -> str:
        """Pattern ID for Information objects."""
        return "MODEL/BatchedConstantMatMul"

    def _is_enabled(self) -> bool:
        """Only relevant for OpenVINO (Intel IHV) on GPU."""
        if (self.device or "").upper() != "GPU":
            return False
        ep = self.ep
        if not ep:
            return False
        try:
            from ...models.ihv_type import IHVType

            return infer_ihv_from_ep_name(ep) == IHVType.INTEL
        except Exception:  # pragma: no cover - defensive
            return False

    def validate(self) -> Information | None:
        """Detect MatMuls with exactly one constant operand of rank >= 3."""
        if not self._is_enabled():
            return None

        # Known gap: constants expressed as `Constant` op nodes (rather than
        # graph initializers) are not detected here. The
        # `untie-constant-batched-matmul` surgery in surgery.py has the same
        # limitation, so detection and surgery stay consistent. Most exporters
        # and ORT preprocessing emit weights as initializers, so this covers
        # the disentangled-attention case in practice; `Constant`-node weights
        # would need handling on both sides.
        initializers = {init.name for init in self.graph.initializer}
        rank_by_init = {init.name: len(init.dims) for init in self.graph.initializer}

        offenders: list[str] = []
        for node in self.graph.node:
            if node.op_type != "MatMul" or len(node.input) != 2:
                continue
            const_inputs = [name for name in node.input if name in initializers]
            # Require exactly one constant operand (MatMuls with two constant
            # operands fold away and never reach gemm impl selection).
            if len(const_inputs) != 1:
                continue
            if rank_by_init.get(const_inputs[0], 0) >= 3:
                offenders.append(node.name or const_inputs[0])

        if not offenders:
            return None

        examples = ", ".join(offenders[:3])
        action = Action(
            pattern_from_id="",
            pattern_to_id="",
            level=ActionLevel.REQUIRED,
            status=None,
            action_items=[
                ActionItem(type="GraphOptimization", optimization_options={_SURGERY_FLAG: True})
            ],
            details=(
                "Enable untie-constant-batched-matmul surgery so the constant "
                "operand becomes runtime-valued and OpenVINO GPU can select a "
                "gemm implementation."
            ),
        )
        # https://github.com/openvinotoolkit/openvino/issues/36272
        explanation = (
            f"Model contains {len(offenders)} batched MatMul(s) with a constant "
            f"operand (examples: {examples}). OpenVINO GPU's oneDNN gemm cannot "
            "select an implementation for a batched MatMul with a constant "
            "operand, causing a '[GPU] Failed to select implementation ... gemm' "
            "compile failure. The untie-constant-batched-matmul surgery makes "
            "the operand runtime-valued without changing numerics. "
            "The issue is fixed in openvino==2026.2.0, so the surgery is not "
            "needed with that version or later."
        )
        return Information(
            explanation=explanation,
            actions=[action],
            pattern_id=self.pattern_id,
            status=None,
        )
```
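The detection rule in `validate()` can be exercised without ONNX or OpenVINO installed. The sketch below re-implements the same loop against hypothetical dataclass stand-ins for the `GraphProto` fields the validator reads (`graph.node`, `graph.initializer`, `node.op_type`/`input`/`name`, `init.name`/`dims`). `Node`, `Initializer`, `Graph`, and `find_batched_const_matmuls` are illustrative names, not part of the PR, and the initializer set and rank map are merged into one dict for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-ins for the ONNX GraphProto fields the validator reads.
@dataclass
class Node:
    op_type: str
    input: list[str]
    name: str = ""


@dataclass
class Initializer:
    name: str
    dims: list[int]


@dataclass
class Graph:
    node: list[Node] = field(default_factory=list)
    initializer: list[Initializer] = field(default_factory=list)


def find_batched_const_matmuls(graph: Graph) -> list[str]:
    """Return MatMuls with exactly one initializer operand of rank >= 3."""
    # Initializer name -> rank; membership doubles as the "is constant" test.
    rank_by_init = {init.name: len(init.dims) for init in graph.initializer}
    offenders: list[str] = []
    for node in graph.node:
        if node.op_type != "MatMul" or len(node.input) != 2:
            continue
        const_inputs = [n for n in node.input if n in rank_by_init]
        # Zero constants: nothing to untie. Two constants: folds away.
        if len(const_inputs) != 1:
            continue
        if rank_by_init[const_inputs[0]] >= 3:
            # Fall back to the weight name when the node is unnamed.
            offenders.append(node.name or const_inputs[0])
    return offenders


graph = Graph(
    node=[
        Node("MatMul", ["x", "pos_w"], "attn_pos_matmul"),  # rank-3 const: flagged
        Node("MatMul", ["x", "proj_w"], "proj_matmul"),  # 2D const: ignored
        Node("MatMul", ["x", "y"], "dyn_matmul"),  # no const: ignored
        Node("MatMul", ["pos_w", "proj_w"]),  # two consts: ignored
    ],
    initializer=[
        Initializer("pos_w", [12, 64, 64]),
        Initializer("proj_w", [64, 64]),
    ],
)
print(find_batched_const_matmuls(graph))  # ['attn_pos_matmul']
```

Like the validator, this only looks at initializers, so a rank-3 weight emitted as a `Constant` node would not be flagged.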