metal: fix flash attention nsg8 dv64 path #3790

Open
adavyas wants to merge 1 commit into ggml-org:master from adavyas:metal-fa-nsg8-dv64-fix
adavyas commented May 5, 2026

  • Fix Metal FlashAttention single-output-tile handling. Adds an explicit NO == 1 path so the kernel loads one value tile and accumulates into lo[0] instead of skipping the multiply-accumulate because NO/2 == 0.
  • Preserve the existing paired-tile fast path. Keeps the current two-tile loop for NO > 1, while adding a safe remainder/small-shape path for tensor shapes that only produce one output tile.
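
The failure mode described above can be illustrated with a small host-side sketch in C++. This is not the Metal kernel from the PR: the function names, the scalar "tiles", and the weight `p` are hypothetical stand-ins chosen only to show why a loop bounded by `NO/2` does no work when `NO == 1`, and how an explicit single-tile branch restores the accumulate into `lo[0]`. The sketch assumes `NO` is either 1 or even.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the paired-tile loop only.
// Each "tile" is reduced to one float; the real kernel works on matrix tiles.
// With NO == 1, integer division gives NO / 2 == 0, so the loop body never runs
// and lo[0] keeps its initial value.
std::vector<float> accumulate_paired_only(int NO, float p, const std::vector<float>& v) {
    assert(NO >= 1 && v.size() >= static_cast<std::size_t>(NO));
    std::vector<float> lo(NO, 0.0f);
    for (int ii = 0; ii < NO / 2; ++ii) {
        lo[2 * ii + 0] += p * v[2 * ii + 0];
        lo[2 * ii + 1] += p * v[2 * ii + 1];
    }
    return lo;
}

// Same loop, plus an explicit NO == 1 branch (assumed shape of the fix):
// load the single value tile and accumulate into lo[0].
std::vector<float> accumulate_with_single_tile_path(int NO, float p, const std::vector<float>& v) {
    assert(NO >= 1 && v.size() >= static_cast<std::size_t>(NO));
    std::vector<float> lo(NO, 0.0f);
    if (NO == 1) {
        lo[0] += p * v[0];
        return lo;
    }
    // Existing paired fast path, unchanged for NO > 1.
    for (int ii = 0; ii < NO / 2; ++ii) {
        lo[2 * ii + 0] += p * v[2 * ii + 0];
        lo[2 * ii + 1] += p * v[2 * ii + 1];
    }
    return lo;
}
```

Calling both functions with `NO = 1` shows the difference: the paired-only version returns an untouched accumulator, while the version with the single-tile branch returns `p * v[0]`. For even `NO` the two versions take the same path and produce the same result.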
