FFT: fix grid and mini-batch by AntonOresten · Pull Request #87 · NVIDIA/cutile-python

AntonOresten · 2026-05-25T22:25:45Z

Description

The FFT sample currently passes the full batch size as the kernel's BS constant while also launching a grid of (BS, 1, 1) blocks. Each block then loads a (BS, …) tile, so every block redundantly processes the whole batch. Per-block cost therefore scales with the batch size, and the kernel spills heavily at modest sizes.

This PR re-interprets the kernel's BS constant as a per-block minibatch size and sizes the grid as Batch // BS accordingly. The wrapper exposes it as a minibatch: int = 1 parameter; the kernel and its internal tile shapes are unchanged.
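The grid arithmetic described above can be sketched in plain Python. This is only an illustration of the launch-shape bookkeeping, not the sample's actual wrapper: the helper names `launch_shape` and `rows_processed` are hypothetical, and the divisibility check is an assumption (the grid is computed with floor division, so leftover rows would otherwise go unprocessed).

```python
def launch_shape(batch: int, minibatch: int = 1):
    """Return (grid, rows_per_block) for a mini-batched launch.

    Assumption: batch must be a multiple of minibatch, because the grid
    is sized with floor division and a remainder would be dropped.
    """
    if minibatch < 1:
        raise ValueError("minibatch must be >= 1")
    if batch % minibatch != 0:
        raise ValueError("batch must be a multiple of minibatch")
    return (batch // minibatch, 1, 1), minibatch


def rows_processed(grid, rows_per_block: int) -> int:
    """Total number of row-FFTs computed across all blocks in the grid."""
    blocks = grid[0] * grid[1] * grid[2]
    return blocks * rows_per_block


batch = 64

# Before the fix: (batch, 1, 1) blocks, each loading a (batch, ...) tile.
old_total = rows_processed((batch, 1, 1), batch)

# After the fix: (batch // minibatch, 1, 1) blocks, each loading (minibatch, ...).
new_grid, new_rows = launch_shape(batch, minibatch=1)
new_total = rows_processed(new_grid, new_rows)

print(old_total, new_total)  # 4096 64
```

With batch=64 the old launch computes every row 64 times over, while the new launch computes each row exactly once for any valid minibatch.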

Measured on a DGX Spark with N=512, batch=64, factors=(8,8,8), minibatch=1: kernel launch time drops from 2376 μs to 12 μs (~200x), with twiddle factors precomputed (as they would be in any real use).

A sweep over minibatch ∈ {1, 2, 4} is correct against torch.fft.fft in every case, with no consistent win for minibatch > 1 at these problem sizes (registers/shared memory fill quickly). Thus BS / minibatch could, and maybe should, be dropped entirely.
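As a rough CPU-only illustration of why the result should not depend on minibatch, the sketch below emulates the block partitioning with a naive DFT from the standard library. It is not the PR's test (which compares the GPU kernel against torch.fft.fft); `dft` and `batched_dft` are hypothetical names, and the divisibility requirement is an assumption.

```python
import cmath


def dft(row):
    """Naive O(N^2) forward DFT, used here only as a reference."""
    n = len(row)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(row))
        for k in range(n)
    ]


def batched_dft(rows, minibatch=1):
    """Emulate a grid of len(rows) // minibatch blocks on the CPU.

    Each emulated block handles a contiguous slice of `minibatch` rows.
    """
    if len(rows) % minibatch != 0:
        raise ValueError("batch must be a multiple of minibatch")
    out = [None] * len(rows)
    for block in range(len(rows) // minibatch):
        start = block * minibatch
        for i in range(start, start + minibatch):
            out[i] = dft(rows[i])
    return out


# Small synthetic batch: 4 rows of 8 complex samples each.
rows = [[complex(b + t, b - t) for t in range(8)] for b in range(4)]
reference = [dft(r) for r in rows]

for mb in (1, 2, 4):
    result = batched_dft(rows, mb)
    # Every row is covered exactly once, whatever the minibatch size.
    assert all(r is not None for r in result)
    assert all(
        abs(a - b) < 1e-9
        for got, want in zip(result, reference)
        for a, b in zip(got, want)
    )
```

The per-row work is identical for every minibatch value; only the assignment of rows to blocks changes, which is why any performance difference comes down to occupancy and resource pressure rather than arithmetic.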

x-ref JuliaGPU/cuTile.jl#232

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Signed-off-by: AntonOresten <antonoresten@proton.me>

FFT: fix grid and mini-batch

f0155fe

Signed-off-by: AntonOresten <antonoresten@proton.me>

AntonOresten force-pushed the fft-batch branch from 366d975 to f0155fe Compare May 25, 2026 22:27

AntonOresten wants to merge 1 commit into NVIDIA:main from AntonOresten:fft-batch
