Conversation

@nataliakokoromyti
Collaborator

Bump CUDA to 13.0.0 and fix baseline timing. KernelBench PR #127 removed baseline times from the metadata, so we now measure the baseline ourselves inside Modal's evaluate() by calling measure_ref_program_time(). This times the PyTorch reference in the same container, with the same precision and device, as the generated kernel. When check_for_excessive_speedup=True, we reuse result.ref_runtime to avoid measuring twice.
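The timing approach can be sketched roughly as follows. This is a minimal illustration, not the actual KernelBench/Modal implementation: the function signature, the warmup/iteration parameters, and the median aggregation are all assumptions, and a real CUDA measurement would additionally synchronize the device around each timed run.

```python
import time
import statistics

def measure_ref_program_time(ref_fn, warmup=3, iters=10):
    """Hypothetical sketch of timing a reference program in the
    same environment as the generated kernel.

    ref_fn: zero-argument callable running the PyTorch reference.
    Returns the median wall-clock time per call, in milliseconds.
    """
    # Warm up first so caches, JIT compilation, and lazy init
    # do not pollute the timed samples.
    for _ in range(warmup):
        ref_fn()

    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        ref_fn()
        samples.append((time.perf_counter() - t0) * 1e3)  # ms

    # Median is robust to occasional scheduling jitter.
    return statistics.median(samples)
```

With something like this in place, a caller that has already timed the reference (e.g. while checking for an excessive speedup) can cache the value on its result object and reuse it rather than calling the measurement twice.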
