Synchronize no-GPU cache eviction with CPU streams by dhiltgen · Pull Request #3566 · ml-explore/mlx

dhiltgen · 2026-05-19T17:21:31Z

Proposed changes

Follow-up to #3554: while testing variations on my AVX2 branch, I found this bug in that change.

The no-GPU CPU allocator cache could release cached buffers while asynchronous CPU work was still queued on a non-default stream. mlx-lm generation can hit this by queuing the next token on a CPU stream and calling clear_cache between yielded tokens, which can leave queued work referencing freed cached memory.
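The ordering problem described above can be pictured with a small deterministic toy model. This is only a sketch and not MLX code: `ToyBuffer`, `ToyCache`, and `run_hazard` are hypothetical names, and a plain `deque` stands in for the pending tasks of a non-default CPU stream.

```python
from collections import deque


class ToyBuffer:
    """Stand-in for a cached allocation; `freed` marks released memory."""

    def __init__(self, size):
        self.size = size
        self.freed = False


class ToyCache:
    """Hypothetical buffer cache with no stream synchronization (the bug)."""

    def __init__(self):
        self.buffers = []

    def recycle(self, buf):
        # In this toy, a cached buffer may still be referenced by queued work.
        self.buffers.append(buf)

    def clear_cache(self):
        # Evicts immediately, without waiting for queued tasks.
        for buf in self.buffers:
            buf.freed = True
        self.buffers.clear()


def run_hazard():
    cache = ToyCache()
    pending = deque()  # models tasks queued on a non-default CPU stream
    buf = ToyBuffer(1024)
    observed = []

    # Queue "next token" work that will touch buf when it eventually runs.
    pending.append(lambda: observed.append(buf.freed))
    cache.recycle(buf)

    # The cache is cleared between yielded tokens, before the stream drains.
    cache.clear_cache()

    # The stream runs its queued work only now.
    while pending:
        pending.popleft()()
    return observed
```

In this model the queued task records that its buffer was already freed by the time it ran, which is the situation the description warns about.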

Synchronize CPU streams before evicting cached buffers from clear_cache or set_cache_limit, and protect get_cache_memory with the allocator mutex. Add a regression test that verifies clear_cache waits for queued CPU stream work before returning.
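As a rough illustration of the approach described above, the following toy model drains a worker-thread "stream" before evicting cached buffers and reads the cache size under a lock. It is a sketch under stated assumptions, not the allocator code from this PR: `ToyStream`, `ToyAllocator`, and `regression_check` are hypothetical names, and the real change lives in the MLX allocator rather than in Python.

```python
import queue
import threading
import time


class ToyStream:
    """Hypothetical CPU stream: a worker thread draining a task queue."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            try:
                task()
            finally:
                self._tasks.task_done()

    def enqueue(self, task):
        self._tasks.put(task)

    def synchronize(self):
        # Blocks until every queued task has finished.
        self._tasks.join()


class ToyAllocator:
    """Hypothetical cache that synchronizes streams before evicting."""

    def __init__(self, streams):
        self._streams = streams
        self._mutex = threading.Lock()
        self._cache = []  # cached buffers, modeled as bytearrays

    def recycle(self, buf):
        with self._mutex:
            self._cache.append(buf)

    def get_cache_memory(self):
        # Read the cache size under the same mutex that guards eviction.
        with self._mutex:
            return sum(len(b) for b in self._cache)

    def clear_cache(self):
        # Drain queued work first, then evict under the mutex.
        for stream in self._streams:
            stream.synchronize()
        with self._mutex:
            self._cache.clear()


def regression_check():
    stream = ToyStream()
    alloc = ToyAllocator([stream])
    alloc.recycle(bytearray(1024))
    done = threading.Event()

    def slow_task():
        time.sleep(0.05)  # simulate queued CPU work still in flight
        done.set()

    stream.enqueue(slow_task)
    alloc.clear_cache()  # should not return before slow_task completes
    return done.is_set(), alloc.get_cache_memory()
```

`regression_check` mirrors the shape of the regression test mentioned above: it queues slow work, calls `clear_cache`, and checks that the work had finished by the time the call returned. Synchronizing before taking the mutex is a choice made for this toy so that queued tasks could still call `recycle` without deadlocking; whether the actual change orders things the same way is not stated here.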

Checklist

Put an x in the boxes that apply.

I have read the CONTRIBUTING document
I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
I have added tests that prove my fix is effective or that my feature works
I have updated the necessary documentation (if needed)

angeloskath

Looks great and makes total sense, thanks!

zcbenz

Nice fix!

angeloskath approved these changes May 19, 2026

zcbenz approved these changes May 20, 2026

zcbenz merged commit e0163f3 into ml-explore:main May 20, 2026
16 checks passed

zcbenz merged 1 commit into ml-explore:main from dhiltgen:allocator-cache-fix
