fix(allocator): fix potential deadlock in FixedSizeAllocatorPool #17112
Conversation
Pull request overview
This PR fixes a race condition in FixedSizeAllocatorPool that could cause deadlock when multiple threads are waiting for allocators from a pool that has reached its capacity limit.
Key Changes:
- Modified the waiting logic in `get()` to acquire the mutex lock before entering the wait loop, preventing a lost-wakeup scenario (see the sketch below)
- Renamed `allocators` variables to `allocators_guard` for clarity
- Enhanced comments to explain the deadlock-prevention mechanism
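For context, here is a minimal sketch of the lock-then-wait pattern the fix adopts, built on standard `Mutex`/`Condvar`. The `Pool` type and method names are assumptions for illustration, not the actual oxc implementation:

```rust
use std::sync::{Condvar, Mutex};

struct Pool<T> {
    available: Mutex<Vec<T>>,
    condvar: Condvar,
}

impl<T> Pool<T> {
    fn get(&self) -> T {
        // Take the lock *before* checking the condition, and hold it
        // across the `wait`. `Condvar::wait` releases the lock and sleeps
        // atomically, so a `put` + `notify_one` on another thread cannot
        // slip into the gap between the emptiness check and the sleep.
        let mut allocators_guard = self.available.lock().unwrap();
        loop {
            if let Some(item) = allocators_guard.pop() {
                return item;
            }
            // `wait` reacquires the lock before returning; the loop then
            // re-checks the condition, which also handles spurious wakeups.
            allocators_guard = self.condvar.wait(allocators_guard).unwrap();
        }
    }

    fn put(&self, item: T) {
        self.available.lock().unwrap().push(item);
        // Wake one waiting `get` call, if any.
        self.condvar.notify_one();
    }
}
```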
CodSpeed Performance Report: merging #17112 will not alter performance.
…17094) Modification of fixed-size allocator limits, building on #17023.

### The problem

This is an alternative design, intended to handle one flaw on Windows: each allocator is 4 GiB in size, so if the system has 16.01 GiB of memory available, we could succeed in creating 4 x 4 GiB allocators, but that would leave only 10 MiB of memory free. Some other allocation (e.g. creating a normal `Allocator`, or even allocating a heap `String`) would then likely fail due to OOM later on.

Note that "memory available" on Windows does not mean "how much RAM the system has". It includes the swap file, the size of which depends on how much free disk space the system has. So numbers like 16.01 GiB are not at all out of the question.

### Proposed solution

On Windows, create as many allocators as possible when creating the pool, up to `thread count + 1`. Then return the last allocator back to the system. This ensures that there is at least 4 GiB of memory free for other allocations, which should be enough.

### Redesign

In working through the various scenarios, I realized that the implementation can be simplified for both Linux/Mac and Windows. In both cases, no more than `thread_count` fixed-size allocators can be in use at any given time - see the doc comment on `FixedSizeAllocatorPool` for the full explanation. So create the pool with `thread_count` allocators (or as close as we can get on Windows). Thereafter the pool does not need to grow, and cannot. This allows removing a bunch of synchronization code.

* On Linux/Mac, #17013 solved the too-many-allocators problem another way, so all we need is the `Mutex`.
* On Windows, we only need a `Mutex` + a `Condvar`.

In both cases, it's much simplified, which makes it much less likely for subtle race conditions like #17112 to creep in. Removing the additional synchronization should also be a little more performant. Note that the redesign is not the main motivator for this change - preventing OOM on Windows is.
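A hedged sketch of the "thread count + 1" creation strategy described above. `FixedSizeAllocator` is stubbed out here, and `try_create_allocator` is a hypothetical stand-in for whatever fallible constructor the real code uses:

```rust
/// Stub for illustration; the real allocator reserves a 4 GiB region.
struct FixedSizeAllocator;

/// Hypothetical fallible constructor: returns `None` when the system
/// cannot supply another 4 GiB.
fn try_create_allocator() -> Option<FixedSizeAllocator> {
    Some(FixedSizeAllocator)
}

/// Windows-only creation strategy (sketch): try to create up to
/// `thread_count + 1` allocators, then hand the last one back, so at
/// least one allocator's worth of memory stays free for other uses.
fn create_pool(thread_count: usize) -> Vec<FixedSizeAllocator> {
    let mut allocators = Vec::with_capacity(thread_count + 1);
    for _ in 0..=thread_count {
        match try_create_allocator() {
            Some(allocator) => allocators.push(allocator),
            // Memory exhausted: keep what we managed to create.
            None => break,
        }
    }
    // Return the last allocator to the system. Dropping it releases its
    // 4 GiB, leaving headroom for normal `Allocator`s, heap `String`s, etc.
    allocators.pop();
    allocators
}
```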

#17023 introduced a queuing system to limit the number of `FixedSizeAllocator`s in play at any given time. However, it contained a subtle race condition which could result in deadlock. Fix it. See comments in the code for explanation.
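To make the race concrete, here is a sketch of the kind of lost-wakeup shape that can deadlock. The pool type mirrors the corrected sketch earlier; all names are illustrative, not the actual oxc code:

```rust
use std::sync::{Condvar, Mutex};

struct Pool<T> {
    available: Mutex<Vec<T>>,
    condvar: Condvar,
}

impl<T> Pool<T> {
    // BROKEN shape: the emptiness check and the wait happen in two
    // separate critical sections.
    fn get_broken(&self) -> T {
        loop {
            // First critical section: check for an available item.
            if let Some(item) = self.available.lock().unwrap().pop() {
                return item;
            }
            // The lock is released here. If another thread now pushes an
            // item and calls `notify_one()`, the notification finds no
            // waiter and is dropped.
            let guard = self.available.lock().unwrap();
            // Second critical section: wait for a notification that may
            // already have come and gone. This thread can sleep forever
            // even though an item is sitting in the pool.
            drop(self.condvar.wait(guard).unwrap());
        }
    }
}
```

Holding the lock from before the emptiness check until `wait` (which releases it atomically) closes this window, which is what the fix does.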