gh-149146: Add dealloc-depth counter fallback to trashcan trigger#151595
Open
bhuvi27 wants to merge 1 commit into
Open
gh-149146: Add dealloc-depth counter fallback to trashcan trigger#151595bhuvi27 wants to merge 1 commit into
bhuvi27 wants to merge 1 commit into
Conversation
Under memory pressure (for example RLIMIT_AS), Python's trashcan trigger in _Py_Dealloc never fires. The trigger relies on _Py_RecursionLimit_GetMargin, which compares the machine stack pointer against c_stack_soft_limit. When RLIMIT_AS prevents the kernel from growing the C stack, the kernel SIGSEGVs while the stack pointer is still megabytes above the soft limit (see sibling issue pythongh-150722 for an LLDB trace showing SP ~7.2 MB above c_stack_hard_limit at the SIGSEGV), so the trashcan never deposits and the recursive tuple_dealloc chain runs out the stack. Add a per-thread c_dealloc_depth counter as a fallback trigger, mirroring the historical _PyTrash_UNWIND_LEVEL=50 protection that was removed when the trashcan was consolidated into _Py_Dealloc. _Py_RecursionLimit_GetMargin remains the primary signal; the counter only kicks in when the stack-pointer signal cannot fire. _PyTrash_thread_destroy_chain itself bumps c_dealloc_depth for the duration of the drain so any _Py_Dealloc invoked while draining cannot recursively re-enter destroy_chain (which would rebuild the same unbounded recursion the trashcan exists to prevent). This mirrors the historical delete_nesting bookkeeping. The counter is per-tstate (no atomics needed in the free-threaded build) and is only touched for GC types, so non-GC types pay zero overhead. Add a regression test in test_gc that builds a 100,000-deep (b, None) tuple chain and asserts that ``del b`` cleans it up without a C-stack overflow.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #149146.
while True: b = (b, None)under RLIMIT_AS segfaults during cleanup:Docker with
ulimit -v 524288:MemoryError→lost sys.stderr→ segfault, exit 139.The trashcan deposit in
_Py_Dealloconly fires when_Py_RecursionLimit_GetMargin(tstate) < 2, which compares the stack pointer againstc_stack_soft_limit. Under RLIMIT_AS the kernel SIGSEGVs trying to grow the stack while SP is still nowhere near the soft limit — issue #150722 has an LLDB trace from the same reporter showing SP ~7 MB abovec_stack_hard_limitat the fault. So the trashcan never deposits and the recursivetuple_dealloc → Py_XDECREF → _Py_Dealloc → tuple_deallocchain just runs into the wall.This adds a per-thread
c_dealloc_depthcounter and uses it as a secondary trigger inside_Py_Dealloc, depositing at depth 50 (matches the old_PyTrash_UNWIND_LEVEL). The stack-pointer margin stays the primary signal; the counter only kicks in when SP can't.One thing I missed on the first pass: the drain side has the same problem.
margin >= 4is also true in this scenario, so every_Py_Deallocexit would re-enter_PyTrash_thread_destroy_chainfrom inside the deallocator it just called. Fix:destroy_chainbumps the counter for the duration of the drain (same idea as the olddelete_nesting), and the drain check in_Py_Deallocbecomes justc_dealloc_depth == 0. Counter is only touched for GC types so non-GC types pay nothing.Tests: new
test_trashcan_nested_tuple_deepintest_gc.py(100k nested tuples, must clean up without crash);test_gc test_exceptions test_capiall green locally. Docker repro now exits 1 withMemoryErrorinstead of 139 withSegmentation fault.#150722 is the same SP-signal blindness in the comparison path (
Py_EnterRecursiveCall) — separate code path, separate PR.