Fix OOM crash (exit 137) in System.Runtime.Tests OuterLoop on Linux Debug/Checked CoreCLR#126013
Extract the ~2 GB max-size byte array allocation from AllocateUninitializedArray and AllocateArray OuterLoop tests into dedicated ConditionalFact methods that are skipped on non-64-bit and non-Release runtime builds. This prevents the Linux OOM killer (SIGKILL, exit code 137) from terminating the test process on Debug and Checked CoreCLR builds where runtime overhead is higher.
Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/de35aa82-798e-4a8a-805c-ad6599aab895
Tagging subscribers to this area: @dotnet/area-system-runtime
Pull request overview
This PR addresses Linux OuterLoop OOM terminations (exit code 137) in System.Runtime.Tests by isolating the ~2GB “max array length” allocations into dedicated tests and conditionally skipping them on configurations more likely to exceed cgroup memory limits.
Changes:
- Extracted the max-size byte[] allocation checks into two new OuterLoop tests: AllocateUninitializedArray_MaxSize and AllocateArray_MaxSize.
- Added [ConditionalFact] gating for the new max-size tests using PlatformDetection.Is64BitProcess and PlatformDetection.IsReleaseRuntime.
- Kept the original AllocateUninitializedArray / AllocateArray tests running their other allocation scenarios on all configurations.
[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.Is64BitProcess), nameof(PlatformDetection.IsReleaseRuntime))]
[OuterLoop]
The guard here (IsReleaseRuntime) skips this max-size allocation on all non-Release runtime builds across OSes/runtimes. The PR description says the issue is specific to Linux Debug/Checked CoreCLR, so this condition looks broader than intended; consider adding narrower conditions (e.g., IsLinux and/or IsCoreCLR) or updating the PR description to match the new behavior.
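One way to read the narrower condition suggested above is as a single predicate that only excludes the configuration reported to fail. The sketch below is a standalone illustration, not code from the PR: the class and method names are hypothetical, and the flags are passed in as plain booleans rather than read from PlatformDetection. In the test project, such a predicate would presumably be exposed as a static bool property and named in [ConditionalFact], assuming that attribute requires every listed member to be true.

```csharp
public static class MaxSizeTestCondition
{
    // Hypothetical helper: returns true when the max-size allocation test should run.
    // The test is skipped only on 32-bit processes and on the configuration
    // described in the PR (Linux + CoreCLR + non-Release runtime build).
    public static bool ShouldRun(bool is64BitProcess, bool isLinux, bool isCoreClr, bool isReleaseRuntime)
    {
        bool isAffectedConfig = isLinux && isCoreClr && !isReleaseRuntime;
        return is64BitProcess && !isAffectedConfig;
    }
}
```

With this shape, a Checked runtime on Windows would still run the test, while the guard in the PR would skip it.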
[OuterLoop]
private static void AllocateUninitializedArray_MaxSize()
{
    int i = 0x7FFFFFC7;
0x7FFFFFC7 is a magic number. Consider using Array.MaxLength (or a named constant) to self-document that this is the maximum supported array length and avoid duplicating the value in multiple tests.
- int i = 0x7FFFFFC7;
+ int i = Array.MaxLength;
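As a quick check on the suggestion above, the following sketch compares the hard-coded literal with Array.MaxLength and computes the payload size of a byte[] of that length. It assumes .NET 6 or later, where Array.MaxLength is available; the class and member names are illustrative only.

```csharp
using System;

public static class MaxArrayLengthSketch
{
    // The literal hard-coded in the tests under review.
    public const int HardCodedMaxLength = 0x7FFFFFC7;

    // True when the literal matches the runtime's documented maximum array length.
    public static bool MatchesArrayMaxLength() => Array.MaxLength == HardCodedMaxLength;

    // A byte[] stores one byte per element, so the payload alone is
    // 2,147,483,591 bytes (just under 2 GiB), excluding the object header.
    public static long PayloadBytes() => (long)HardCodedMaxLength * sizeof(byte);
}
```

Using the property keeps the test aligned with the runtime if the limit ever changes, whereas the literal documents the exact value being exercised.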
[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.Is64BitProcess), nameof(PlatformDetection.IsReleaseRuntime))]
[OuterLoop]
The guard here (IsReleaseRuntime) skips this max-size allocation on all non-Release runtime builds across OSes/runtimes. The PR description says the issue is specific to Linux Debug/Checked CoreCLR, so this condition looks broader than intended; consider adding narrower conditions (e.g., IsLinux and/or IsCoreCLR) or updating the PR description to match the new behavior.
int i = 0x7FFFFFC7;
var arr = GC.AllocateArray<byte>(i);
0x7FFFFFC7 is a magic number. Consider using Array.MaxLength (or a named constant) to self-document that this is the maximum supported array length and avoid duplicating the value in multiple tests.
{
    int i = 0x7FFFFFC7;
    var arr = GC.AllocateUninitializedArray<byte>(i);
[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.Is64BitProcess), nameof(PlatformDetection.IsReleaseRuntime))]
Debug/Checked CoreCLR, runtime overhead pushes this over the cgroup memory limit, causing SIGKILL (exit code 137).
The difference between checked and release runtime overhead is negligible relative to the GBs of memory that this test allocates. The IsReleaseRuntime condition does not make sense to me.
If we are seeing these failures on checked build only, it suggests that there is something very inefficient in checked runtime for arrays of this size. We should fix that instead of disabling the test.
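To investigate that kind of inefficiency, one rough starting point is to compare how much the process working set grows against the array payload on Release and Checked builds. The sketch below is an assumption about how such a measurement could look, not something taken from the PR or the issue; the class and method names are hypothetical, and the result depends on OS paging behaviour and on whatever else the process is doing.

```csharp
using System;

public static class AllocationFootprintSketch
{
    // Returns the change in process working set (in bytes) observed across
    // allocating and writing to a byte[] of the given length.
    public static long MeasureWorkingSetDelta(int length)
    {
        long before = Environment.WorkingSet;

        byte[] arr = GC.AllocateUninitializedArray<byte>(length);
        // Write to every element so the payload pages are actually resident.
        arr.AsSpan().Fill(1);

        long after = Environment.WorkingSet;
        // Keep the array reachable until after the second measurement.
        GC.KeepAlive(arr);
        return after - before;
    }
}
```

Calling MeasureWorkingSetDelta(0x7FFFFFC7) on both build flavors and subtracting the payload size would give a first estimate of the extra memory each runtime needs for this allocation.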
AllocateUninitializedArray and AllocateArray OuterLoop tests each allocate a contiguous byte[0x7FFFFFC7] (~2 GB) as part of their max-size validation. On Linux Debug/Checked CoreCLR, runtime overhead pushes this over the cgroup memory limit, causing SIGKILL (exit code 137).
Changes
- AllocateUninitializedArray_MaxSize and AllocateArray_MaxSize: [ConditionalFact] on PlatformDetection.Is64BitProcess + PlatformDetection.IsReleaseRuntime, skipping on Debug and Checked runtime builds
- AllocateUninitializedArray and AllocateArray tests are unchanged and continue running SOH/LOH/string allocation sub-tests on all configurations
Note
This PR description was drafted with Copilot assistance.
Original prompt
This section details on the original issue you should resolve
<issue_title>[outerloop] System.Runtime.Tests crash with exit code 137 in linux debug coreclr</issue_title>
<issue_description>Found in a PR that fixed an unrelated failure affecting all outerloop runs, hiding other failures. #97135
Error Blob
{ "ErrorMessage": "", "BuildRetry": false, "ErrorPattern": "Killed.+System\\.Runtime\\.Tests\\.runtimeconfig\\.json.+OuterLoop", "ExcludeConsoleLog": false }
Reproduction Steps
linux_musl-x64 Debug CoreCLR_Debug
linux-x64 Debug CoreCLR_Debug
Details