Skip to content

Fix OOM crash (exit 137) in System.Runtime.Tests OuterLoop on Linux Debug/Checked CoreCLR#126013

Closed
Copilot wants to merge 2 commits intomainfrom
copilot/fix-system-runtime-tests-crash
Closed

Fix OOM crash (exit 137) in System.Runtime.Tests OuterLoop on Linux Debug/Checked CoreCLR#126013
Copilot wants to merge 2 commits intomainfrom
copilot/fix-system-runtime-tests-crash

Conversation

Copy link
Contributor

Copilot AI commented Mar 24, 2026

AllocateUninitializedArray and AllocateArray OuterLoop tests each allocate a contiguous byte[0x7FFFFFC7] (~2 GB) as part of their max-size validation. On Linux Debug/Checked CoreCLR, runtime overhead pushes this over the cgroup memory limit, causing SIGKILL (exit code 137).

Changes

  • Extracted the ~2 GB max-size allocation sub-block from each test into dedicated methods:
    • AllocateUninitializedArray_MaxSize
    • AllocateArray_MaxSize
  • Guarded both new methods with [ConditionalFact] on PlatformDetection.Is64BitProcess + PlatformDetection.IsReleaseRuntime, skipping on Debug and Checked runtime builds
  • The original AllocateUninitializedArray and AllocateArray tests are unchanged and continue running SOH/LOH/string allocation sub-tests on all configurations
[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.Is64BitProcess), nameof(PlatformDetection.IsReleaseRuntime))]
[OuterLoop]
private static void AllocateUninitializedArray_MaxSize()
{
    int i = 0x7FFFFFC7;
    var arr = GC.AllocateUninitializedArray<byte>(i);
    arr[0] = 5;
    arr[i - 1] = 17;
    Assert.True(arr[0] == 5 && arr[i - 1] == 17);
}

Note

This PR description was drafted with Copilot assistance.

Original prompt

This section details on the original issue you should resolve

<issue_title>[outerloop] System.Runtime.Tests crash with exit code 137 in linux debug coreclr</issue_title>
<issue_description>Found in a PR that fixed an unrelated failure affecting all outerloop runs, hiding other failures. #97135

Error Blob

{
  "ErrorMessage": "",
  "BuildRetry": false,
  "ErrorPattern": "Killed.+System\\.Runtime\\.Tests\\.runtimeconfig\\.json.+OuterLoop",
  "ExcludeConsoleLog": false
}

Reproduction Steps

Details
========================== End custom configuration settings ===============================
----- start Mon Feb 26 23:56:03 UTC 2024 =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Runtime.Tests.runtimeconfig.json --depsfile System.Runtime.Tests.deps.json xunit.console.dll System.Runtime.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=AdditionalTimezoneChecks -notrait category=IgnoreForCI -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Runtime.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Tests (found 29 of 9308 test cases)
  Starting:    System.Runtime.Tests (parallel test collections = on [2 threads], stop on fail = off)
./RunTests.sh: line 180:    21 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Runtime.Tests.runtimeconfig.json --depsfile System.Runtime.Tests.deps.json xunit.console.dll System.Runtime.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=AdditionalTimezoneChecks -notrait category=IgnoreForCI -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Mon Feb 26 23:56:28 UTC 2024 ----- exit code 137 ----------------------------------------------------------
exit code 137 means SIGKILL Killed either due to out of memory/resources (see /var/log/messages) or by explicit kill.
ulimit -c value: unlimited
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 1
cat: /proc/sys/kernel/coredump_filter: No such file or directory
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dumps...
Looking for files matching core.* ...
----- start =============== XUnitLogChecker Output =====================================================
dmesg: klogctl: Operation not permitted
No dumps found in /home/helixbot/dotnetbuild/dumps.
----- end =============== XUnitLogChecker Output - exit code 0 ===========================
+ export '_commandExitCode=137'
+ python /root/helix/work/correlation/reporter/run.py https://dev.azure.com/dnceng-public/ public 13931354 eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Im9PdmN6NU1fN3AtSGpJS2xGWHo5M3VfVjBabyJ9.eyJuYW1laWQiOiJjNzczZjJjMi01MTIwLTQyMDctYWZlMi1hZmFmMzVhOGJjMGEiLCJzY3AiOiJhcHBfdG9rZW4iLCJhdWkiOiJhMTBlYjk0MS0xNzM2LTRjNzYtOWFiZC1iZDBlZGYxNjg0ODgiLCJzaWQiOiIxZTViN2VlMC1iNmQyLTQzNjgtODZjOS0zOTQ5MGIxM2FlODAiLCJCdWlsZElkIjoiY2JiMTgyNjEtYzQ4Zi00YWJiLTg2NTEtOGNkY2I1NDc0NjQ5OzU4MDE5NSIsImpvYnJlZiI6IjMxNTZhMmQwLTExYWItNGExMC04NTA0LWY3Y2UxM2ZiMTk2OTo1OTJlZWQ4OS03OGY1LTUyMDItOWRhNS1hZDE3ZGYzYmVmNGEiLCJwcGlkIjoidnN0ZnM6Ly8vQnVpbGQvQnVpbGQvNTgwMTk1Iiwib3JjaGlkIjoiMzE1NmEyZDAtMTFhYi00YTEwLTg1MDQtZjdjZTEzZmIxOTY5LmJ1aWxkLmJ1aWxkX2xpbnV4X211c2xfeDY0X2RlYnVnX2NvcmVjbHJfZGVidWcuX19kZWZhdWx0IiwicmVwb0lkcyI6IiIsImlzcyI6ImFwcC52c3Rva2VuLnZpc3VhbHN0dWRpby5jb20iLCJhdWQiOiJhcHAudnN0b2tlbi52aXN1YWxzdHVkaW8uY29tfHZzbzo2ZmNjOTJlNS03M2E3LTRmODgtOGQxMy1kOTA0NWI0NWZiMjciLCJuYmYiOjE3MDg5ODg5MzgsImV4cCI6MTcwOTAwMDkzOH0.rjlcWYrlHLfuHwUtUIiDkhhUDwy8cnQZ3rCBJjmd8FuCN0BDFzjz0jEMQEbQ9WZ7LHqdWN40vrD1MHH2p9a2tC9JYY7ca4b2w8Yy05wlbG-_cjF_ksT3jfhgz4EOLBiSmE7-bW_m1b3gtKHo-rdFDyMt_...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes dotnet/runtime#98964

<!-- START COPILOT CODING AGENT TIPS -->
---

📱 Kick off Copilot coding agent tasks wherever you are with [GitHub Mobile](https://gh.io/cca-mobile-docs), available on iOS and Android.

Extract the ~2 GB max-size byte array allocation from AllocateUninitializedArray
and AllocateArray OuterLoop tests into dedicated ConditionalFact methods that
are skipped on non-64-bit and non-Release runtime builds. This prevents the
Linux OOM killer (SIGKILL, exit code 137) from terminating the test process
on Debug and Checked CoreCLR builds where runtime overhead is higher.

Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/de35aa82-798e-4a8a-805c-ad6599aab895
Copilot AI changed the title [WIP] Fix System.Runtime.Tests crash with exit code 137 Fix OOM crash (exit 137) in System.Runtime.Tests OuterLoop on Linux Debug/Checked CoreCLR Mar 24, 2026
Copilot AI requested a review from danmoseley March 24, 2026 04:38
@danmoseley danmoseley marked this pull request as ready for review March 24, 2026 04:55
Copilot AI review requested due to automatic review settings March 24, 2026 04:55
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Linux OuterLoop OOM terminations (exit code 137) in System.Runtime.Tests by isolating the ~2GB “max array length” allocations into dedicated tests and conditionally skipping them on configurations more likely to exceed cgroup memory limits.

Changes:

  • Extracted the max-size byte[] allocation checks into two new OuterLoop tests: AllocateUninitializedArray_MaxSize and AllocateArray_MaxSize.
  • Added [ConditionalFact] gating for the new max-size tests using PlatformDetection.Is64BitProcess and PlatformDetection.IsReleaseRuntime.
  • Kept the original AllocateUninitializedArray / AllocateArray tests running their other allocation scenarios on all configurations.

Comment on lines +1031 to +1032
[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.Is64BitProcess), nameof(PlatformDetection.IsReleaseRuntime))]
[OuterLoop]
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guard here (IsReleaseRuntime) skips this max-size allocation on all non-Release runtime builds across OSes/runtimes. The PR description says the issue is specific to Linux Debug/Checked CoreCLR, so this condition looks broader than intended; consider adding narrower conditions (e.g., IsLinux and/or IsCoreCLR) or updating the PR description to match the new behavior.

Copilot uses AI. Check for mistakes.
[OuterLoop]
private static void AllocateUninitializedArray_MaxSize()
{
int i = 0x7FFFFFC7;
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x7FFFFFC7 is a magic number. Consider using Array.MaxLength (or a named constant) to self-document that this is the maximum supported array length and avoid duplicating the value in multiple tests.

Suggested change
int i = 0x7FFFFFC7;
int i = Array.MaxLength;

Copilot uses AI. Check for mistakes.
Comment on lines +1084 to +1085
[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.Is64BitProcess), nameof(PlatformDetection.IsReleaseRuntime))]
[OuterLoop]
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guard here (IsReleaseRuntime) skips this max-size allocation on all non-Release runtime builds across OSes/runtimes. The PR description says the issue is specific to Linux Debug/Checked CoreCLR, so this condition looks broader than intended; consider adding narrower conditions (e.g., IsLinux and/or IsCoreCLR) or updating the PR description to match the new behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +1088 to +1089
int i = 0x7FFFFFC7;
var arr = GC.AllocateArray<byte>(i);
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x7FFFFFC7 is a magic number. Consider using Array.MaxLength (or a named constant) to self-document that this is the maximum supported array length and avoid duplicating the value in multiple tests.

Copilot uses AI. Check for mistakes.
{
int i = 0x7FFFFFC7;
var arr = GC.AllocateUninitializedArray<byte>(i);
[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.Is64BitProcess), nameof(PlatformDetection.IsReleaseRuntime))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug/Checked CoreCLR, runtime overhead pushes this over the cgroup memory limit, causing SIGKILL (exit code 137).

The difference between checked and release runtime overhead is negligible relative to the GBs of memory that this test allocates. IsReleaseRuntime condition does not make sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are seeing these failures on checked build only, it suggests that there is something very inefficient in checked runtime for arrays of this size. We should fix that instead of disabling the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants