ce: fix CeUtils scheduling left paused on error paths in kceTopLevelPceLceMappingsUpdate #1210
Open
rawrmonster17 wants to merge 1 commit into
…ceLceMappingsUpdate
cePauseCeUtilsScheduling() is called at the start of
kceTopLevelPceLceMappingsUpdate_IMPL() to block RM-internal CE
submissions while PCE-LCE mappings are being updated. However, two
error paths return without calling the matching
ceResumeCeUtilsScheduling():
1. NV_ASSERT_OK_OR_RETURN() on rmapiControlCacheFreeForControl()
returns immediately on failure, skipping the resume.
2. The early return on NV2080_CTRL_CMD_CE_UPDATE_PCE_LCE_MAPPINGS_V2
failure likewise skips the resume.
When either path fires, CeUtils submission stays permanently paused
for the lifetime of the GPU instance. Subsequent RM-internal CE
operations (memory scrubbing, allocation init) stall or fail.
Fix by converting both early returns to goto cleanup so that
ceResumeCeUtilsScheduling() is always called after the pause,
regardless of which error path is taken. Also convert the
NV_ASSERT_OK_OR_RETURN() to an explicit status check so the error
is captured in status before branching to cleanup.
Signed-off-by: rawrmonster17 <rawrmonster17@users.noreply.github.com>
Problem
kceTopLevelPceLceMappingsUpdate_IMPL() calls cePauseCeUtilsScheduling() to block RM-internal Copy Engine submissions while PCE-LCE mappings are
being reconfigured. The matching ceResumeCeUtilsScheduling() is only reached on the success path, leaving two error paths that return without
resuming:
1. NV_ASSERT_OK_OR_RETURN() on rmapiControlCacheFreeForControl(): the macro returns immediately on failure, bypassing the resume call.
2. return status when NV2080_CTRL_CMD_CE_UPDATE_PCE_LCE_MAPPINGS_V2 fails.

When either path fires, CeUtils submission stays permanently paused for the lifetime of the GPU instance. Subsequent RM-internal CE operations
(memory scrubbing, allocation init) will stall or fail silently.
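For reviewers who want to see the failure shape in isolation, here is a minimal stand-alone sketch of the two leaking paths. It is not the RM code: Status, RETURN_IF_ERROR, pauseScheduling, resumeScheduling, freeControlCache, updateMappings, and mappingsUpdateBuggy are all hypothetical stand-ins, and the pause is modeled as a simple counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the real RM types or calls. */
typedef enum { STATUS_OK = 0, STATUS_ERR = 1 } Status;

static int g_pauseDepth = 0;   /* > 0 means submission is paused */

static void pauseScheduling(void)  { g_pauseDepth++; }
static void resumeScheduling(void) { assert(g_pauseDepth > 0); g_pauseDepth--; }

/* Stubs whose outcome the caller controls, to exercise each path. */
static Status freeControlCache(bool fail) { return fail ? STATUS_ERR : STATUS_OK; }
static Status updateMappings(bool fail)   { return fail ? STATUS_ERR : STATUS_OK; }

/* Assumed shape of a return-on-error macro: it leaves the function at once. */
#define RETURN_IF_ERROR(expr)                    \
    do {                                         \
        Status s_ = (expr);                      \
        if (s_ != STATUS_OK) return s_;          \
    } while (0)

static Status mappingsUpdateBuggy(bool failCacheFree, bool failUpdate)
{
    Status status;

    pauseScheduling();

    /* Error path 1: the macro returns before the resume below is reached. */
    RETURN_IF_ERROR(freeControlCache(failCacheFree));

    /* Error path 2: a plain early return skips the resume as well. */
    status = updateMappings(failUpdate);
    if (status != STATUS_OK)
        return status;

    resumeScheduling();   /* only reached on the success path */
    return STATUS_OK;
}
```

Calling mappingsUpdateBuggy() with either flag set returns an error and leaves g_pauseDepth at 1, which is the model's equivalent of scheduling staying paused.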
Fix
Convert both early returns to goto cleanup so ceResumeCeUtilsScheduling() is unconditionally called after the pause, regardless of which path is
taken. The NV_ASSERT_OK_OR_RETURN() is replaced with an explicit status check so the error is captured in status before jumping to cleanup.

This matches the standard cleanup-label pattern used throughout the RM
codebase for balanced resource acquire/release.
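The control flow described above can be sketched as a small stand-alone C example. This is an illustration of the cleanup-label pattern only, not the actual diff: Status, pauseScheduling, resumeScheduling, freeControlCache, updateMappings, mappingsUpdateFixed, and schedulingPaused are hypothetical names, and the pause is modeled as a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the real RM types or calls. */
typedef enum { STATUS_OK = 0, STATUS_ERR = 1 } Status;

static int g_pauseDepth = 0;   /* > 0 means submission is paused */

static void pauseScheduling(void)  { g_pauseDepth++; }
static void resumeScheduling(void) { assert(g_pauseDepth > 0); g_pauseDepth--; }
static bool schedulingPaused(void) { return g_pauseDepth > 0; }

/* Stubs whose outcome the caller controls, to exercise each path. */
static Status freeControlCache(bool fail) { return fail ? STATUS_ERR : STATUS_OK; }
static Status updateMappings(bool fail)   { return fail ? STATUS_ERR : STATUS_OK; }

static Status mappingsUpdateFixed(bool failCacheFree, bool failUpdate)
{
    Status status;

    pauseScheduling();

    /* Explicit check: the error lands in status before branching. */
    status = freeControlCache(failCacheFree);
    if (status != STATUS_OK)
        goto cleanup;

    status = updateMappings(failUpdate);
    if (status != STATUS_OK)
        goto cleanup;

    /* Any further success-path work would go here. */

cleanup:
    /* Single exit: every path after the pause runs the matching resume. */
    resumeScheduling();
    return status;
}
```

In this model, schedulingPaused() is false after mappingsUpdateFixed() returns, whichever stub is made to fail, and the original error code is still what the caller sees.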