Skip to content

[SYCL][UR][L0] Restrict USM residency to peers with enabled P2P access#21889

Merged
sarnex merged 20 commits into
intel:syclfrom
ldorau:URL0_Memory_resident_limit_to_enabled_peers
May 18, 2026
Merged

[SYCL][UR][L0] Restrict USM residency to peers with enabled P2P access#21889
sarnex merged 20 commits into
intel:syclfrom
ldorau:URL0_Memory_resident_limit_to_enabled_peers

Conversation

@ldorau
Copy link
Copy Markdown
Contributor

@ldorau ldorau commented Apr 28, 2026

Implement P2P access control for the Level Zero v2 adapter so that USM
device memory is made resident on a peer device only when peer access has
been explicitly enabled by the user.

  • platform.cpp: Populate a per-device peers[] table during device-cache
    initialisation using zeDeviceGetP2PProperties + zeDeviceCanAccessPeer.
    Each connectable pair starts as DISABLED; pairs without hardware
    connectivity are NO_CONNECTION.

  • device.hpp/cpp: Introduce PeerStatus enum (NO_CONNECTION / DISABLED /
    ENABLED) and a peers[] vector on ur_device_handle_t_.

  • usm_p2p.cpp (v2, new): Implement P2P enable/disable/query. Enable/disable
    update peerDevice->peers[commandDevice->Id] and call
    Context->changeResidentDevice() for every context on the platform.

  • context.cpp/hpp: Replace the static p2pAccessDevices table and
    getP2PDevices() with two dynamic helpers that consult the live peers[]
    table: getDevicesWhoseAllocationsCanBeAccessedFrom() and
    getDevicesWhichCanAccessAllocationsPresentOn(). Add
    changeResidentDevice() to propagate P2P changes to all USM pools.

  • usm.cpp/hpp: Update makeProvider() to use
    getDevicesWhichCanAccessAllocationsPresentOn() so providers only pin
    memory on peers with actively enabled access. Add changeResidentDevice()
    to ur_usm_pool_handle_t_ which calls
    umfLevelZeroMemoryProviderResidentDeviceChange() for the affected peer.

  • command_list_manager.cpp: Add checkP2PAccess() helper and call it from
    appendUSMMemcpy() to reject copies between devices without enabled P2P,
    returning UR_RESULT_ERROR_INVALID_OPERATION.

  • ur_pool_manager.hpp: Add supportsResidentDevices() and
    forEachPoolWithDesc() to pool_descriptor / pool_manager.

  • sycl/source/device.cpp: Extract p2pAccessHelper<> template shared by
    ext_oneapi_enable_peer_access and ext_oneapi_disable_peer_access to
    avoid duplication and enforce the same-platform constraint.

Co-authored-by: Łukasz Ślusarczyk <lukasz.slusarczyk at intel.com>
Co-authored-by: Lukasz Dorau <lukasz.dorau at intel.com>

@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 7e50ca5 to d0b4788 Compare April 28, 2026 07:23
@ldorau ldorau requested a review from Copilot April 28, 2026 07:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements peer-access–driven memory residency management for the Level Zero v2 adapter, wiring ext_oneapi_enable_peer_access/disable through UR to update USM pool residency, and adjusting SYCL’s peer-access API to avoid cross-platform usage.

Changes:

  • Add L0 v2 peer-access implementation that toggles per-device peer state and propagates residency updates to all tracked contexts.
  • Extend USM pool/provider plumbing to support runtime resident-device changes and add pool-manager iteration helpers.
  • Update SYCL peer-access enable/disable to validate platforms; add initial (currently placeholder) UR tests.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Adds multi-device residency tests for peer-access (currently placeholders).
unified-runtime/source/common/ur_pool_manager.hpp Adds descriptor helpers and pool-manager iteration with descriptor access.
unified-runtime/source/common/backtrace_lin.cpp Introduces a constant for max backtrace frames.
unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp New L0 v2 implementation for peer access enable/disable/info and context propagation.
unified-runtime/source/adapters/level_zero/v2/usm.hpp Exposes USM pool API to change resident devices.
unified-runtime/source/adapters/level_zero/v2/usm.cpp Updates provider creation to use peer-enabled residency model and adds residency-change plumbing.
unified-runtime/source/adapters/level_zero/v2/memory.cpp Switches P2P eligibility check to the new “enabled peers” model.
unified-runtime/source/adapters/level_zero/v2/context.hpp Adds APIs to query enabled peer relationships and to propagate residency changes.
unified-runtime/source/adapters/level_zero/v2/context.cpp Removes precomputed P2P tables; tracks contexts; adds peer-access query helpers and residency propagation.
unified-runtime/source/adapters/level_zero/usm_p2p.cpp Updates v1 behavior to log enable/disable as ignored (always enabled).
unified-runtime/source/adapters/level_zero/platform.hpp Updates platform comment to reflect v2 peer-access usage of tracked contexts.
unified-runtime/source/adapters/level_zero/platform.cpp Initializes per-device peer tables based on L0 P2P capability/properties.
unified-runtime/source/adapters/level_zero/device.hpp Adds peer-status table to devices and stream operators.
unified-runtime/source/adapters/level_zero/device.cpp Implements stream operators for device id and peer status.
unified-runtime/source/adapters/level_zero/context.cpp Minor comment adjustment around context tracking in v1.
unified-runtime/source/adapters/level_zero/CMakeLists.txt Moves usm_p2p.cpp into the v2 adapter build.
sycl/source/device.cpp Adds same-platform validation for enable/disable peer access calls.
.github/copilot-instructions.md Expands repository instructions/documentation for Copilot usage.

Comment thread sycl/source/device.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/source/common/backtrace_lin.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from d0b4788 to 24eaab3 Compare April 28, 2026 07:53
@ldorau ldorau requested a review from Copilot April 28, 2026 07:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread unified-runtime/source/common/ur_pool_manager.hpp
Comment thread unified-runtime/source/adapters/level_zero/usm_p2p.cpp
Comment thread sycl/source/device.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/memory.cpp
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch 2 times, most recently from 174e3f1 to e2e57bf Compare April 28, 2026 09:58
@ldorau ldorau requested a review from Copilot April 28, 2026 10:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/platform.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/device.hpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from e2e57bf to 8346b35 Compare April 28, 2026 12:47
@ldorau ldorau requested a review from Copilot April 28, 2026 12:49
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 8346b35 to 9e50ee8 Compare April 28, 2026 12:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/memory.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp Outdated
Comment thread unified-runtime/source/common/backtrace.hpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 9e50ee8 to 65c706f Compare April 28, 2026 13:34
@ldorau ldorau requested a review from Copilot April 28, 2026 13:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 65c706f to a2b54f4 Compare April 28, 2026 14:24
@ldorau ldorau requested a review from Copilot April 28, 2026 14:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from a2b54f4 to 08a08ab Compare April 28, 2026 14:50
@ldorau ldorau requested a review from Copilot April 28, 2026 14:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
@github-actions
Copy link
Copy Markdown
Contributor

@intel/llvm-gatekeepers please consider merging

@kswiecicki
Copy link
Copy Markdown
Contributor

@ldorau, rebase please.

@ldorau
Copy link
Copy Markdown
Contributor Author

ldorau commented May 18, 2026

@ldorau, rebase please.

@kswiecicki Done

ldorau added 18 commits May 18, 2026 08:16
- Skip peers with disabled P2P in makeProvider (USM pool creation)
- Add urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp
- Track per-device peer status in ur_device_handle_t_::peers[]
- Update existing USM pool residency on P2P enable/disable

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
- Fill in three placeholder multi-device tests in memory_residency.cpp
- Tests verify P2P-driven residency: absent-on-peer without P2P,
  enable/disable state machine checks, end-to-end data transfer

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Extract common logic from ext_oneapi_enable_peer_access and
ext_oneapi_disable_peer_access into a templated p2pAccessHelper
function to avoid code duplication.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The disablePeerAccessStateMachineAndSourceAllocationPersists test was
failing intermittently because deferred frees from the preceding test
complete asynchronously, causing UR_DEVICE_INFO_GLOBAL_MEM_FREE to
report more free memory than the baseline captured at the start of the
test.

Remove the unreliable source-device free-memory assertion and the
allocation it required, keeping only the state-machine checks (disable
succeeds, double-disable returns UR_RESULT_ERROR_INVALID_OPERATION).
The source-device allocation property is already covered by
allocatingDeviceMemoryWillResultInOOM which runs first in isolation.
…oreEnablingPeerAccess

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The peer table lives on peerDevice: peerDevice->peers[commandDevice->Id]
tracks whether commandDevice is allowed to access peerDevice's
allocations.

Update urUsmP2PChangePeerAccessExp to lock peerDevice's mutex,
read/write peerDevice's peer table, use peerDevice's platform for
context iteration, and pass (peerDevice, commandDevice) to
changeResidentDevice and validateP2PDevicePair.

Also fix urUsmP2PPeerAccessGetInfoExp to query the peer table on
peerDevice rather than commandDevice.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Add a P2P peer-status check in command_list_manager::appendUSMMemcpy.
When both source and destination are device memory and the source
resides on a different device, the adapter queries the source device's
peer table to verify that access has been granted to the queue's device.
Returns UR_RESULT_ERROR_INVALID_OPERATION if P2P access has not been
enabled.  Copies to host or shared memory are always allowed regardless
of P2P state.

Previously, zeCommandListAppendMemoryCopy would silently succeed for
cross-device copies via the copy engine regardless of P2P state, making
it impossible to test that ext_oneapi_disable_peer_access actually
revokes access.

Also adds negative-pair tests that verify urEnqueueUSMMemcpy fails when
P2P is disabled:
- enablePeerAccessStateMachineAndSourceAllocationFailsWithoutP2P
- p2pReadFailsWithoutPeerAccessDisabled

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Adds sycl/test-e2e/USM/P2P/p2p_usm_residency.cpp to verify that
the Level Zero v2 adapter correctly handles P2P access for USM device
memory between peer devices.

Phase 1: Allocates memory on dev0, fills it with a known pattern,
enables P2P access from dev1 to dev0, then uses dev1's queue to
copy the data to the host and verifies all values match.

Phase 2 (opposite direction): Allocates memory on dev1, fills it
with a different pattern, enables P2P access from dev0 to dev1,
then uses dev0's queue to copy the data to the host and verifies
correctness.

Phase 3 (negative): Enables then disables P2P access from dev1
to dev0, then attempts a memcpy via dev1's queue. The test passes
if the memcpy throws an exception or if the copied data does not
match the original fill pattern, confirming that
ext_oneapi_disable_peer_access actually revokes access.

Also adds the 'two-or-more-gpu-devices' lit feature to
lit.cfg.py, set when sycl-ls reports at least two GPU devices.
The test uses this feature to skip on single-GPU machines.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The source-device free-memory check in
enablePeerAccessStateMachineAndSourceAllocation was failing
intermittently because deferred frees from earlier tests complete
asynchronously, causing UR_DEVICE_INFO_GLOBAL_MEM_FREE to report more
free memory than the baseline.

Remove the unreliable assertion. The test's actual value is in verifying
the P2P state machine (double-enable returns error) and the end-to-end
data transfer correctness.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Disable P2P access before freeing the allocation. The previous order
freed the memory while P2P was still enabled, leaving a brief window
where the peer device held access rights to a released allocation.
The correct cleanup sequence is to revoke peer access first and then
free the memory.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Replace the nested if-ladder in appendUSMMemcpy with a flat static helper
function checkP2PAccess that uses early returns, as suggested in the review.
The logic is identical; the refactoring makes the control flow easier to
follow.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
…bled test

The old comment claimed that P2P must be enabled before the allocation,
which is incorrect. urUsmP2PEnablePeerAccessExp calls changeResidentDevice
on all contexts after updating the peer-status table, which retroactively
makes already-existing allocations resident on the peer device. Enabling
P2P after allocation is therefore equally valid, as demonstrated by the
allocBeforeEnablingPeerAccess test. Update the comment to reflect the
actual adapter behaviour.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The test exercises exactly the same code path as the existing
enablePeerAccessStateMachineAndSourceAllocationFailsWithoutP2P:
both allocate on devices[0] without P2P, attempt a copy via
devices[1]'s queue, and assert UR_RESULT_ERROR_INVALID_OPERATION.
Remove the duplicate to avoid maintaining two tests that provide
no additional coverage.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The test was calling Devs[0].ext_oneapi_enable_peer_access(Devs[1]),
which sets Devs[1]->peers[Devs[0].Id] = ENABLED, meaning Devs[0] can
access Devs[1]'s memory. However the actual P2P copy was
Queues[1].copy(arr0, arr1, N) — Devs[1]'s queue reading arr0 which
lives on Devs[0] — which requires the opposite direction:
Devs[0]->peers[Devs[1].Id] = ENABLED, set by
Devs[1].ext_oneapi_enable_peer_access(Devs[0]).

The wrong direction was harmless with the L0v1 adapter (no peer-access
check on memcpy), but the L0v2 adapter's checkP2PAccess enforces the
correct direction and returned UR_RESULT_ERROR_INVALID_OPERATION.

Also add ext_oneapi_disable_peer_access before freeing arr0, consistent
with the rule that peer access should be revoked before the guarded
allocation is released.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Verify that revoking peer access prevents subsequent USM copies. The
test enables P2P (devices[1] -> devices[0]), confirms that a
urEnqueueUSMMemcpy succeeds, then disables P2P and asserts that the
same copy returns UR_RESULT_ERROR_INVALID_OPERATION.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Verify the transition from blocked to permitted: attempt a USM copy
from devices[1]'s queue without P2P (expects
UR_RESULT_ERROR_INVALID_OPERATION), then enable P2P on the same
allocation and retry (expects success with correct data).

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Add testP2PReadFailsWithoutEnable: allocates on dev0, attempts a
device-to-device memcpy via dev1's queue without ever calling
ext_oneapi_enable_peer_access, and asserts a SYCL exception is thrown.
This covers the case where P2P was never enabled (Phase 3 already
covers the revoke case).

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
… P2P enable

Add testP2PReadFailsThenSucceedsAfterEnable: attempts a
device-to-device memcpy without P2P (expects a SYCL exception), then
enables P2P on the same allocation and retries (expects success with
correct data), then disables P2P and cleans up.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
@ldorau
Copy link
Copy Markdown
Contributor Author

ldorau commented May 18, 2026

@kswiecicki I will update the PR's description in a few minutes.

@ldorau
Copy link
Copy Markdown
Contributor Author

ldorau commented May 18, 2026

@kswiecicki I will update the PR's description in a few minutes.

Done

@ldorau
Copy link
Copy Markdown
Contributor Author

ldorau commented May 18, 2026

@mateuszpn Have merge conflicts in unified-runtime/source/adapters/level_zero/v2/context.cpp been resolved correctly?

@github-actions
Copy link
Copy Markdown
Contributor

@intel/llvm-gatekeepers please consider merging

@ldorau
Copy link
Copy Markdown
Contributor Author

ldorau commented May 18, 2026

Need to fix one thing.

@ldorau
Copy link
Copy Markdown
Contributor Author

ldorau commented May 18, 2026

Fixed issue found after rebase and incorrectly resolving the last merge conflict - thank you @mateuszpn !

ldorau added 2 commits May 18, 2026 13:50
When urContextCreate reuses the driver default context via
zeDriverGetDefaultContext, the resulting ur_context_handle_t_ was not
added to hPlatform->Contexts.

urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp iterate
Platform->Contexts to call changeResidentDevice() on every live context,
so any context created through that fast path would silently miss all P2P
residency updates — memory would not be made resident on newly enabled
peers, and would not be evicted on disable.

Fix by adding the context to hPlatform->Contexts in the fast path, the
same way the regular zeContextCreate path and
urContextCreateWithNativeHandle already do.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
… memory residency tests

The tests were using 1 MB allocations, which fall within the disjoint
pool's MaxPoolableSize of 4 MB for device USM.  Pooled allocations are
not immediately returned to the UMF memory provider on free; they stay
in the pool's free-list cache with stale per-device residency from the
preceding test.  This cross-test interference caused
allocationNotResidentOnPeerWithoutP2P to observe a free-memory decrease
on the peer device that exceeded allocSize, failing the assertion.

Define a single file-scope constant kAllocSize = 5 MB (> 4 MB) and use
it in all tests.  Allocations above MaxPoolableSize bypass the pool's
free-list cache and go directly to the UMF level_zero memory provider,
where residency is set at allocation time based on the current P2P
state, and released when the allocation is freed back to the OS.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
@ldorau
Copy link
Copy Markdown
Contributor Author

ldorau commented May 18, 2026

@intel/llvm-gatekeepers please merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants