
Sync with Microsoft ONNX Runtime - 11062026 #1130

Merged
ankitm3k merged 4 commits into ovep-develop from sync_msft_11062026 on Jun 11, 2026

@ai-fw-intg


Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.

yuslepukhin and others added 4 commits June 9, 2026 14:31
This pull request significantly improves the safety, correctness, and
memory management of LoRA adapter handling in ONNX Runtime, especially
around Python bindings and adapter file export/import. The main focus is
on ensuring strong exception safety, preventing use-after-free bugs by
improving object lifetimes, and rejecting unsupported tensor types
during export. Additionally, comprehensive regression tests are added to
guard against these issues.

**Key changes include:**

### Exception Safety & Parameter Handling
- Refactored `LoraAdapter::Load` and `MemoryMap` to provide a strong
exception guarantee: all potentially-throwing operations are performed
using local variables before committing to the object's state, ensuring
no partial updates occur on failure. The new `BuildParamsValues` method
builds the parameter map without side effects, replacing the old
`InitializeParamsValues`.
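
The "build into locals, then commit" pattern described above can be sketched in a few lines. This is a minimal Python model rather than the ONNX Runtime C++ code: the `Adapter` class, its `name=value` text format, and the `build_params` helper are all hypothetical stand-ins chosen only to show the shape of the technique.

```python
class Adapter:
    """Toy stand-in for an adapter object (hypothetical, not the ORT class)."""

    def __init__(self):
        self.buffer = b""
        self.params = {}

    @staticmethod
    def build_params(raw: bytes) -> dict:
        """Parse 'name=value' lines into a new dict without touching any instance."""
        params = {}
        for line in raw.decode("utf-8").splitlines():
            name, sep, value = line.partition("=")
            if not sep or not name:
                raise ValueError(f"malformed parameter line: {line!r}")
            params[name] = float(value)  # may raise ValueError
        return params

    def load(self, raw: bytes) -> None:
        # Step 1: everything that can fail operates on local variables only.
        new_buffer = bytes(raw)
        new_params = self.build_params(new_buffer)
        # Step 2: commit. Reached only if step 1 succeeded, so a failed load
        # leaves the previous buffer and parameter map untouched.
        self.buffer, self.params = new_buffer, new_params
```

If `load` raises partway through parsing, the object still holds the state from the last successful call, which is the property a strong exception guarantee asks for.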

### Python Bindings & Memory Management
- Improved the Python adapter format bindings so that every `OrtValue`
returned from adapter parameter getters is pinned to its owning C++
adapter object via pybind11's `keep_alive` mechanism. This prevents
use-after-free errors if the parent `AdapterFormat` object is dropped
while references to its parameters remain. The getter now builds the
parameter dictionary on demand and avoids reference cycles that would
leak memory.

### Adapter Export Robustness
- Enhanced the adapter export logic to reject string tensors, which would
otherwise leak memory addresses into the file and produce unloadable
adapter files. The export path now builds the adapter image entirely in
memory before writing to disk, so that no partial file is left behind on
error.
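
A rough sketch of the "validate and serialize in memory, then write" order follows. The byte layout, the `export_adapter` function, and the list-of-floats parameter representation are assumptions made for illustration; they are not the adapter file format or the ORT export API.

```python
import struct


def export_adapter(path, params):
    """Write params (name -> list of floats) to path.

    Hypothetical layout: [name_len:u32][count:u32][name bytes][count x f32].
    """
    chunks = []
    for name, values in params.items():
        # Reject unsupported element types before any file is opened.
        if any(isinstance(v, str) for v in values):
            raise TypeError(f"parameter {name!r}: string tensors are not supported")
        encoded = name.encode("utf-8")
        chunks.append(struct.pack("<II", len(encoded), len(values)))
        chunks.append(encoded)
        chunks.append(struct.pack(f"<{len(values)}f", *values))
    image = b"".join(chunks)  # the complete image exists before any file I/O
    with open(path, "wb") as f:
        f.write(image)
    return len(image)
```

Because the file is opened only after every parameter has been validated and serialized, a rejected string tensor never leaves a file on disk, which is the behavior the regression tests in this change are described as checking.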

### Clean-up & Consistency
- Simplified the construction and usage of the
`PyAdapterFormatReaderWriter` class, ensuring that its internal state is
only populated as appropriate for read or write operations, and removed
unnecessary parameter passing.
- Minor cleanup in property definitions and comments for clarity and
maintainability.

### Regression Tests
- Added thorough regression tests to verify that adapter parameter
lifetimes are managed correctly and that exporting string tensors is
properly rejected, with checks to ensure no files are created on
failure.

These changes collectively make adapter handling safer and more robust,
especially when interacting with Python, and add critical safeguards
against subtle memory and serialization bugs.
…oft#28761)

## Summary

- Introduces `SessionBufferPool` that lets a session hold on to retired
generator buffer caches (storage + uniform) and seed them into newly
created generators.
- Adds provider option
`ep.webgpuexecutionprovider.sessionBufferPoolGenerations` to bound how
many generations of retired buffers are kept (default `1`; set to `0` to
disable).
- Wires the WebGPU EP to donate a retiring `BufferManager`'s cache into
the pool and absorb pooled buffers when a new `BufferManager` is created
for the next generator.
- The pool is only created when graph capture is enabled AND the option
is > 0, so non-graph-capture sessions are unaffected.
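
The donate/absorb cycle summarized above can be modeled with a bounded queue of retired caches. This is a simplified Python sketch under stated assumptions: the class and method names mirror the description but the bodies are hypothetical, buffers are matched by exact size only, and storage and uniform buffers are not distinguished.

```python
from collections import deque


class SessionBufferPool:
    """Toy model: keeps at most max_generations retired caches."""

    def __init__(self, max_generations=1):
        # deque(maxlen=0) silently discards appends, modelling "0 disables".
        self._generations = deque(maxlen=max_generations)

    def donate(self, cache):
        """Store a retiring manager's free buffers, evicting the oldest generation."""
        self._generations.append(list(cache))

    def absorb(self):
        """Hand every pooled buffer to a new manager and empty the pool."""
        buffers = [buf for gen in self._generations for buf in gen]
        self._generations.clear()
        return buffers


class BufferManager:
    """Toy per-generator cache seeded from the session pool."""

    def __init__(self, pool):
        self._pool = pool
        self._free = pool.absorb()
        self.hits = 0
        self.misses = 0

    def acquire(self, size):
        for i, buf in enumerate(self._free):
            if len(buf) == size:
                self.hits += 1
                return self._free.pop(i)
        self.misses += 1
        return bytearray(size)  # cache miss: allocate from scratch

    def release(self, buf):
        self._free.append(buf)

    def retire(self):
        self._pool.donate(self._free)
        self._free = []
```

In this model a first manager misses on every request, releases its buffers, and retires; a second manager created from the same pool then serves the same request sizes entirely from cache.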

## Motivation

With graph capture enabled, each generator owns its own per-graph
`BufferManager`. When the generator is destroyed (e.g., per-request in
GenAI), the entire buffer cache is thrown away and the next generator
must reallocate all storage and uniform buffers from scratch, increasing
cold-start latency and GPU memory churn.

By keeping a small pool of recently-retired buffer slots at the session
level, the next generator can reuse them and skip reallocation entirely
after the first cycle.

## Test plan

- [x] Build ORT (Windows, D3D12) with `--use_webgpu`: clean build.
- [x] `lintrunner -a` reports no lint issues.
- [x] Verified end-to-end with GenAI on phi4 + WebGPU graph capture
using two scripts:
  - `verify_multi_gen.py`: sequential and overlapping generators all
  produce matching, coherent output.
  - `verify_max_length_change.py`: generators with varying
  `max_length` all produce coherent output.
- [x] With diagnostic prints (since removed), confirmed that after the
first generator donates buffers, subsequent generators report
`storage hits=171 misses=0, uniform hits=296 misses=0`, i.e., the pool
actually engages and eliminates reallocation.

## Notes

- Pairs with a GenAI-side change that invokes
`SessionReleaseCapturedGraph` from `State::~State()` so the
per-graph `BufferManager` is actually released and its buffers reach
the pool.
### Description

This pull request introduces a mechanism for exposing experimental C API
functions in ONNX Runtime. The new system enables the addition,
iteration, and eventual promotion of experimental APIs without impacting
the stable ABI, using a name-based function pointer lookup and a
generated header for type safety and ergonomics. The changes include
documentation, build integration, header generation, implementation, and
test coverage for the new experimental API flow.

**Experimental C API Framework**

* Added a design doc (`Experimental_C_API.md`) detailing the motivation,
design decisions, and usage patterns for the experimental C API
mechanism.
* Introduced a central declaration file
(`onnxruntime_experimental_c_api.inc`) using X-macros to define
experimental API functions and their lifecycle rules. The X-macro
signature uses `ORT_EXPERIMENTAL_API(VER, RET, NAME, ...)` ordering
(return type before name) to match the convention used by `ORT_API_T` in
the stable API.
* Added a generated consumer header (`onnxruntime_experimental_c_api.h`)
that provides C typedefs, name constants, and C++ typed accessors for
experimental functions. Experimental function names follow the pattern
`<TargetStruct>_<Name>_SinceV<APIVersion>` to unambiguously convey
availability and avoid collision.
* Updated the stable C API struct (`OrtApi` in `onnxruntime_c_api.h`) to
include a single function pointer, `GetExperimentalFunction`, for
name-based experimental function lookup. The `OrtExperimentalFnPtr`
generic function pointer type (rather than `void*`) is used as the
return type to avoid undefined behavior when casting between function
pointers.
* Integrated the new headers into the build system so they are installed
and available to consumers.
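
The name-based lookup and the `<TargetStruct>_<Name>_SinceV<APIVersion>` naming pattern can be illustrated with a small registry. This Python sketch only models the idea; the real mechanism is a C function pointer lookup, and everything here is hypothetical: the registry, the `Echo` function, the version number 27, and the choice to return `None` for an unknown name.

```python
_REGISTRY = {}


def experimental_name(target_struct, name, api_version):
    """Compose a lookup name following <TargetStruct>_<Name>_SinceV<APIVersion>."""
    return f"{target_struct}_{name}_SinceV{api_version}"


def register_experimental(target_struct, name, api_version):
    """Decorator that registers a function under its versioned lookup name."""
    def decorator(fn):
        _REGISTRY[experimental_name(target_struct, name, api_version)] = fn
        return fn
    return decorator


def get_experimental_function(name):
    """Return the registered callable, or None if this build does not provide it."""
    return _REGISTRY.get(name)


@register_experimental("OrtApi", "Echo", 27)  # hypothetical function and version
def _echo(value):
    return value


fn = get_experimental_function("OrtApi_Echo_SinceV27")
result = fn(5) if fn is not None else None
```

Because callers look functions up by name instead of by struct offset, adding, changing, or removing an experimental entry does not shift anything in the stable API table; a caller that asks for a name the runtime does not know simply gets nothing back and can fall back.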

**Implementation and Test Coverage**

* Implemented the runtime support for experimental API lookup and
function registration (`experimental_c_api.cc`), including a test-only
function (`OrtApi_ExperimentalApiTest`) to exercise the mechanism
end-to-end.
* Registered the new experimental API entry point in the exported API
table (`onnxruntime_c_api.cc`).
* Added a unit test source file for experimental API coverage.

### Motivation and Context

Enable support for experimental C APIs.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@ankitm3k ankitm3k merged commit 02294e5 into ovep-develop Jun 11, 2026
7 of 10 checks passed
@ankitm3k ankitm3k deleted the sync_msft_11062026 branch June 11, 2026 07:16