Sync with Microsoft ONNX Runtime - 11062026 by ai-fw-intg · Pull Request #1130 · intel/onnxruntime

ai-fw-intg · 2026-06-10T20:33:40Z

Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.

This pull request significantly improves the safety, correctness, and memory management of LoRA adapter handling in ONNX Runtime, especially around Python bindings and adapter file export/import. The main focus is on ensuring strong exception safety, preventing use-after-free bugs by improving object lifetimes, and rejecting unsupported tensor types during export. Additionally, comprehensive regression tests are added to guard against these issues.

**Key changes include:**

### Exception Safety & Parameter Handling
- Refactored `LoraAdapter::Load` and `MemoryMap` to provide a strong exception guarantee: all potentially-throwing operations are performed using local variables before committing to the object's state, ensuring no partial updates occur on failure. The new `BuildParamsValues` method builds the parameter map without side effects, replacing the old `InitializeParamsValues`. [[1]](diffhunk://#diff-d810cdd06ed9beffd49380fafe9a3c1c2b438166fe14702ec2b78f8ae4ef0279L40-R68) [[2]](diffhunk://#diff-d810cdd06ed9beffd49380fafe9a3c1c2b438166fe14702ec2b78f8ae4ef0279L85-R105) [[3]](diffhunk://#diff-d810cdd06ed9beffd49380fafe9a3c1c2b438166fe14702ec2b78f8ae4ef0279L98-R115) [[4]](diffhunk://#diff-d810cdd06ed9beffd49380fafe9a3c1c2b438166fe14702ec2b78f8ae4ef0279L120-R137) [[5]](diffhunk://#diff-bd912c8889776d55e73fa4c6291385f7a55675c8885c350e96bdaf0e7187db51L154-R173)

### Python Bindings & Memory Management
- Improved the Python adapter format bindings so that every `OrtValue` returned from adapter parameter getters is pinned to its owning C++ adapter object via pybind11's `keep_alive` mechanism. This prevents use-after-free errors if the parent `AdapterFormat` object is dropped while references to its parameters remain. The getter now builds the parameter dictionary on demand and avoids reference cycles that would leak memory. [[1]](diffhunk://#diff-26fa08edf240764c8ed2e3e53a39af0e80798552989dd4f3c65f0e7cb0a6bf7dL38-R56) [[2]](diffhunk://#diff-26fa08edf240764c8ed2e3e53a39af0e80798552989dd4f3c65f0e7cb0a6bf7dL85-R199) [[3]](diffhunk://#diff-f0e8ba8cb8cb07b51b3be675bf62cec07e2eae1461341ce5801d33a57c8f57fdR110-R113)

### Adapter Export Robustness
- Enhanced the adapter export logic to reject string tensors, preventing the leaking of memory addresses and creation of unloadable adapter files. The export path now builds the adapter image entirely in memory before writing to disk, ensuring no partial files are left behind on error.

### Clean-up & Consistency
- Simplified the construction and usage of the `PyAdapterFormatReaderWriter` class, ensuring that its internal state is only populated as appropriate for read or write operations, and removed unnecessary parameter passing. [[1]](diffhunk://#diff-26fa08edf240764c8ed2e3e53a39af0e80798552989dd4f3c65f0e7cb0a6bf7dL38-R56) [[2]](diffhunk://#diff-26fa08edf240764c8ed2e3e53a39af0e80798552989dd4f3c65f0e7cb0a6bf7dL128-R218)
- Minor cleanup in property definitions and comments for clarity and maintainability.

### Regression Tests
- Added thorough regression tests to verify that adapter parameter lifetimes are managed correctly and that exporting string tensors is properly rejected, with checks to ensure no files are created on failure.

These changes collectively make adapter handling safer and more robust, especially when interacting with Python, and add critical safeguards against subtle memory and serialization bugs.
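For readers unfamiliar with the strong exception guarantee mentioned under "Exception Safety & Parameter Handling", here is a minimal sketch of the build-locally-then-commit pattern. `ToyAdapter`, its `"name=value"` entry format, and `BuildParams` are hypothetical stand-ins invented for illustration; this is not the actual `LoraAdapter` code from the diff.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an adapter object; not the real LoraAdapter.
class ToyAdapter {
 public:
  // Parses "name=value" entries. If any entry is malformed this throws,
  // and the previously loaded parameters are left untouched.
  void Load(const std::vector<std::string>& entries) {
    // Step 1: do all work that may throw on a local variable.
    std::map<std::string, int> params = BuildParams(entries);
    // Step 2: commit. Swapping two std::map objects that use the default
    // allocator does not throw, so nothing can fail after this point.
    params_.swap(params);
  }

  const std::map<std::string, int>& Params() const { return params_; }

 private:
  // Builds the parameter map without touching any member state.
  static std::map<std::string, int> BuildParams(
      const std::vector<std::string>& entries) {
    std::map<std::string, int> result;
    for (const std::string& entry : entries) {
      const std::string::size_type pos = entry.find('=');
      if (pos == std::string::npos) {
        throw std::invalid_argument("malformed entry: " + entry);
      }
      // std::stoi may also throw; that is fine, we are still in step 1.
      result[entry.substr(0, pos)] = std::stoi(entry.substr(pos + 1));
    }
    return result;
  }

  std::map<std::string, int> params_;
};
```

In this sketch, a failed second `Load` leaves the result of the first one fully intact, which is the property the PR description attributes to the refactored load path.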

…oft#28761)

## Summary
- Introduces `SessionBufferPool` that lets a session hold on to retired generator buffer caches (storage + uniform) and seed them into newly created generators.
- Adds provider option `ep.webgpuexecutionprovider.sessionBufferPoolGenerations` to bound how many generations of retired buffers are kept (default `1`; set to `0` to disable).
- Wires the WebGPU EP to donate a retiring `BufferManager`'s cache into the pool and absorb pooled buffers when a new `BufferManager` is created for the next generator.
- The pool is only created when graph capture is enabled AND the option is > 0, so non-graph-capture sessions are unaffected.

## Motivation
With graph capture enabled, each generator owns its own per-graph `BufferManager`. When the generator is destroyed (e.g., per-request in GenAI), the entire buffer cache is thrown away and the next generator must reallocate all storage and uniform buffers from scratch, increasing cold-start latency and GPU memory churn. By keeping a small pool of recently-retired buffer slots at the session level, the next generator can reuse them and skip reallocation entirely after the first cycle.

## Test plan
- [x] Build ORT (Windows, D3D12) with ``--use_webgpu``: clean build.
- [x] ``lintrunner -a`` reports no lint issues.
- [x] Verified end-to-end with GenAI on phi4 + WebGPU graph capture using two scripts:
  - ``verify_multi_gen.py``: sequential and overlapping generators all produce matching, coherent output.
  - ``verify_max_length_change.py``: generators with varying ``max_length`` all coherent.
- [x] With diagnostic prints (since removed), confirmed that after the first generator donates buffers, subsequent generators report ``storage hits=171 misses=0, uniform hits=296 misses=0``, i.e., the pool actually engages and eliminates reallocation.

## Notes
- Pairs with a GenAI-side change that invokes ``SessionReleaseCapturedGraph`` from ``State::~State()`` so the per-graph ``BufferManager`` is actually released and its buffers reach the pool.
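
The donate/absorb cycle described in the Summary can be illustrated roughly as follows. This is a sketch under assumptions: `ToyBuffer`, `ToyCache`, and `ToySessionPool` are hypothetical names, and the eviction policy shown (drop the oldest generation once the bound is exceeded, hand out the newest first) is a guess at one reasonable reading of "generations", not the behavior of the real `SessionBufferPool`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU buffer; only its size is tracked here.
struct ToyBuffer {
  std::size_t size_bytes;
};
using ToyCache = std::vector<ToyBuffer>;

// Hypothetical session-level pool of retired buffer caches.
class ToySessionPool {
 public:
  explicit ToySessionPool(std::size_t max_generations)
      : max_generations_(max_generations) {}

  // Called when a generator retires: keep its cache for later reuse,
  // dropping the oldest generation if the bound is exceeded.
  void Donate(ToyCache cache) {
    if (max_generations_ == 0) return;  // a bound of 0 disables pooling
    generations_.push_back(std::move(cache));
    while (generations_.size() > max_generations_) {
      generations_.pop_front();
    }
  }

  // Called when a new generator starts: take the most recently retired
  // cache, or an empty cache if nothing is pooled.
  ToyCache Absorb() {
    if (generations_.empty()) return {};
    ToyCache cache = std::move(generations_.back());
    generations_.pop_back();
    return cache;
  }

  std::size_t GenerationCount() const { return generations_.size(); }

 private:
  std::size_t max_generations_;
  std::deque<ToyCache> generations_;
};
```

With a bound of `1`, a second `Donate` replaces the first generation, and a pool constructed with `0` never retains anything, mirroring the "set to `0` to disable" wording above.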

### Description
This pull request introduces a mechanism for exposing experimental C API functions in ONNX Runtime. The new system enables the addition, iteration, and eventual promotion of experimental APIs without impacting the stable ABI, using a name-based function pointer lookup and a generated header for type safety and ergonomics. The changes include documentation, build integration, header generation, implementation, and test coverage for the new experimental API flow.

**Experimental C API Framework**
* Added a design doc (`Experimental_C_API.md`) detailing the motivation, design decisions, and usage patterns for the experimental C API mechanism.
* Introduced a central declaration file (`onnxruntime_experimental_c_api.inc`) using X-macros to define experimental API functions and their lifecycle rules. The X-macro signature uses `ORT_EXPERIMENTAL_API(VER, RET, NAME, ...)` ordering (return type before name) to match the convention used by `ORT_API_T` in the stable API.
* Added a generated consumer header (`onnxruntime_experimental_c_api.h`) that provides C typedefs, name constants, and C++ typed accessors for experimental functions. Experimental function names follow the pattern `<TargetStruct>_<Name>_SinceV<APIVersion>` to unambiguously convey availability and avoid collision.
* Updated the stable C API struct (`OrtApi` in `onnxruntime_c_api.h`) to include a single function pointer, `GetExperimentalFunction`, for name-based experimental function lookup. The `OrtExperimentalFnPtr` generic function pointer type (rather than `void*`) is used as the return type to avoid undefined behavior when casting between function pointers.
* Integrated the new headers into the build system so they are installed and available to consumers.
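
As a rough illustration of the X-macro technique named above, the sketch below expands one declaration list into function pointer typedefs, name constants, and version constants. The `TOY_*` macros and the two `ToyApi_*` entries are hypothetical; only the `(VER, RET, NAME, ...)` argument order and the `<TargetStruct>_<Name>_SinceV<APIVersion>` naming pattern are taken from the description, and the real `.inc` file and generated header may look quite different.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical declaration list. In a real project this would typically
// live in a separate .inc file that is included several times.
#define TOY_EXPERIMENTAL_API_LIST(X)                  \
  X(25, int, ToyApi_Add_SinceV25, int a, int b)       \
  X(26, const char*, ToyApi_Greeting_SinceV26, void)

// Expansion 1: one function pointer typedef per entry, e.g.
//   typedef int (*ToyApi_Add_SinceV25_Fn)(int a, int b);
#define TOY_DECLARE_TYPEDEF(VER, RET, NAME, ...) \
  typedef RET (*NAME##_Fn)(__VA_ARGS__);
TOY_EXPERIMENTAL_API_LIST(TOY_DECLARE_TYPEDEF)

// Expansion 2: one lookup-name string constant per entry.
#define TOY_DECLARE_NAME(VER, RET, NAME, ...) \
  static const char* const NAME##_Name = #NAME;
TOY_EXPERIMENTAL_API_LIST(TOY_DECLARE_NAME)

// Expansion 3: one "introduced in version" constant per entry.
#define TOY_DECLARE_VERSION(VER, RET, NAME, ...) \
  static const int NAME##_Since = VER;
TOY_EXPERIMENTAL_API_LIST(TOY_DECLARE_VERSION)
```

The appeal of this layout is that adding an experimental function means adding one line to the list; the typedef, the name string, and the version constant cannot drift apart because they are all generated from the same entry.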
**Implementation and Test Coverage**
* Implemented the runtime support for experimental API lookup and function registration (`experimental_c_api.cc`), including a test-only function (`OrtApi_ExperimentalApiTest`) to exercise the mechanism end-to-end.
* Registered the new experimental API entry point in the exported API table (`onnxruntime_c_api.cc`).
* Added a unit test source file for experimental API coverage.

### Motivation and Context
Enable support for experimental C APIs.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
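
The name-based lookup with a generic function pointer return type can be sketched as below. Everything here is an assumption made for illustration: `ToyFnPtr` plays the role the description assigns to `OrtExperimentalFnPtr`, and `ToyGetExperimentalFunction` is a hypothetical free function, not the signature of the real `GetExperimentalFunction` entry in `OrtApi`.

```cpp
#include <cassert>
#include <cstring>

// Generic function pointer type used only to transport the pointer.
// Converting between function pointer types and back is well defined,
// whereas round-tripping through void* is not guaranteed to be.
typedef void (*ToyFnPtr)(void);

// A hypothetical experimental function and its typed pointer.
static int ToyAddImpl(int a, int b) { return a + b; }
typedef int (*ToyAddFn)(int, int);

struct ToyEntry {
  const char* name;
  ToyFnPtr fn;
};

// Hypothetical lookup: returns the registered function for `name`,
// or nullptr when the name is unknown to this build.
ToyFnPtr ToyGetExperimentalFunction(const char* name) {
  static const ToyEntry table[] = {
      {"ToyApi_Add_SinceV25", reinterpret_cast<ToyFnPtr>(&ToyAddImpl)},
  };
  if (name == nullptr) return nullptr;
  for (const ToyEntry& entry : table) {
    if (std::strcmp(entry.name, name) == 0) return entry.fn;
  }
  return nullptr;
}
```

A caller would cast the result back to the exact typed pointer (here `ToyAddFn`) before invoking it, and treat a null result as "not available in this runtime". Calling through any type other than the function's real type would be undefined, which is why a generated header with typed accessors is a natural companion to this kind of lookup.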

yuslepukhin and others added 4 commits June 9, 2026 14:31

Merge remote-tracking branch 'origin/master' into sync_msft_11062026

fc99498

ai-fw-intg requested review from Jaswanth51, ankitm3k, jatinwadhwa921 and vthaniel June 10, 2026 20:33

ankitm3k approved these changes Jun 11, 2026

ankitm3k merged commit 02294e5 into ovep-develop Jun 11, 2026
7 of 10 checks passed

ankitm3k deleted the sync_msft_11062026 branch June 11, 2026 07:16

6 participants
