Sync with Microsoft ONNX Runtime - 11062026 #1130
Merged
This pull request significantly improves the safety, correctness, and memory management of LoRA adapter handling in ONNX Runtime, especially around Python bindings and adapter file export/import. The main focus is on ensuring strong exception safety, preventing use-after-free bugs by improving object lifetimes, and rejecting unsupported tensor types during export. Additionally, comprehensive regression tests are added to guard against these issues.

**Key changes include:**

### Exception Safety & Parameter Handling

- Refactored `LoraAdapter::Load` and `MemoryMap` to provide a strong exception guarantee: all potentially-throwing operations are performed using local variables before committing to the object's state, ensuring no partial updates occur on failure. The new `BuildParamsValues` method builds the parameter map without side effects, replacing the old `InitializeParamsValues`. [[1]](diffhunk://#diff-d810cdd06ed9beffd49380fafe9a3c1c2b438166fe14702ec2b78f8ae4ef0279L40-R68) [[2]](diffhunk://#diff-d810cdd06ed9beffd49380fafe9a3c1c2b438166fe14702ec2b78f8ae4ef0279L85-R105) [[3]](diffhunk://#diff-d810cdd06ed9beffd49380fafe9a3c1c2b438166fe14702ec2b78f8ae4ef0279L98-R115) [[4]](diffhunk://#diff-d810cdd06ed9beffd49380fafe9a3c1c2b438166fe14702ec2b78f8ae4ef0279L120-R137) [[5]](diffhunk://#diff-bd912c8889776d55e73fa4c6291385f7a55675c8885c350e96bdaf0e7187db51L154-R173)

### Python Bindings & Memory Management

- Improved the Python adapter format bindings so that every `OrtValue` returned from adapter parameter getters is pinned to its owning C++ adapter object via pybind11's `keep_alive` mechanism. This prevents use-after-free errors if the parent `AdapterFormat` object is dropped while references to its parameters remain. The getter now builds the parameter dictionary on demand and avoids reference cycles that would leak memory. [[1]](diffhunk://#diff-26fa08edf240764c8ed2e3e53a39af0e80798552989dd4f3c65f0e7cb0a6bf7dL38-R56) [[2]](diffhunk://#diff-26fa08edf240764c8ed2e3e53a39af0e80798552989dd4f3c65f0e7cb0a6bf7dL85-R199) [[3]](diffhunk://#diff-f0e8ba8cb8cb07b51b3be675bf62cec07e2eae1461341ce5801d33a57c8f57fdR110-R113)

### Adapter Export Robustness

- Enhanced the adapter export logic to reject string tensors, preventing the leaking of memory addresses and creation of unloadable adapter files. The export path now builds the adapter image entirely in memory before writing to disk, ensuring no partial files are left behind on error.

### Clean-up & Consistency

- Simplified the construction and usage of the `PyAdapterFormatReaderWriter` class, ensuring that its internal state is only populated as appropriate for read or write operations, and removed unnecessary parameter passing. [[1]](diffhunk://#diff-26fa08edf240764c8ed2e3e53a39af0e80798552989dd4f3c65f0e7cb0a6bf7dL38-R56) [[2]](diffhunk://#diff-26fa08edf240764c8ed2e3e53a39af0e80798552989dd4f3c65f0e7cb0a6bf7dL128-R218)
- Minor cleanup in property definitions and comments for clarity and maintainability.

### Regression Tests

- Added thorough regression tests to verify that adapter parameter lifetimes are managed correctly and that exporting string tensors is properly rejected, with checks to ensure no files are created on failure.

These changes collectively make adapter handling safer and more robust, especially when interacting with Python, and add critical safeguards against subtle memory and serialization bugs.
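The "build in locals, then commit" pattern described under Exception Safety can be sketched as follows. This is a minimal illustration only: `ToyAdapter`, `RawParams`, and `BuildParams` are hypothetical names invented for the example and are not the actual `LoraAdapter` or `BuildParamsValues` code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an adapter object; not ONNX Runtime's LoraAdapter.
class ToyAdapter {
 public:
  using ParamMap = std::unordered_map<std::string, std::vector<float>>;
  using RawParams = std::vector<std::pair<std::string, std::vector<float>>>;

  // Strong exception guarantee: either the whole load succeeds,
  // or the object is left exactly as it was before the call.
  void Load(const RawParams& raw) {
    ParamMap staged = BuildParams(raw);  // may throw; only locals are touched
    params_.swap(staged);                // commit step; swap does not throw here
  }

  const ParamMap& params() const { return params_; }

 private:
  // Builds the parameter map without any side effects on the object.
  static ParamMap BuildParams(const RawParams& raw) {
    ParamMap out;
    for (const auto& entry : raw) {
      if (entry.first.empty()) {
        throw std::invalid_argument("parameter name must not be empty");
      }
      if (!out.emplace(entry.first, entry.second).second) {
        throw std::invalid_argument("duplicate parameter: " + entry.first);
      }
    }
    return out;
  }

  ParamMap params_;
};
```

If `BuildParams` throws partway through, only the local `staged` map is discarded, so a previously loaded adapter keeps its old parameters intact.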
…oft#28761)

## Summary

- Introduces `SessionBufferPool` that lets a session hold on to retired generator buffer caches (storage + uniform) and seed them into newly created generators.
- Adds provider option `ep.webgpuexecutionprovider.sessionBufferPoolGenerations` to bound how many generations of retired buffers are kept (default `1`; set to `0` to disable).
- Wires the WebGPU EP to donate a retiring `BufferManager`'s cache into the pool and absorb pooled buffers when a new `BufferManager` is created for the next generator.
- The pool is only created when graph capture is enabled AND the option is > 0, so non-graph-capture sessions are unaffected.

## Motivation

With graph capture enabled, each generator owns its own per-graph `BufferManager`. When the generator is destroyed (e.g., per-request in GenAI), the entire buffer cache is thrown away and the next generator must reallocate all storage and uniform buffers from scratch, increasing cold-start latency and GPU memory churn. By keeping a small pool of recently-retired buffer slots at the session level, the next generator can reuse them and skip reallocation entirely after the first cycle.

## Test plan

- [x] Build ORT (Windows, D3D12) with ``--use_webgpu``: clean build.
- [x] ``lintrunner -a`` reports no lint issues.
- [x] Verified end-to-end with GenAI on phi4 + WebGPU graph capture using two scripts:
  - ``verify_multi_gen.py``: sequential and overlapping generators all produce matching, coherent output.
  - ``verify_max_length_change.py``: generators with varying ``max_length`` all coherent.
- [x] With diagnostic prints (since removed), confirmed that after the first generator donates buffers, subsequent generators report ``storage hits=171 misses=0, uniform hits=296 misses=0``, i.e., the pool actually engages and eliminates reallocation.

## Notes

- Pairs with a GenAI-side change that invokes ``SessionReleaseCapturedGraph`` from ``State::~State()`` so the per-graph ``BufferManager`` is actually released and its buffers reach the pool.
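The donate/absorb cycle described in the summary can be illustrated with a rough sketch. Everything here is an assumption made for the example: `ToyBuffer`, `ToyBufferPool`, `Donate`, and `Absorb` are hypothetical names, and the eviction policy (drop the oldest generation once the bound is exceeded) is one plausible reading of "generations", not the actual `SessionBufferPool` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical buffer handle; a real pool would hold GPU buffer objects.
struct ToyBuffer {
  std::size_t size_bytes;
};

// Hypothetical generation-bounded pool, loosely modeled on the description above.
class ToyBufferPool {
 public:
  explicit ToyBufferPool(std::size_t max_generations)
      : max_generations_(max_generations) {}

  // Called when a generator retires: keep its cache as the newest generation.
  void Donate(std::vector<ToyBuffer> retired) {
    if (max_generations_ == 0) return;  // pooling disabled; buffers are dropped
    generations_.push_back(std::move(retired));
    while (generations_.size() > max_generations_) {
      generations_.pop_front();  // assumed policy: evict the oldest generation
    }
  }

  // Called when a new generator starts: hand over every pooled buffer.
  std::vector<ToyBuffer> Absorb() {
    std::vector<ToyBuffer> out;
    for (const auto& generation : generations_) {
      out.insert(out.end(), generation.begin(), generation.end());
    }
    generations_.clear();
    return out;
  }

  std::size_t generations() const { return generations_.size(); }

 private:
  std::size_t max_generations_;
  std::deque<std::vector<ToyBuffer>> generations_;
};
```

With a bound of `1`, the sketch keeps only the most recently retired cache, which matches the stated default of the provider option; a bound of `0` makes `Donate` a no-op.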
### Description

This pull request introduces a mechanism for exposing experimental C API functions in ONNX Runtime. The new system enables the addition, iteration, and eventual promotion of experimental APIs without impacting the stable ABI, using a name-based function pointer lookup and a generated header for type safety and ergonomics. The changes include documentation, build integration, header generation, implementation, and test coverage for the new experimental API flow.

**Experimental C API Framework**

* Added a design doc (`Experimental_C_API.md`) detailing the motivation, design decisions, and usage patterns for the experimental C API mechanism.
* Introduced a central declaration file (`onnxruntime_experimental_c_api.inc`) using X-macros to define experimental API functions and their lifecycle rules. The X-macro signature uses `ORT_EXPERIMENTAL_API(VER, RET, NAME, ...)` ordering (return type before name) to match the convention used by `ORT_API_T` in the stable API.
* Added a generated consumer header (`onnxruntime_experimental_c_api.h`) that provides C typedefs, name constants, and C++ typed accessors for experimental functions. Experimental function names follow the pattern `<TargetStruct>_<Name>_SinceV<APIVersion>` to unambiguously convey availability and avoid collision.
* Updated the stable C API struct (`OrtApi` in `onnxruntime_c_api.h`) to include a single function pointer, `GetExperimentalFunction`, for name-based experimental function lookup. The `OrtExperimentalFnPtr` generic function pointer type (rather than `void*`) is used as the return type to avoid undefined behavior when casting between function pointers.
* Integrated the new headers into the build system so they are installed and available to consumers.

**Implementation and Test Coverage**

* Implemented the runtime support for experimental API lookup and function registration (`experimental_c_api.cc`), including a test-only function (`OrtApi_ExperimentalApiTest`) to exercise the mechanism end-to-end.
* Registered the new experimental API entry point in the exported API table (`onnxruntime_c_api.cc`).
* Added a unit test source file for experimental API coverage.

### Motivation and Context

Enable support for experimental C APIs.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
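The X-macro plus name-based lookup idea from the Description can be sketched in a few lines. This is a toy illustration under stated assumptions: every `Toy*` identifier below is hypothetical, the macro merely mirrors the `(VER, RET, NAME, ...)` argument ordering mentioned above, and none of it reproduces the real contents of `onnxruntime_experimental_c_api.inc` or the real `GetExperimentalFunction` signature.

```cpp
#include <cassert>
#include <cstring>

// Generic function pointer type (hypothetical). Converting between function
// pointer types and back is well-defined, whereas converting a function
// pointer to or from void* is not guaranteed by the C standard.
using ToyExperimentalFnPtr = void (*)(void);

// Hypothetical experimental function list in (VER, RET, NAME, ...) order.
// Names follow the <TargetStruct>_<Name>_SinceV<APIVersion> pattern.
#define TOY_EXPERIMENTAL_APIS(X)              \
  X(1, int, ToyApi_Add_SinceV1, int a, int b) \
  X(2, int, ToyApi_Negate_SinceV2, int a)

// First expansion: one typedef and one declaration per entry.
#define TOY_DECLARE(VER, RET, NAME, ...)  \
  using NAME##_Fn = RET (*)(__VA_ARGS__); \
  RET NAME(__VA_ARGS__);
TOY_EXPERIMENTAL_APIS(TOY_DECLARE)
#undef TOY_DECLARE

// Toy implementations of the declared functions.
int ToyApi_Add_SinceV1(int a, int b) { return a + b; }
int ToyApi_Negate_SinceV2(int a) { return -a; }

// Second expansion: name-based lookup; returns nullptr for unknown names.
ToyExperimentalFnPtr ToyGetExperimentalFunction(const char* name) {
#define TOY_LOOKUP(VER, RET, NAME, ...) \
  if (std::strcmp(name, #NAME) == 0) return reinterpret_cast<ToyExperimentalFnPtr>(&NAME);
  TOY_EXPERIMENTAL_APIS(TOY_LOOKUP)
#undef TOY_LOOKUP
  return nullptr;
}

// Typed accessor: casts the generic pointer back to the caller's typedef.
template <typename Fn>
Fn ToyGetTyped(const char* name) {
  return reinterpret_cast<Fn>(ToyGetExperimentalFunction(name));
}
```

A caller would request a function by name, for example `ToyGetTyped<ToyApi_Add_SinceV1_Fn>("ToyApi_Add_SinceV1")`, and check the result against `nullptr` before calling it, since an unknown or not-yet-available name yields no function.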
ankitm3k approved these changes on Jun 11, 2026
Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.