
@ankitm3k
Contributor

Description

This PR patches the features introduced in PR #25476, providing a stable fix for the GPU plugin with the upcoming OV toolkit v2025.2.1.

jatinwadhwa921 and others added 30 commits February 24, 2025 18:49
 Changes to make sure the SessionOptions API contract is honored
* Fix flash attention for GQA (Phi4) (microsoft#23850)

### Description
This change fixes GQA for Flash Attention on Nvidia GPUs. The root cause
appears to be
`k_start + capped_sg_id < seq_causal_length`
check. This is either because:
a. seq_causal_length varies per lane, so the check becomes non-uniform
control flow, which interacts badly with subgroupShuffle;
or
b. the check itself is incorrect and wipes out values of v based on the
source lane's seq_causal_length, when in fact values of v need to be
causal with respect to the lane that will multiply them with qkt.

qkt is already causal because earlier values of qk for out-of-bounds k
are set to min_value, and exp of such large negative values is 0.

This fix works by removing that causal check and relying on qk having
been wiped out earlier. Documentation of GQA's causality behavior is
missing, so it is hard to determine which of these reasons is the true cause.

Prior to this, prompts with sequence length greater than 16 but less
than 32, or around 1k, would break with Phi 4, while smaller prompts would work.
Tested on Intel Alderlake, Nvidia 4070.
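The masking argument above can be illustrated with a small NumPy sketch (hypothetical, not the actual WGSL shader): once out-of-bounds qk scores are set to min_value before softmax, their weights underflow to zero, so a separate per-lane causal mask on v is redundant.

```python
import numpy as np

# Hypothetical sketch of the causality scheme the fix relies on.
MIN_VALUE = -3.0e38  # stand-in for the shader's min_value

def causal_attention_row(q, K, V, q_pos):
    """Attention output for one query at position q_pos over keys K."""
    qk = K @ q                          # raw scores, shape (seq_len,)
    seq_causal_length = q_pos + 1       # keys this query may attend to
    qk[seq_causal_length:] = MIN_VALUE  # wipe out-of-bounds scores
    w = np.exp(qk - qk.max())           # softmax weights; exp(MIN_VALUE - max) -> 0
    w /= w.sum()
    return w @ V                        # zero weights already mask future v

seq_len, dim = 8, 4
rng = np.random.default_rng(0)
K = rng.standard_normal((seq_len, dim))
V = rng.standard_normal((seq_len, dim))
q = rng.standard_normal(dim)

# Same result as explicitly restricting K/V to the causal prefix.
out = causal_attention_row(q, K, V, q_pos=3)
ref = causal_attention_row(q, K[:4], V[:4], q_pos=3)
assert np.allclose(out, ref)
```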

* Model Builder API (microsoft#23223)

### Description
Supports creating a model programmatically using the ORT C or C++ API. 
Supports augmenting an existing model to add nodes.


* Fix typo: change `Upample` to `Upsample`. (microsoft#23838)

### Description
Fixed a typo in function names related to the Upsample CUDA kernel.
Changed incorrect spelling Upample to Upsample across relevant
functions.


### Motivation and Context
This change is necessary to maintain consistency and prevent potential
confusion caused by incorrect function names.

* [doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (microsoft#23848)

### Description
Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/



* Quant tool: Consistent `get_qdq_config` and `get_qnn_qdq_config` behavior (microsoft#23856)

* Change the logic to generate the default ep context file name (microsoft#23788)

Change the logic to generate the default ep context file name

### Description
Applies to all EPs: replace the .onnx extension with _ctx.onnx, instead of directly appending the extra string _ctx.onnx to the existing model path. In QNN EP, also make the context binary .bin file name shorter by removing QNNExecutionProvider_ from the file name.
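The naming rule can be sketched in a few lines of Python (the function name is hypothetical, for illustration only):

```python
from pathlib import Path

def default_ep_context_path(model_path: str) -> str:
    """Sketch of the new rule: swap a .onnx suffix for _ctx.onnx
    rather than appending _ctx.onnx to the full path."""
    p = Path(model_path)
    if p.suffix == ".onnx":
        return str(p.with_name(p.stem + "_ctx.onnx"))
    return model_path + "_ctx.onnx"

assert default_ep_context_path("model.onnx") == "model_ctx.onnx"
# the old behavior would have produced "model.onnx_ctx.onnx"
```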

* Make Nuget QNN package pipeline 1ES compliant (microsoft#23805)

### Description
Make the
[QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234) pipeline
1ES compliant.




* [js/common] allows using Uint16Array as data for float16 tensor (microsoft#23827)

### Description

Resolve microsoft#23817




* [js/webgpu] Reland the optimization of ConvTranspose (microsoft#23858)

This PR fixes the errors in the ConvTranspose optimization and adds
tests to ensure the correctness of the implementation.

* [OpenVINO] Fix a build warning (microsoft#23877)

### Description
Fix a warning with std::move usage



### Motivation and Context
Possibly allow building without --compile_no_warning_as_error flag

* Change gsl::byte to std::byte (microsoft#23872)

To be compatible with the latest GSL library. Without this fix we will
get:

```
onnxruntime\core\providers\cpu\controlflow\loop.cc(247): error C4996: 'gsl::byte': Use std::byte instead.
```

* Allow using extended minimal build for several EPs (microsoft#23834)

### Description

#### Background

From code search, the following EPs use
`onnxruntime::GetCpuPreferredNodes()` in their `GetCapabilities()`
methods:
- CANN
- CUDA
- DML
- JS
- ROCM
- WebGPU

However, the source file that implements
`onnxruntime::GetCpuPreferredNodes()` is excluded when minimal build is
ON:
https://github.com/microsoft/onnxruntime/blob/6df0973e58ba5399fcaa98686f70ed9a9e59aaef/cmake/onnxruntime_framework.cmake#L38-L42

This means that none of the EPs mentioned above can compile with a
minimal build.

#### Solution

The excluded file `core/framework/fallback_cpu_capability.cc` cannot
build in minimal build because some of its dependencies are not included
in the minimal build. However, in extended minimal build mode, all
dependencies are available.

This PR loosens the restriction and allows this file to be compiled in
an extended minimal build. After this change, those EPs can compile in
an extended minimal build.

* Add dawn to ThirdPartyNotices (microsoft#23876)

### Description

Add `dawn` to ThirdPartyNotices.

* Enable QNN EP weight sharing generation using public API (microsoft#23702)

### Description
Enable QNN EP weight sharing generation using the public API instead of internal interfaces, so that users can integrate it into their own toolchains. The change shares the QnnBackendManager across ORT sessions if ep.share_ep_contexts is enabled. There is also an extra option to end the sharing, so that we know when to remove the shared QnnBackendManager from the singleton.

Change the tool name from onnxruntime_qnn_ctx_gen to ep_weight_sharing_ctx_gen, so that it can be shared with other EPs.

* [QNN-EP]: Fix inference failures while running with htp_shared_memory (microsoft#23892)

### Description
When using the enable_htp_shared_memory feature, we see that the address
of the buffer passed to rpcmem_free is incorrect, so the RPC buffers are
not freed, leading to memory exhaustion.

### Motivation and Context
When using the enable_htp_shared_memory_allocator feature for QNN in
GenAI extensions, inference fails during the second prompt. As GenAI
memory demands are higher, the issue surfaces sooner in GenAI use cases.

Co-authored-by: Ashish Garg <[email protected]>

* Fix enable_pix_capture build for WebGPU (microsoft#23857)

The build option --enable_pix_capture is broken. This fixes the problem.

---------

Co-authored-by: wp <[email protected]>

* [WebGPU-EP Native] Add ReduceMean (microsoft#23860)


* [WebGPU EP] introduce BiasAdd contrib op (microsoft#23861)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Dynamo export and improve benchmark script for SAM2 encoder (microsoft#23887)

### Description
* Add dynamo export for Sam2 image encoder
* Verify fp32 onnx model with CPU EP (to avoid error message from TRT
EP).
* Update benchmark script:
  - output ORT profiling
- output torch compiled code and unique kernel name for compiled kernel
  - add an option for nightly package installation
  - uninstall existing ort packages before installing

The node metadata of the dynamo-exported model can help map nodes in the
onnx model back to the pytorch modeling script. Currently, graph
optimization is not done on the dynamo-exported model, so it is experimental right now.

### Motivation and Context

To support profiling of torch compiled CUDA kernel.

* [js/web] improve workaround for bundlers (microsoft#23902)

### Description
This PR improves the workaround for bundlers in onnxruntime-web.
Specifically, the following changes have been made:

- Use [this
workaround](xenova@9c50aa2)
as suggested by @xenova in
huggingface/transformers.js#1161 (comment)

- Use `url > "file:" && url < "file;"` instead of
`url.startsWith("file:")` to allow minifiers to remove dead code
correctly.

This change allows removing unnecessary file dependencies parsed
from `new URL("ort.bundle.min.js", import.meta.url)` in Vite, and
lets minifiers optimize code like `if("file://filepath.js".startsWith("file:"))
{do_sth1(); } else {do_sth2();}` into `do_sth1()` for webpack/terser
usage.

Resolves huggingface/transformers.js#1161
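Why the comparison trick is equivalent: ':' (0x3A) and ';' (0x3B) are adjacent code points, so any string strictly between "file:" and "file;" must start with "file:". A quick check in Python (the actual workaround lives in onnxruntime-web's JavaScript; `is_file_url` is an illustrative name):

```python
# Lexicographic equivalence behind the minifier-friendly check.
def is_file_url(url: str) -> bool:
    return "file:" < url < "file;"

assert is_file_url("file:///path/to/ort.bundle.min.js")
assert not is_file_url("https://example.com/ort.bundle.min.js")
assert not is_file_url("file:")  # the bare scheme is excluded by the strict '<'
assert all(is_file_url("file:" + s) for s in ("a", "//x", "0"))
```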

* [webgpu] Restore MatMulNBits workgroup size for Phi-3.5 (microsoft#23349)

### Description
This change restores the MatMulNBits workgroup size from (8, 8, 1) back
to (16, 8, 1) to resolve a performance regression observed on Intel
iGPUs during token generation (M=1).

### Motivation and Context
As above.

Signed-off-by: Jianhui Dai <[email protected]>

* [webgpu] support Pad operator (microsoft#23141)


* [WebNN] Accept Float16Array for float16 data type if it is available (microsoft#23894)

Float16Array is now shipping and WebNN Chromium implementation has
accepted it. We should allow it in WebNN EP as well.

* Ensure that the 'cmake_minimum_required' is version 3.5 or greater (microsoft#23888)

### Description
CMake 4.0 release candidate 2.0 is available, and it cannot compile all
of OnnxRuntime out-of-the-box. There are portions of the OnnxRuntime
codebase that specify a `cmake_minimum_required` version of 3.0, and
CMake 4.0 has removed support for compatibility with CMake < 3.5 - the
following error is reported:

```
CMake Error at winml_sdk_helpers.cmake:4 (cmake_minimum_required):
  Compatibility with CMake < 3.5 has been removed from CMake.

  Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
  to tell CMake that the project requires at least <min> but has been updated
  to work with policies introduced by <max> or earlier.

  Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.
```

Since CMake 3.5 appears to have shipped in 2016, it seems reasonable to
set that as a minimum version to fix the error. The root CMakeLists.txt
does ask for a minimum version of 3.28, so we could snap to that, but
I'm still ramping up on the build, so wanted to propose a minimally
sufficient fix.

### Motivation and Context
Being able to build with the latest CMake - when it ships - reduces the
barrier to entry to building OnnxRuntime, and allows the OnnxRuntime to
leverage the latest and greatest tooling.

* WebGPU: Remove deprecated subgroups-f16 from WebGPU native and JS EP (microsoft#23898)

This PR removes the deprecated subgroups-f16 from the WebGPU native and JS
EPs, and also removes the unused deviceInfo in the WebGPU JS EP.

* [JSEP/WebGPU] Fixed error in softmax dispatch. (microsoft#23906)

### Description
Fixed an error in the softmax dispatch.



### Motivation and Context
Produces expected results for the LLaMA model.

* enable WebGPU EP in WebAssembly build (microsoft#23913)

### Description

This PR is the first step for migrating the webgpu backend of
onnxruntime-web from JSEP based to WebGPU EP based.

In this change, we enable building the WebGPU EP in a wasm build (i.e.
`--build_wasm --use_webgpu --use_jsep`). However, the old build
flags still keep their previous behavior.

* Adding OpenVINO Windows CI Pipeline (microsoft#23919)

### Description

Enable an OpenVINO Windows CI pipeline. This includes:
- Downloading the OpenVINO toolkit for Windows from an external source.
- Setting up OpenVINO environment variables.
- Building the ONNX Runtime OpenVINO Execution Provider.
- Running unit tests.

### Motivation and Context

This change is required to run checks on precommit and commit in the
ONNX Runtime project. It ensures that the code is tested with the
OpenVINO toolkit on Windows, improving the reliability and compatibility
of the project.

* [WebGPU EP] SoftMax Implementation (microsoft#23538)

Increase coverage for WebGPU Op

* Exclude MAUI projects from GPU C# packaging builds (microsoft#23923)

### Description
Use 'desktop only' solution in GPU C# packaging builds. We don't need to
include any MAUI support for those builds.


* Support all block sizes that are multiples of 32 for DP4A (microsoft#23907)

### Description
Simple change 
1. The DP4A shader actually supports all block sizes that are multiples
of 32, relaxing the restriction and making a small tweak to support
sizes other than 32.
2. Moved the shader to a separate file for maintainability.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Example custom op with output type inferencing (microsoft#23916)

### Description
Add example of a custom op that is required to do type inference for the
output type for the model load to work.
Also acts as an example of how to override an ONNX op with a custom
implementation.

### Motivation and Context
microsoft#23891

* Enabling L2+ Optimizations for EPs (microsoft#23517)

There are some requirements to modify the graph which are specific to
the EP/hardware.
ORT has a hardcoded EP list for optimizations, but that can't scale and
is hard to extend to enable EP-custom optimizations.

Here is the prototype to enable L2+ optimizations for EPs (The original
overview is provided by @skottmckay) as well as the TRT EP
implementation for the ConstantFoldingDQ optimization.

Signatures for selection and optimization functions:
````
  - Selection: std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&, const KeyValueConfig&)>
  - Optimization: std::function<Status(const Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)>
````
GetCapability

- call (new) provider bridge API to lookup pre-defined optimizer by name
and get selection function
- ComputeCapability.optimize_func, i.e. optimization function, would be
set by the optimizer to the function that does the optimization

- The EP has to update the returned ComputeCapability to include the
optimization ComputeCapability in nodes_to_optimize, so that ORT can
later perform the optimization/transformation accordingly.

GraphPartitioner

- After assigning the ComputeCapability to the EP and prior to Compile,
if the ComputeCapability has nodes_to_optimize, iterate that list
  - optimization function needs to be called with
    - a mutable Graph instance
    - the ComputeCapability for the individual optimization
    - the overall ComputeCapability so it can be updated

* fix binplace file in web pipeline (microsoft#23930)

* Updated run_CIs_for_external_pr.py to support the Windows OpenVINO CI pipeline (microsoft#23931)

* Fix ConvInteger handling of optional inputs. (microsoft#23935)

### Description
Fix ConvInteger handling of optional inputs. Need to check Exists() and
not just the number of inputs.



### Motivation and Context
microsoft#23927

* Updated ov version in pipeline (#595) (microsoft#23882)

### Description
This PR updates the OpenVINO version used in the pipeline from 2024.5.0
to 2025.0.0

Co-authored-by: jatinwadhwa921 <[email protected]>

* [AIX] External data handling (microsoft#23859)

### Description
On big-endian (BE) systems, model tensor data coming from an external
file is not handled properly.
This was found during the debugging of
(microsoft/onnxruntime-genai#1104)

This PR performs the endianness conversion of data loaded from an
external file on BE systems.
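As a sketch of the conversion needed (illustrative, not ORT's actual code): ONNX external data is stored little-endian, so a big-endian host must byte-swap each tensor after loading. The helper name below is hypothetical.

```python
import numpy as np

def load_external_fp32(raw: bytes) -> np.ndarray:
    """Interpret raw external-data bytes as little-endian float32,
    then convert to the host's native byte order (a no-op on LE hosts,
    a byte swap on BE hosts)."""
    t = np.frombuffer(raw, dtype="<f4")
    return t.astype(np.float32)

raw = np.array([1.0, 2.5, -3.0], dtype="<f4").tobytes()
assert np.allclose(load_external_fp32(raw), [1.0, 2.5, -3.0])
```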

* Create a packaging pipeline for a custom nuget package (microsoft#23918)

* Fix license in example test code. (microsoft#23936)

* replace usage of gsl::narrow and gsl::narrow_cast in WebGPU EP (microsoft#23926)

### Description

`gsl::narrow` does not work in a no-exception build.
- use `onnxruntime::narrow` if necessary;
- or change to `static_cast` if it's obviously safe.

The changes also apply to usages of `gsl::narrow_cast`, which does not
perform any checks.

* VCPKG improvement: set  VCPKG_OSX_DEPLOYMENT_TARGET (microsoft#23933)

### Description
1. Set  VCPKG_OSX_DEPLOYMENT_TARGET for macOS targets
2. Enable VCPKG in more pipelines.

* Allow using a different version of flatbuffers when building with vcpkg (microsoft#23946)

### Description
Allow using a different version of flatbuffers when building with vcpkg,
so that users do not need to pin flatbuffers' version, which provides
more flexibility in the build process.

Delete utf8_range from the dependencies, because it is an indirect
dependency of protobuf, which is already included in the build process.

* Make python package pipeline 1ES compliant (microsoft#23800)

### Description
Make [Python packaging
pipeline](https://aiinfra.visualstudio.com/530acbc4-21bc-487d-8cd8-348ff451d2ff/_build?definitionId=841)
1ES compliant




### Checklist

- [x] Make Onnxruntime-QNNEP-Windows-2022-CPU stateless

* Delete ROCM Nuget Publishing Pipeline (microsoft#23948)

* Bump SixLabors.ImageSharp from 2.1.9 to 2.1.10 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (microsoft#23924)

Bumps [SixLabors.ImageSharp](https://github.com/SixLabors/ImageSharp)
from 2.1.9 to 2.1.10.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: Jianhui Dai <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Sushanth Rajasankar <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Seungtaek Kim <[email protected]>
Co-authored-by: co63oc <[email protected]>
Co-authored-by: Jambay Kinley <[email protected]>
Co-authored-by: Hector Li <[email protected]>
Co-authored-by: Jian Chen <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: Alessio Soldano <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Ashish Garg <[email protected]>
Co-authored-by: Ashish Garg <[email protected]>
Co-authored-by: Jie Chen <[email protected]>
Co-authored-by: wp <[email protected]>
Co-authored-by: Satya Kumar Jandhyala <[email protected]>
Co-authored-by: Prathik Rao <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Jianhui Dai <[email protected]>
Co-authored-by: xhcao <[email protected]>
Co-authored-by: Wanming Lin <[email protected]>
Co-authored-by: Mark Schofield <[email protected]>
Co-authored-by: jiangzhaoming <[email protected]>
Co-authored-by: Yi-Hong Lyu <[email protected]>
Co-authored-by: vraspar <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: saurabh <[email protected]>
Co-authored-by: Ranjit Ranjan <[email protected]>
Co-authored-by: Baiju Meswani <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Revert "Rebasing with msft commits"
This reverts commit 920ed58, reversing
changes made to a6cdf62.
This change allows allocations made by the ov allocator to be
imported to other APIs that require base addresses of the original
device allocation.
preetha-intel and others added 20 commits July 16, 2025 10:42
* Fix the model copies and redefinitions for CPU fallback

* OV compatibility is not needed

---------

Co-authored-by: sfatimar <[email protected]>
* [webgpu] Update wgsl_templates README.md (microsoft#25336)

### Description
Fix a broken URL and numbering in the ordered list in README.md.

### Motivation and Context
See Above.

* [webgpu] Move the early return after copying for ScatterND (microsoft#25345)

### Description
For ScatterND, if the indices are empty (nothing to update), it becomes
a copy operation. So we should move the early return after copying.
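The ScatterND semantics the fix relies on can be sketched in NumPy: the output starts as a copy of the input, so with empty indices the correct result is exactly that copy, and the early return must come after copying.

```python
import numpy as np

def scatter_nd(data, indices, updates):
    """Reference ScatterND. data: (d0, ...), indices: (n, k), updates: (n, ...)."""
    out = data.copy()             # step 1: copy input to output
    if indices.size == 0:         # empty indices: early return AFTER the copy
        return out
    for i in range(indices.shape[0]):
        out[tuple(indices[i])] = updates[i]
    return out

data = np.arange(6.0).reshape(2, 3)
# Empty indices: nothing to update, result is a pure copy.
empty = scatter_nd(data, np.zeros((0, 2), dtype=np.int64), np.zeros((0,)))
assert np.array_equal(empty, data)
# Non-empty indices update in place.
upd = scatter_nd(data, np.array([[1, 2]]), np.array([9.0]))
assert upd[1, 2] == 9.0
```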

* [EP ABI] Utility to serialize OrtGraph to GraphProto (microsoft#25292)

### Description
- Provides utility functions that serialize an `OrtGraph` to a
`GraphProto` or `ModelProto`.
- Header-only file that can be copied to a project that builds with ORT
and ONNX.
- Available in
[include/onnxruntime/core/providers/utils/ort_graph_to_proto.h](https://github.com/microsoft/onnxruntime/blob/adrianl/ep-abi-ort-graph-to-onnx-protobuf/include/onnxruntime/core/providers/utils/ort_graph_to_proto.h)
- Updates the `Node_GetSubgraphs` API function to also return the
attribute names associated with each subgraph. This is required to
determine which subgraph corresponds to a given attribute.
- Adds `Graph_GetNumOperatorSets` and `Graph_GetOperatorSets` API
functions to get the opset version for each domain.



### Motivation and Context
Provide a utility to facilitate porting of existing execution providers
to the new EP ABI. The utilities introduced by this PR convert an
`OrtGraph` into an ONNX protobuf representation, which some existing EPs
currently convert to their internal representation. Ideally, we would
prefer a more direct conversion from a `OrtGraph` to the EP's internal
representation, but this is a large effort. These utilities enable an
incremental transition.

* Update vcpkg.json: remove  optional-lite (microsoft#25339)

The library is not used. C++ itself already has std::optional.

* Move buffer release or cache from OnRefresh to ReleaseBuffer in BucketCacheManager (microsoft#25276)

### Description
This PR moves buffer release/caching from OnRefresh to
ReleaseBuffer in BucketCacheManager.

### Motivation and Context
OnRefresh is executed after a batch of (16) EP runs, and within those
batch runs a buffer cannot really be reused, which wastes GPU buffer
resources. This PR proposes a straightforward optimization that releases
or caches the buffer early, in ReleaseBuffer instead of OnRefresh, to
improve buffer cache/release efficiency, which in turn improves the peak
and average GPU memory usage. Experimental results also show a
reasonable memory improvement without perf regressions.

#### Phi3
| Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
| -- | -- | -- | -- | -- |
| Default Bucket | 3603.83 | 3127.05 | 7.17 | 139.50 |
| Default Bucket with Early Release Optimization | 3534.77 (+1.92%) | 3073.97 (+1.70%) | 7.14 (+0.36%) | 140.01 (+0.36%) |

#### Deepseek-R1
| Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
| -- | -- | -- | -- | -- |
| Default Bucket | 2089.03 | 1716.15 | 6.07 | 164.67 |
| Default Bucket with Early Release Optimization | 2034.00 (+2.63%) | 1674.49 (+2.43%) | 6.09 (-0.20%) | 164.34 (-0.20%) |

#### LLama3.2-1B
| Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
| -- | -- | -- | -- | -- |
| Default Bucket | 1736.03 | 1424.64 | 3.37 | 296.53 |
| Default Bucket with Early Release Optimization | 1659.78 (+4.39%) | 1366.78 (+4.06%) | 3.41 (-1.09%) | 293.34 (-1.08%) |

* [web] Fix "npm run pull:wasm" script (microsoft#25330)

### Description

following up for microsoft#25267

* [MLAS] DequantizeLinear int8/uint8 (microsoft#24818)

### Description
- Adds multithreaded vectorized implementations of DequantizeLinear for
int8 and uint8 inputs:
  - Intel SSE 2
  - ARM NEON
- All other architectures fall back to a multithreaded scalar reference
implementation (the previous implementation was not multithreaded).
- **Note**: only enabled if ORT is built for client/on-device workloads
(`ORT_CLIENT_PACKAGE_BUILD` is defined).

INT8 DequantizeLinear latency on Intel Core i9-10920X with 4 intra op
threads (SSE 2 implementation)

| Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 2 | 2 | 1 |
| 40 K | 5 | 5 | 1 |
| 80 K | 11 | 4 | 2.75 |
| 100 K | 14 | 5 | 2.80 |
| 150 K | 21 | 7 | 3.00 |
| 200 K | 28 | 8 | 3.50 |
| 400 K | 68 | 15 | 4.53 |
| 600 K | 107 | 21 | 5.10 |
| 800 K | 142 | 28 | 5.07 |
| 1 M | 187 | 42 | 4.45 |
| 2 M | 376 | 102 | 3.69 |
| 4 M | 880 | 236 | 3.73 |
| 6 M | 1547 | 557 | 2.78 |
| 8 M | 2438 | 1097 | 2.22 |
| 10 M | 3192 | 1464 | 2.18 |
| 100 M | 38718 | 17733 | 2.18 |

INT8 DequantizeLinear latency on Snapdragon 8cx gen 3 @ 3.4GHz with 4
intra op threads (NEON implementation)

| Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 1 | 1 | 1 |
| 40 K | 3 | 3 | 1 |
| 80 K | 7 | 4 | 1.75 |
| 100 K | 9 | 3 | 3.00 |
| 150 K | 14 | 5 | 2.80 |
| 200 K | 18 | 6 | 3.00 |
| 400 K | 38 | 10 | 3.80 |
| 600 K | 61 | 15 | 4.07 |
| 800 K | 76 | 19 | 4.00 |
| 1 M | 98 | 24 | 4.08 |
| 2 M | 204 | 48 | 4.25 |
| 4 M | 424 | 112 | 3.79 |
| 6 M | 677 | 384 | 1.76 |
| 8 M | 919 | 621 | 1.48 |
| 10 M | 1132 | 776 | 1.46 |
| 100 M | 11842 | 10566 | 1.12 |
### Motivation and Context
Improves latency of quantized QDQ models in which large DequantizeLinear
ops dominate the inference latency.

* [CPU] GQA supports head_sink input for smooth softmax (microsoft#25269)

### Description
It is an extension of the [Smooth
Softmax](microsoft#21867) feature.
The difference is that each head has a learnable smooth factor that is
added to the denominator of the softmax. The smooth factor acts like an
extra element that participates in the softmax.

The smooth factor is used in the softmax as follows:
```math
softmax_{i} = \frac{exp(x_{i})}{exp(s)+ \sum_{j} exp(x_{j})}
```

head_sink is a float tensor whose length equals the number of attention
heads. For the h-th head, `head_sink[h]` is used as the smooth factor s.
When head_sink is not provided, a constant 0 is used as the smooth factor s.
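A minimal NumPy sketch of this smooth softmax for one head (illustrative only, not the MLAS kernel; the max-subtraction for numerical stability is an assumption about the implementation):

```python
import numpy as np

def smooth_softmax(x, s=0.0):
    """Softmax over scores x with smooth factor s (head_sink[h])."""
    m = max(np.max(x), s)            # subtract the max for numerical stability
    e = np.exp(x - m)
    return e / (np.exp(s - m) + e.sum())

out = smooth_softmax(np.zeros(3), s=0.0)
# each probability is 1 / (1 + 3) = 0.25, and the sum is less than 1
```

With s -> -inf the extra denominator term vanishes and this reduces to the ordinary softmax.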

Changes:
- [x] Update operator spec to add an optional new input `head_sink`
- [x] Implement CPU (MLAS) kernel.
- [x] Update test_gqa_cpu.py to test it.

CUDA kernel will be updated later in a separate PR.

* Add PackageVersion parameter to NuGet packaging stage (microsoft#25315)

Fix: `Microsoft.ML.OnnxRuntime.Managed.nupkg` artifact from GPU pipeline
does not have package version.


![image](https://github.com/user-attachments/assets/4a6135ab-4774-4aa6-aeb1-d5b06948ba8f)

* [QNN EP] Fix pool with reshape name conflicts (microsoft#25332)

Fixes naming conflicts when the expand-pool2d-squeeze (implemented as Reshape) logic is invoked during ONNX -> QNN op lowering. A model with multiple 1D pool ops would hit this issue.

* Added creation of QDQ for TopK node (microsoft#25309)

- Added TopK in registry.py so as to create QDQ nodes for the op
- Ensure that both the input and output quantization params are equal
- Added unit test to verify the creation of QDQ nodes for TopK

### Description:

Added support for creation of QDQ nodes for TopK when quantized with ORT static quantization tool

### Motivation and Context:

Currently there is support to form a node unit for the TopK operator when QDQ nodes are present and the input and output quantization params are equal, but there was no support to create QDQ nodes for the TopK operator in the ORT static quantization tool.
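To illustrate why the input and output quantization params must be equal (a sketch, not the quantizer's actual code): TopK only selects and reorders values, so running it in the quantized domain is value-preserving exactly when both sides share the same scale and zero point.

```python
import numpy as np

def quantize(x, scale, zp):
    return np.clip(np.round(x / scale) + zp, -128, 127).astype(np.int8)

def dequantize(q, scale, zp):
    return (q.astype(np.float32) - zp) * scale

x = np.array([0.1, 0.9, 0.5, 0.7], dtype=np.float32)
scale, zp = 0.01, 0
q = quantize(x, scale, zp)

# TopK (k=2) computed in the quantized domain...
top2_q = np.sort(q)[::-1][:2]
# ...matches TopK computed on the dequantized values,
# because both ends use the same (scale, zero_point).
top2_f = np.sort(dequantize(q, scale, zp))[::-1][:2]
```

If the output used a different scale, the dequantized TopK results would no longer match the float reference.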

* [WebNN] Refactor webnn op input rank check and add validation for ops (microsoft#25185)

### Description
Development of the WebNN op input rank range check.


### Motivation and Context
- refactor the WebNN op input rank check
- add validation for various ops
- take the `gemm` op as an example of performing input rank checks on
decomposed ops

@Honry @fdwr PTAL

* Make TRT plugins optional (microsoft#25261)

### Description

The parser no longer links against the plugin library but instead loads
it dynamically. Because of that, we should also make the library optional
in ORT. @chilo-ms

* [EP ABI] Add Graph_GetGraphView API to get an OrtGraph from a subset of nodes (microsoft#25191)

Added an API that creates a sub-graph from a set of nodes in an
OrtGraph.
This API is needed for the GetCapability EP ABI porting, when an EP
wants to check whether a sub-graph of the graph is supported by the
hardware backend.

* [webgpu] a few optimization to WGSL template (microsoft#25333)

### Description

This change is a follow-up to microsoft#25130.

- consume duktape from vcpkg if --use_vcpkg is specified
- ~~add a Windows CI pipeline for dynamic WGSL template~~ (Will do in a
separate PR)
- upgrade wgsl-template package from 0.1.10 to 0.1.13
  - support adding contribop folder as input

* add --client_package_build option  (microsoft#25351)

Adds a build option to enable defaults more appropriate for
client/on-device workloads.
The initial use case is to set the default thread pool `allow_spinning`
policy, which we want to default to 0/false for builds targeting
client/on-device workloads.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* [WebNN] Fix bug in Float16Array availability check (microsoft#25354)

`from` is not an own property of `Float16Array` but an inherited
function, so we can use `Float16Array['from']` to check whether it is available.

* [EP ABI] Add Node_GetEpType API (microsoft#25350)

Add a new API `Node_GetEpType` to get the EP that the node is assigned
to run on.

This API is needed when porting the plugin TRT EP's `GetCapability`,
where the EP needs to know whether the subgraph(s) of a control flow
node are assigned to it before adding the control flow op to the
support list.

* QNN-EP: DSPQueue Polling (microsoft#25361)

### Description
Enable DSP queue polling when the performance profile is burst.

* [QNN_EP] Implement Efficient Mode API (microsoft#25146)

### Description
 - Set context priority to low when workload type is Efficient
 - Set context priority to command line configured value if Default
 - Error out otherwise (invalid argument)

* Add Compile API to set the location for the context binary file (microsoft#25356)

Add Compile API ModelCompilationOptions_SetEpContextBinaryInformation to set the folder path and model name so that the EP can get the right place to dump the [model_name]_[ep].bin file.

* add build matrix for wgsl template (microsoft#25352)

### Description

Windows WebGPU CI: add build matrix for wgsl template

* [JSEP] Fix inputShape index OOB in slice.ts (microsoft#25364)

Use `inputShape.length - 1` instead of `inputShape.length` to avoid
out-of-bounds access.

* [webgpu] extend cast version to 23 (microsoft#25235)

* Fix a security warning (microsoft#18979)

Description (reference:
GHSA-5crp-9r3c-p9vr)
Newtonsoft.Json prior to version 13.0.1 is vulnerable to insecure
defaults due to improper handling of expressions with a high nesting
level, which leads to a StackOverflow exception or high CPU and RAM
usage. Exploiting this vulnerability results in denial of service (DoS).

To mitigate the issue, either update Newtonsoft.Json to 13.0.1 or set
the MaxDepth parameter in the JsonSerializerSettings.
```
JsonConvert.DefaultSettings = () => new JsonSerializerSettings { MaxDepth = 128 };
```
This file is the only place using `JsonConvert`, so I blindly put this
fix and hope the warning will disappear.

* Fix AutoEpSelection and OrtEpLibrary tests when using AuthenticAMD (microsoft#24754)

* Missing datatype in assertion (microsoft#23578)

* [EP ABI] Update to use Node_GetEpName (microsoft#25363)

Changed to use the `Node_GetEpName` API name to avoid confusion.
For plugin EPs, the EP factory can use whatever name is registered with
ORT, so the API name `Node_GetEpName` aligns with
`OrtEpFactory.GetName`.

* Bump clang-format from 20.1.7 to 20.1.8 (microsoft#25381)

* Fix number of layers in Whisper export (microsoft#25375)

### Description

This PR fixes the number of hidden layers used during the export of
Whisper by always using the number of hidden layers in the decoder.

### Motivation and Context

Most of the Whisper models contain the same number of hidden layers in
the encoder and decoder. However, Whisper large v3 turbo contains 32
hidden layers in the encoder and only 4 hidden layers in the decoder.

This PR also fixes [this
issue](microsoft/onnxruntime-genai#1611).

* Bump transformers from 4.48.0 to 4.52.1 in /onnxruntime/python/tools/transformers/models/llama (microsoft#25328)

Bumps [transformers](https://github.com/huggingface/transformers) from
4.48.0 to 4.52.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h2>Patch release v4.51.3</h2>
<p>A mix of bugs were fixed in this patch; very exceptionally, we
diverge from semantic versioning to merge GLM-4 in this patch
release.</p>
<ul>
<li>Handle torch ver in flexattn (<a
href="https://github.com/huggingface/transformers/issues/37400">#37400</a>)</li>
<li>handle torch version edge cases (<a
href="https://github.com/huggingface/transformers/issues/37399">#37399</a>)</li>
<li>Add glm4 (<a
href="https://github.com/huggingface/transformers/issues/37388">#37388</a>)</li>
</ul>
<h1>Patch Release 4.51.2</h1>
<p>This is another round of bug fixes, but they are a lot more minor and
outputs were not really affected!</p>
<ul>
<li>Fix Llama4 offset (<a
href="https://github.com/huggingface/transformers/issues/37414">#37414</a>)
by <a
href="https://github.com/Cyrilvallez"><code>@​Cyrilvallez</code></a></li>
<li>Attention Quantization with FBGemm &amp; TP (<a
href="https://github.com/huggingface/transformers/issues/37384">#37384</a>)
by <a
href="https://github.com/MekkCyber"><code>@​MekkCyber</code></a></li>
<li>use rms_norm_eps for the L2Norm for Llama4 (<a
href="https://github.com/huggingface/transformers/issues/37418">#37418</a>)
by <a
href="https://github.com/danielhanchen"><code>@​danielhanchen</code></a></li>
<li>mark llama4 as not supported with fa2 (<a
href="https://github.com/huggingface/transformers/issues/37416">#37416</a>)
by <a
href="https://github.com/winglian"><code>@​winglian</code></a></li>
</ul>
<h1>Patch release v4.51.1</h1>
<p>Since the release of Llama 4, we have fixed a few issues that we are
now releasing in patch v4.51.1</p>
<ul>
<li>Fixing flex attention for torch=2.6.0 (<a
href="https://github.com/huggingface/transformers/issues/37285">#37285</a>)</li>
<li>more fixes for post-training llama4 (<a
href="https://github.com/huggingface/transformers/issues/37329">#37329</a>)</li>
<li>Remove HQQ from caching allocator warmup (<a
href="https://github.com/huggingface/transformers/issues/37347">#37347</a>)</li>
<li>fix derived berts _init_weights (<a
href="https://github.com/huggingface/transformers/issues/37341">#37341</a>)</li>
<li>Fix init empty weights without accelerate (<a
href="https://github.com/huggingface/transformers/issues/37337">#37337</a>)</li>
<li>Fix deepspeed with quantization (<a
href="https://github.com/huggingface/transformers/issues/37324">#37324</a>)</li>
<li>fix llama4 training (<a
href="https://github.com/huggingface/transformers/issues/37319">#37319</a>)</li>
<li>fix flex attn when optional args aren't passed (<a
href="https://github.com/huggingface/transformers/issues/37327">#37327</a>)</li>
<li>Multiple llama4 fixe (<a
href="https://github.com/huggingface/transformers/issues/37353">#37353</a>)</li>
</ul>
<p>Thanks all for your patience</p>
<h2>v4.51.0: Llama 4, Phi4-Multimodal, DeepSeek-v3, Qwen3</h2>
<h2>New Model Additions</h2>
<h3>Llama 4</h3>
<p><img
src="https://github.com/user-attachments/assets/d613b292-94b0-4902-9dc7-2d00693222e4"
alt="image" /></p>
<p>Llama 4, developed by Meta, introduces a new auto-regressive
Mixture-of-Experts (MoE) architecture.This generation includes two
models:</p>
<ul>
<li>The highly capable Llama 4 Maverick with 17B active parameters out
of ~400B total, with 128 experts.</li>
<li>The efficient Llama 4 Scout also has 17B active parameters out of
~109B total, using just 16 experts.</li>
</ul>
<p>Both models leverage early fusion for native multimodality, enabling
them to process text and image inputs. Maverick and Scout are both
trained on up to 40 trillion tokens on data encompassing 200 languages
(with specific fine-tuning support for 12 languages including Arabic,
Spanish, German, and Hindi).</p>
<p>For deployment, Llama 4 Scout is designed for accessibility, fitting
on a single server-grade GPU via on-the-fly 4-bit or 8-bit quantization,
while Maverick is available in BF16 and FP8 formats. These models are
released under the custom Llama 4 Community License Agreement, available
on the model repositories</p>
<p>Getting started with Llama 4 using transformers is straightforward.
Make sure you have transformers v4.51.0 or later installed:</p>
<pre><code>pip install -U transformers[hf_xet]
&lt;/tr&gt;&lt;/table&gt; 
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/huggingface/transformers/commit/945727948c1143a10ac6f7d811aa58bb0d126b5b"><code>9457279</code></a>
Release: v4.52.1</li>
<li><a
href="https://github.com/huggingface/transformers/commit/eaa301673a0a7a1a8c5d3f11c046d1592a7ae16b"><code>eaa3016</code></a>
Revert parallelism temporarily (<a
href="https://github.com/huggingface/transformers/issues/38240">#38240</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/b5f494632c0fff2527dd3140423408644a9b0076"><code>b5f4946</code></a>
Protect ParallelInterface</li>
<li><a
href="https://github.com/huggingface/transformers/commit/113424bcd53b92600f77d82f48add0a60fb41556"><code>113424b</code></a>
Release: v4.52.0</li>
<li><a
href="https://github.com/huggingface/transformers/commit/f834d368f6a21ed54188d9c96fbb9013b1d2c75f"><code>f834d36</code></a>
[gemma3] fix bidirectional attention mask (<a
href="https://github.com/huggingface/transformers/issues/38080">#38080</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/2edb0e4b4dda8172d5628ca7497a4125f28bf6fc"><code>2edb0e4</code></a>
[mllama] fix loading and inference (<a
href="https://github.com/huggingface/transformers/issues/38223">#38223</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/390f153469dfdc793e7a9c7eb4822ea76f4f796a"><code>390f153</code></a>
Add padding-free to bamba (<a
href="https://github.com/huggingface/transformers/issues/35861">#35861</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/2a79471318a9b7b16706f3bb5cd833c7e81919a6"><code>2a79471</code></a>
Fixing Bitnet after use_rms_norm introduction (<a
href="https://github.com/huggingface/transformers/issues/38229">#38229</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/9661896083c9d983341afa45cc4b84af01706e72"><code>9661896</code></a>
Enable Quantize KV Cache for Mistral Model (<a
href="https://github.com/huggingface/transformers/issues/35042">#35042</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/1c2f36b480e02c9027d2523746d34e27b39e01a4"><code>1c2f36b</code></a>
parallelism goes brrr (<a
href="https://github.com/huggingface/transformers/issues/37877">#37877</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.48.0...v4.52.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.48.0&new-version=4.52.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump ruff from 0.12.2 to 0.12.3 (microsoft#25382)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.12.2 to 0.12.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.12.3</h2>
<h2>Release Notes</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>flake8-bugbear</code>] Support non-context-manager calls in
<code>B017</code> (<a
href="https://github.com/astral-sh/ruff/pull/19063">#19063</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Add autofixes for
<code>PTH100</code>, <code>PTH106</code>, <code>PTH107</code>,
<code>PTH108</code>, <code>PTH110</code>, <code>PTH111</code>,
<code>PTH112</code>, <code>PTH113</code>, <code>PTH114</code>,
<code>PTH115</code>, <code>PTH117</code>, <code>PTH119</code>,
<code>PTH120</code> (<a
href="https://github.com/astral-sh/ruff/pull/19213">#19213</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Add autofixes for
<code>PTH203</code>, <code>PTH204</code>, <code>PTH205</code> (<a
href="https://github.com/astral-sh/ruff/pull/18922">#18922</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[<code>flake8-return</code>] Fix false-positive for variables used
inside nested functions in <code>RET504</code> (<a
href="https://github.com/astral-sh/ruff/pull/18433">#18433</a>)</li>
<li>Treat form feed as valid whitespace before a line continuation (<a
href="https://github.com/astral-sh/ruff/pull/19220">#19220</a>)</li>
<li>[<code>flake8-type-checking</code>] Fix syntax error introduced by
fix (<code>TC008</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19150">#19150</a>)</li>
<li>[<code>pyupgrade</code>] Keyword arguments in <code>super</code>
should suppress the <code>UP008</code> fix (<a
href="https://github.com/astral-sh/ruff/pull/19131">#19131</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>[<code>flake8-pyi</code>] Make example error out-of-the-box
(<code>PYI007</code>, <code>PYI008</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19103">#19103</a>)</li>
<li>[<code>flake8-simplify</code>] Make example error out-of-the-box
(<code>SIM116</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19111">#19111</a>)</li>
<li>[<code>flake8-type-checking</code>] Make example error
out-of-the-box (<code>TC001</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19151">#19151</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Make example error out-of-the-box
(<code>PTH210</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19189">#19189</a>)</li>
<li>[<code>pycodestyle</code>] Make example error out-of-the-box
(<code>E272</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19191">#19191</a>)</li>
<li>[<code>pycodestyle</code>] Make example not raise unnecessary
<code>SyntaxError</code> (<code>E114</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19190">#19190</a>)</li>
<li>[<code>pydoclint</code>] Make example error out-of-the-box
(<code>DOC501</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19218">#19218</a>)</li>
<li>[<code>pylint</code>, <code>pyupgrade</code>] Fix syntax errors in
examples (<code>PLW1501</code>, <code>UP028</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19127">#19127</a>)</li>
<li>[<code>pylint</code>] Update <code>missing-maxsplit-arg</code> docs
and error to suggest proper usage (<code>PLC0207</code>) (<a
href="https://github.com/astral-sh/ruff/pull/18949">#18949</a>)</li>
<li>[<code>flake8-bandit</code>] Make example error out-of-the-box
(<code>S412</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19241">#19241</a>)</li>
</ul>
<h2>Contributors</h2>
<ul>
<li><a
href="https://github.com/AlexWaygood"><code>@​AlexWaygood</code></a></li>
<li><a
href="https://github.com/BurntSushi"><code>@​BurntSushi</code></a></li>
<li><a href="https://github.com/Gankra"><code>@​Gankra</code></a></li>
<li><a
href="https://github.com/InSyncWithFoo"><code>@​InSyncWithFoo</code></a></li>
<li><a
href="https://github.com/LaBatata101"><code>@​LaBatata101</code></a></li>
<li><a
href="https://github.com/MatthewMckee4"><code>@​MatthewMckee4</code></a></li>
<li><a
href="https://github.com/MeGaGiGaGon"><code>@​MeGaGiGaGon</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a
href="https://github.com/NamelessGO"><code>@​NamelessGO</code></a></li>
<li><a
href="https://github.com/UnboundVariable"><code>@​UnboundVariable</code></a></li>
<li><a
href="https://github.com/abhijeetbodas2001"><code>@​abhijeetbodas2001</code></a></li>
<li><a href="https://github.com/carljm"><code>@​carljm</code></a></li>
<li><a
href="https://github.com/charliermarsh"><code>@​charliermarsh</code></a></li>
<li><a
href="https://github.com/chirizxc"><code>@​chirizxc</code></a></li>
<li><a
href="https://github.com/danparizher"><code>@​danparizher</code></a></li>
<li><a
href="https://github.com/dhruvmanila"><code>@​dhruvmanila</code></a></li>
<li><a href="https://github.com/fdosani"><code>@​fdosani</code></a></li>
<li><a
href="https://github.com/github-actions"><code>@​github-actions</code></a></li>
<li><a
href="https://github.com/ibraheemdev"><code>@​ibraheemdev</code></a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.12.3</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>flake8-bugbear</code>] Support non-context-manager calls in
<code>B017</code> (<a
href="https://github.com/astral-sh/ruff/pull/19063">#19063</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Add autofixes for
<code>PTH100</code>, <code>PTH106</code>, <code>PTH107</code>,
<code>PTH108</code>, <code>PTH110</code>, <code>PTH111</code>,
<code>PTH112</code>, <code>PTH113</code>, <code>PTH114</code>,
<code>PTH115</code>, <code>PTH117</code>, <code>PTH119</code>,
<code>PTH120</code> (<a
href="https://github.com/astral-sh/ruff/pull/19213">#19213</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Add autofixes for
<code>PTH203</code>, <code>PTH204</code>, <code>PTH205</code> (<a
href="https://github.com/astral-sh/ruff/pull/18922">#18922</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[<code>flake8-return</code>] Fix false-positive for variables used
inside nested functions in <code>RET504</code> (<a
href="https://github.com/astral-sh/ruff/pull/18433">#18433</a>)</li>
<li>Treat form feed as valid whitespace before a line continuation (<a
href="https://github.com/astral-sh/ruff/pull/19220">#19220</a>)</li>
<li>[<code>flake8-type-checking</code>] Fix syntax error introduced by
fix (<code>TC008</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19150">#19150</a>)</li>
<li>[<code>pyupgrade</code>] Keyword arguments in <code>super</code>
should suppress the <code>UP008</code> fix (<a
href="https://github.com/astral-sh/ruff/pull/19131">#19131</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>[<code>flake8-pyi</code>] Make example error out-of-the-box
(<code>PYI007</code>, <code>PYI008</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19103">#19103</a>)</li>
<li>[<code>flake8-simplify</code>] Make example error out-of-the-box
(<code>SIM116</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19111">#19111</a>)</li>
<li>[<code>flake8-type-checking</code>] Make example error
out-of-the-box (<code>TC001</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19151">#19151</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Make example error out-of-the-box
(<code>PTH210</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19189">#19189</a>)</li>
<li>[<code>pycodestyle</code>] Make example error out-of-the-box
(<code>E272</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19191">#19191</a>)</li>
<li>[<code>pycodestyle</code>] Make example not raise unnecessary
<code>SyntaxError</code> (<code>E114</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19190">#19190</a>)</li>
<li>[<code>pydoclint</code>] Make example error out-of-the-box
(<code>DOC501</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19218">#19218</a>)</li>
<li>[<code>pylint</code>, <code>pyupgrade</code>] Fix syntax errors in
examples (<code>PLW1501</code>, <code>UP028</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19127">#19127</a>)</li>
<li>[<code>pylint</code>] Update <code>missing-maxsplit-arg</code> docs
and error to suggest proper usage (<code>PLC0207</code>) (<a
href="https://github.com/astral-sh/ruff/pull/18949">#18949</a>)</li>
<li>[<code>flake8-bandit</code>] Make example error out-of-the-box
(<code>S412</code>) (<a
href="https://github.com/astral-sh/ruff/pull/19241">#19241</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/astral-sh/ruff/commit/5bc81f26c8a820835067280153a279658477ccf2"><code>5bc81f2</code></a>
Bump 0.12.3 (<a
href="https://github.com/astral-sh/ruff/issues/19279">#19279</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/6908e2682f14792898cb8f9e4d920021da022307"><code>6908e26</code></a>
Filter <code>ruff_linter::VERSION</code> out of SARIF output tests (<a
href="https://github.com/astral-sh/ruff/issues/19280">#19280</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/25c429556421ddd6f715f5aaf906610e0c564606"><code>25c4295</code></a>
[ty] Avoid stale diagnostics for open files diagnostic mode (<a
href="https://github.com/astral-sh/ruff/issues/19273">#19273</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/426fa4bb12d8c47185800ba14dd5b4e721fd2c29"><code>426fa4b</code></a>
[ty] Add signature help provider to playground (<a
href="https://github.com/astral-sh/ruff/issues/19276">#19276</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/b0b65c24ff01dc9095f17b3768cf2b9a336a5a8c"><code>b0b65c2</code></a>
[ty] Initial implementation of signature help provider (<a
href="https://github.com/astral-sh/ruff/issues/19194">#19194</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/08bc6d25899501d690c37a87d6da51951280dfc5"><code>08bc6d2</code></a>
Add simple integration tests for all output formats (<a
href="https://github.com/astral-sh/ruff/issues/19265">#19265</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/f2ae12bab33d80d52caa3047775371fca83f6e96"><code>f2ae12b</code></a>
[<code>flake8-return</code>] Fix false-positive for variables used
inside nested functio...</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/965f415212f4f9f3ef855b647d53e892e6913828"><code>965f415</code></a>
[ty] Add a <code>--quiet</code> mode (<a
href="https://github.com/astral-sh/ruff/issues/19233">#19233</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/83b5bbf004bf2e47dd4ca5c049930894856547f1"><code>83b5bbf</code></a>
Treat form feed as valid whitespace before a line continuation (<a
href="https://github.com/astral-sh/ruff/issues/19220">#19220</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/87f6f08ef53edc2cbe8632d612f6d4fd016fe2ff"><code>87f6f08</code></a>
[ty] Make <code>check_file</code> a salsa query (<a
href="https://github.com/astral-sh/ruff/issues/19255">#19255</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.12.2...0.12.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.12.2&new-version=0.12.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.

[//]: # (dependabot-automerge-end)

---


Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [QNN EP] Upgrade QNN to 2.36.1 (microsoft#25388)

### Description

Update Qnn default version to 2.36.1.250708

Co-authored-by: Jeff Kilpatrick <[email protected]>

* Add vendor id to OrtEpFactory and default ORT logger to CreateEpFactories (microsoft#25365)

### Description
<!-- Describe your changes. -->
Add vendor id to OrtEpFactory, as the vendor id is easier to obtain than
the vendor name on some platforms.
Update the selection policy to prefer a match on vendor id, with
fallback to the vendor name.

Add the default ORT logger to CreateEpFactories: the OrtEpFactory
currently has no way to log informational messages or issues.
CreateEp already receives the session logger for use by the OrtEp
instance, so that path is covered.

Misc cleanups: make usage of ORT_API2_STATUS and ORT_API_T consistent in
onnxruntime_ep_c_api.h, and set ort_version_supported in the EP
factories where it was missed.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Vendor id is easier to match against OrtHardwareDevice when doing auto
EP selection.
OrtEpFactory should have a logger. 
Last chance to cleanup APIs before 1.23 release
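
The selection policy described above (prefer an exact vendor-id match, fall back to vendor name) can be sketched as follows. This is an illustrative sketch only; the dict keys (`vendor_id`, `vendor`) are hypothetical stand-ins and not the ORT C API.

```python
def select_factory(device, factories):
    # Prefer an exact vendor-id match: ids are stable across locales
    # and spellings, unlike vendor-name strings.
    for f in factories:
        if f["vendor_id"] == device["vendor_id"]:
            return f
    # Fall back to a case-insensitive vendor-name match.
    for f in factories:
        if f["vendor"].lower() == device["vendor"].lower():
            return f
    return None

factories = [
    {"vendor_id": 0x8086, "vendor": "Intel"},
    {"vendor_id": 0x10DE, "vendor": "NVIDIA"},
]

# Exact id match wins even when the name is spelled differently.
nv = select_factory({"vendor_id": 0x10DE, "vendor": "nvidia corp"}, factories)
# No id match -> fall back to the name.
intel = select_factory({"vendor_id": 0x0, "vendor": "intel"}, factories)
```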

* Bump lintrunner-adapters from 0.12.4 to 0.12.5 (microsoft#25380)

* [WebNN] Add rank range validation for rest ops (microsoft#25383)

- Add common rank range validation to base_op_builder.cc
- Handle specific rank range validation for rest ops
- Remove duplicated input_shape validation
- Fix some typos BTW

* Fix some test issues when WebGPU and DML are enabled in the same build (microsoft#25401)

### Description
<!-- Describe your changes. -->
Fix some test setups where both EPs being in the same build wasn't
expected.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

* Fix SigLIP causal mask bug (microsoft#25360)

### Description
<!-- Describe your changes. -->
SigLIP architecture inside the vision encoder should not use a causal
mask on the attention. This change will fix Phi 4 MM accuracy issues we
have seen.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
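
The distinction above can be seen in a minimal NumPy sketch (illustrative only, not the ORT kernel): a causal mask zeroes out attention to future positions, whereas a bidirectional vision encoder like SigLIP should use the full weights.

```python
import numpy as np

def attention_weights(q, k, causal):
    # q, k: (seq, dim); returns softmax(QK^T / sqrt(d)) per query row.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        seq = scores.shape[0]
        # Mask out strictly-upper-triangular (future) positions.
        future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8)).astype(np.float32)
k = rng.standard_normal((4, 8)).astype(np.float32)

causal_w = attention_weights(q, k, causal=True)
full_w = attention_weights(q, k, causal=False)
```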

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* [CPU] GQA supports attention scores output (microsoft#25319)

### Description
1. Add an optional output to the CPU impl of the GQA op for storing
attention scores (QK). The buffer has shape (B, N, S, T) and can be
either fp16 or fp32, depending on the type of the other inputs
2. Add a `qk_output` attribute to GQA, which controls whether attention
scores are saved before or after softmax is applied
3. Add unit tests to cover this use case
4. Add asserts on other EPs in case this feature is used
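
The shape and the pre-/post-softmax choice above can be sketched in NumPy. This is a minimal illustration of the output buffer only; it omits causal masking and the GQA head-sharing that the real kernel performs.

```python
import numpy as np

B, N, S, T, H = 1, 2, 3, 5, 4  # batch, heads, new tokens, total tokens, head dim
rng = np.random.default_rng(0)
q = rng.standard_normal((B, N, S, H)).astype(np.float32)
k = rng.standard_normal((B, N, T, H)).astype(np.float32)

# Raw scaled attention scores QK^T -- shape (B, N, S, T), matching the
# optional output buffer described above.
qk = q @ k.transpose(0, 1, 3, 2) / np.sqrt(H)

# qk_output-style choice: capture the buffer before or after softmax.
qk_pre = qk.copy()
m = qk.max(axis=-1, keepdims=True)
p = np.exp(qk - m)
qk_post = p / p.sum(axis=-1, keepdims=True)
```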

* [QNN-EP] Support GridSample of linear mode for ONNX opset 20+ (microsoft#25408)

[QNN-EP] Support GridSample of linear mode for ONNX opset 20+

* [QNN-EP] Update ScatterND op to reject only QNN-CPU (microsoft#25403)

The current limitation is broader than necessary; reject ScatterND only
when targeting the QNN CPU backend.

* Fix 2 device discovery issues. (microsoft#25397)

### Description
<!-- Describe your changes. -->
Fix vendor and device id conversion from SetupApi info.
Detect the Remote Display Adapter and skip it; otherwise a bogus device
appears when you're connected to a machine over remote desktop.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
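
The id-conversion pitfall is that SetupAPI hardware IDs encode the PCI vendor and device ids in hexadecimal, so they must be parsed with base 16. A minimal sketch (the actual fix is in the C++ device-discovery code; the helper below is hypothetical):

```python
import re

def parse_pci_ids(hardware_id: str):
    # SetupAPI hardware IDs look like "PCI\VEN_10DE&DEV_2786&...".
    # VEN/DEV fields are hex, so parse with base 16, not base 10.
    m = re.search(r"VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})", hardware_id)
    if not m:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)

vendor, device = parse_pci_ids(r"PCI\VEN_10DE&DEV_2786&SUBSYS_513419DA")
```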

* [webgpu] fix Slice implementation (microsoft#25415)

### Description

Bugfix: crash when dim_value is 0



### Motivation and Context

Thanks to @skottmckay who found the bug.
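
The edge case can be illustrated with a small sketch: slicing along a zero-length dimension must yield an empty result, not crash. The `slice_len` helper below is a hypothetical illustration of the clamping a Slice kernel needs, not the WebGPU implementation.

```python
import numpy as np

# Slicing a tensor whose dimension is 0 yields an empty tensor.
x = np.zeros((0, 4), dtype=np.float32)
y = x[0:3]  # slice along the empty axis

def slice_len(start, end, dim):
    # Clamp start/end into [0, dim]; output length is never negative.
    start = min(max(start, 0), dim)
    end = min(max(end, 0), dim)
    return max(0, end - start)
```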

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jianhui Dai <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Fei Chen <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: vraspar <[email protected]>
Co-authored-by: qti-yuduo <[email protected]>
Co-authored-by: Akupadhye <[email protected]>
Co-authored-by: Wang Ning <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: George Wu <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wanming Lin <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: Hector Li <[email protected]>
Co-authored-by: Jie Chen <[email protected]>
Co-authored-by: xhcao <[email protected]>
Co-authored-by: Wei-Sheng Chin <[email protected]>
Co-authored-by: quic-hungjuiw <[email protected]>
Co-authored-by: Ian Hunter <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: Jeff Kilpatrick <[email protected]>
Co-authored-by: Jeff Kilpatrick <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Nenad Banfic <[email protected]>
Co-authored-by: derdeljan-msft <[email protected]>
…714)

* Added support for 2025.2 and SimplifiedLayerNormalization op

* [OVEP] Update OV version to 2025.2.0

* Revert "[OVEP] Update OV version to 2025.2.0"

This reverts commit d129250.
* update: Implement OV Plugin using factories

* fix: refactor plugin code

* fix: map ep_metadata to device type using "ov_device" key

* fix: block provider options for AppendExecutionProvider_V2 pass

* minor fix for linux

* Add OrtEpLibraryOv tests

* ovep: Support multiple devices (i.e. AUTO) passed to CreateIExecutionProvider

* CreateIExecutionProvider: comment out unused devices parameter

* ovep factory: Implement CreateDataTransfer to avoid crash in RegisterExecutionProviderLibrary

* update: Enable shared libs linker flags for linux & macos

* CreateIExecutionProvider: For some disallowed provider options, give better guidance

* Add PluginEp_CheckV2DisallowedProviderOptions test

* ovep: Add CreateProvider_V2 & call it from CreateIExecutionProvider

* disable data transfer for ovep

* minor fix for linux

* openvino_provider_factory: Add 'num_of_threads' to block_and_advise_entries

---------

Co-authored-by: Ryan Metcalfe <[email protected]>
…759)

* ov_factory: Use 'GPU_DEVICE_ID' property to match with ORT device_id

* clean up comment
@ankitm3k
Contributor Author

Kindly review & merge this fix for the upcoming ORT v1.23 release branch @jywu-msft @adrianlizarraga @HectorSVC @nieubank @mschofie

@jywu-msft
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@jywu-msft jywu-msft merged commit 9001123 into microsoft:main Jul 24, 2025
96 of 101 checks passed
@snnn
Contributor

snnn commented Jul 25, 2025

Hi there! We haven't cut the release branch for this version yet, so I'm removing the release:1.23.0 label for now to keep things tidy. Thanks so much for your contribution! We'll make sure this gets included when the release is prepared. 🤖

RyanMetcalfeInt8 added a commit to RyanMetcalfeInt8/onnxruntime that referenced this pull request Jul 29, 2025
### Description
This PR patches the features provided for this PR
microsoft#25476, this provides a
stable fix for the GPU plugin with upcoming OV toolkit v2025.2.1

---------

Signed-off-by: Jianhui Dai <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: bfilipek <[email protected]>
Co-authored-by: jatinwadhwa921 <[email protected]>
Co-authored-by: n1harika <[email protected]>
Co-authored-by: sfatimar <[email protected]>
Co-authored-by: Jaskaran Singh Nagi <[email protected]>
Co-authored-by: Eric Crawford <[email protected]>
Co-authored-by: Sushanth Rajasankar <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Seungtaek Kim <[email protected]>
Co-authored-by: co63oc <[email protected]>
Co-authored-by: Jambay Kinley <[email protected]>
Co-authored-by: Hector Li <[email protected]>
Co-authored-by: Jian Chen <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: Alessio Soldano <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Ashish Garg <[email protected]>
Co-authored-by: Ashish Garg <[email protected]>
Co-authored-by: Jie Chen <[email protected]>
Co-authored-by: wp <[email protected]>
Co-authored-by: Satya Kumar Jandhyala <[email protected]>
Co-authored-by: Prathik Rao <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Jianhui Dai <[email protected]>
Co-authored-by: xhcao <[email protected]>
Co-authored-by: Wanming Lin <[email protected]>
Co-authored-by: Mark Schofield <[email protected]>
Co-authored-by: jiangzhaoming <[email protected]>
Co-authored-by: Yi-Hong Lyu <[email protected]>
Co-authored-by: vraspar <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: saurabh <[email protected]>
Co-authored-by: Ranjit Ranjan <[email protected]>
Co-authored-by: Baiju Meswani <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jatinwadhwa921 <[email protected]>
Co-authored-by: Pallavi Gupta <[email protected]>
Co-authored-by: Nikolay Proshunin <[email protected]>
Co-authored-by: Preetha Veeramalai <[email protected]>
Co-authored-by: Javier Martinez <[email protected]>
Co-authored-by: Bartlomiej Filipek <[email protected]>
Co-authored-by: bopeng1234 <[email protected]>
Co-authored-by: MayureshV1 <[email protected]>
Co-authored-by: TejalKhade28 <[email protected]>
Co-authored-by: Vishnudas Thaniel S <[email protected]>
Co-authored-by: Yaru Du <[email protected]>
Co-authored-by: Ryan Metcalfe <[email protected]>
Co-authored-by: Dvoretckii, Mikhail <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Fei Chen <[email protected]>
Co-authored-by: qti-yuduo <[email protected]>
Co-authored-by: Akupadhye <[email protected]>
Co-authored-by: Wang Ning <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: George Wu <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: Wei-Sheng Chin <[email protected]>
Co-authored-by: quic-hungjuiw <[email protected]>
Co-authored-by: Ian Hunter <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: Jeff Kilpatrick <[email protected]>
Co-authored-by: Jeff Kilpatrick <[email protected]>
Co-authored-by: Nenad Banfic <[email protected]>
Co-authored-by: derdeljan-msft <[email protected]>
Co-authored-by: Ryan Metcalfe <[email protected]>
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
### Description
This PR patches the features provided for this PR
microsoft#25476, this provides a
stable fix for the GPU plugin with upcoming OV toolkit v2025.2.1

(Signed-off-by and Co-authored-by trailers identical to the commit referenced above.)