[webgpu] refactor a few "context" classes#26602

Merged
fs-eire merged 4 commits into main from fs-eire/ex-compute-context on Nov 26, 2025

@fs-eire fs-eire commented Nov 19, 2025

Description

This PR refactors a few "context" classes to make them clearer and to support new features.

@fs-eire fs-eire force-pushed the fs-eire/ex-compute-context branch 2 times, most recently from 0a89439 to db4badb Compare November 21, 2025 02:11
@fs-eire fs-eire changed the title from `[webgpu] update ComputeContext to make it work with PrePack` to `[webgpu] refactor a few "context" classes` Nov 21, 2025
@fs-eire fs-eire force-pushed the fs-eire/ex-compute-context branch from db4badb to 19e2d51 Compare November 21, 2025 09:32
@fs-eire fs-eire marked this pull request as ready for review November 24, 2025 17:32
@fs-eire fs-eire requested review from Copilot and qjia7 November 25, 2025 00:17

Copilot AI left a comment

Pull request overview

This PR refactors WebGPU provider "context" classes to improve clarity, simplify initialization, and support new features like weight pre-packing. The main changes include:

  • Extracts ComputeContextBase as a base class that doesn't depend on OpKernelContext, enabling its use in pre-pack operations
  • Converts SplitKConfig from lazy initialization via static factory method to eager initialization via constructor
  • Adds PrePack and PrePackInternal infrastructure to WebGpuKernel for future weight optimization
  • Introduces WebGpuDevice constant to eliminate code duplication
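
The base-class extraction in the first bullet can be pictured with a small sketch. It is written in Python purely for illustration (the real classes are C++), and every class, method, and member name below is a hypothetical stand-in rather than the actual ONNX Runtime API. It only shows why a helper that accepts the base type can be reused by code that has no per-run kernel context, such as pre-packing.

```python
class ComputeContextBase:
    """Hypothetical stand-in: holds only state that needs no per-run kernel context."""

    def __init__(self, device_name):
        self.device_name = device_name
        self.programs_run = []

    def run_program(self, program_name):
        # Record the program instead of dispatching real GPU work.
        self.programs_run.append(program_name)
        return f"{program_name} on {self.device_name}"


class ComputeContext(ComputeContextBase):
    """Hypothetical stand-in: adds access to the inputs of one kernel invocation."""

    def __init__(self, device_name, kernel_inputs):
        super().__init__(device_name)
        self.kernel_inputs = kernel_inputs

    def input(self, index):
        return self.kernel_inputs[index]


def do_transpose(context, tensor_name):
    # Only base-class functionality is used, so either context type is accepted.
    return context.run_program(f"Transpose({tensor_name})")
```

With this split, `do_transpose` can be called with a plain `ComputeContextBase` at pre-pack time and with a full `ComputeContext` during compute.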

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| webgpu_utils.h | Changes SplitKConfig::GetSplitKConfig() static method to constructor, adds documentation |
| webgpu_utils.cc | Implements constructor-based initialization for SplitKConfig |
| webgpu_context.h | Changes split_k_config_ from std::optional to std::unique_ptr, updates method signatures to use ComputeContextBase, removes lazy initialization method |
| webgpu_context.cc | Adds eager initialization of SplitKConfig in Initialize(), removes old lazy initialization method, updates method signatures |
| webgpu_kernel.h | Adds PrePack and PrePackInternal methods, adds webgpu_context_ member |
| webgpu_kernel.cc | Implements PrePack and default PrePackInternal, caches webgpu_context_ reference, updates ComputeContext construction |
| compute_context.h | Extracts ComputeContextBase base class without OpKernelContext dependency, ComputeContext inherits from it |
| compute_context.cc | Updates constructors for base/derived class pattern |
| transpose.h/cc | Updates DoTranspose signature to accept ComputeContextBase& |
| conv.h | Adds PrePackInternal override and transposed_kernel_ member for future implementation |
| conv.cc | Implements placeholder PrePackInternal with TODO for weight transpose optimization |
| webgpu_execution_provider.cc | Uses WebGpuDevice constant instead of inline construction |
| allocator.h | Defines WebGpuDevice constant |
| allocator.cc | Uses WebGpuDevice constant |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI commented Nov 25, 2025

@fs-eire I've opened a new pull request, #26650, to work on those changes. Once the pull request is ready, I'll request review from you.


Copilot AI commented Nov 25, 2025

@fs-eire I've opened a new pull request, #26651, to work on those changes. Once the pull request is ready, I'll request review from you.

…rnel (#26650)

### Description

Adds documentation for `PrePack` and `PrePackInternal` methods in
`webgpu_kernel.h`:

- **`PrePack`**: Documents the WebGPU-specific override behavior,
including the limitation that ORT does not pass prepacked weights to
non-CPU EPs (kernels must manage their own storage)
- **`PrePackInternal`**: Documents the virtual method's purpose,
invocation timing, parameter semantics, and default `is_packed = false`
behavior
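
As a rough illustration of the behavior these doc comments describe, the sketch below uses Python with hypothetical names. The real methods are C++ members of `WebGpuKernel`, and their actual signatures are not reproduced here. It assumes that `PrePack` forwards to `PrePackInternal`, that the default implementation reports `is_packed = False`, and that a kernel which does pack a weight keeps the result in its own member. The transposing override is an invented example of what such a kernel might do, not the code in `conv.cc`.

```python
class WebGpuKernelSketch:
    """Hypothetical base kernel: pre_pack forwards to an overridable hook."""

    def pre_pack(self, tensor, input_index):
        # Returns is_packed, i.e. whether this kernel packed the given input.
        return self.pre_pack_internal(tensor, input_index)

    def pre_pack_internal(self, tensor, input_index):
        # Default behavior: nothing is packed.
        return False


class ConvSketch(WebGpuKernelSketch):
    """Hypothetical kernel that stores its own packed copy of the weight."""

    WEIGHT_INDEX = 1  # assumption: input 1 is the weight tensor

    def __init__(self):
        self.transposed_kernel = None

    def pre_pack_internal(self, tensor, input_index):
        if input_index != self.WEIGHT_INDEX:
            return False
        # Transpose a 2-D nested list and keep it in kernel-owned storage.
        self.transposed_kernel = [list(column) for column in zip(*tensor)]
        return True
```

Because the packed weight lives on the kernel object, nothing has to be handed back to the framework, which matches the stated limitation for non-CPU EPs.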

### Motivation and Context

Addresses review feedback from #26602 requesting documentation for these
methods:
- #26602 (comment)
- #26602 (comment)

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fs-eire <7679871+fs-eire@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.


@fs-eire fs-eire merged commit 5c28c7e into main Nov 26, 2025
97 of 101 checks passed
@fs-eire fs-eire deleted the fs-eire/ex-compute-context branch November 26, 2025 03:18
jatinwadhwa921 pushed a commit to intel/onnxruntime that referenced this pull request Dec 3, 2025
* Fix npm audit vulnerabilities in /js directory (microsoft#26632)

### Description

Resolved all security vulnerabilities in JavaScript packages under `/js`
by running `npm audit fix`. All updates are non-breaking patch/minor
version bumps.

**Fixed vulnerabilities:**

- `/js` root: 1 high severity
  - `glob` 10.4.5 → 10.5.0 (command injection - GHSA-5j98-mcp5-4vw2)

- `/js/react_native`: 7 vulnerabilities (1 high, 3 moderate, 3 low)
  - `image-size` → 1.2.1 (high: DoS via infinite loop - GHSA-m5qc-5hw7-8vg7)
  - `@babel/helpers` 7.25.6 → 7.28.4 (moderate: RegExp complexity - GHSA-968p-4wvh-cqc8)
  - `@babel/runtime` 7.25.6 → 7.28.4 (moderate: RegExp complexity - GHSA-968p-4wvh-cqc8)
  - `js-yaml` → fixed (moderate: prototype pollution - GHSA-mh29-5h37-fv8m)
  - `brace-expansion` 2.0.1 → 2.0.2 (low: ReDoS - GHSA-v6h2-p8h4-qcjw)
  - `on-headers` → fixed (low: header manipulation - GHSA-76c9-3jph-rj3q)

**Files modified:**
- `js/package-lock.json`
- `js/react_native/package-lock.json`

**Result:** All JS packages (`/js`, `/js/common`, `/js/web`, `/js/node`,
`/js/react_native`) now report 0 vulnerabilities.

### Motivation and Context

Security maintenance to address dependency vulnerabilities identified by
`npm audit`. No breaking changes or code modifications required.

<!-- START COPILOT CODING AGENT SUFFIX -->



<details>

<summary>Original prompt</summary>

> Please create a pull request that runs `npm audit fix` for the
JavaScript/TypeScript portion of the repository under the `/js`
directory of
[microsoft/onnxruntime](https://github.com/microsoft/onnxruntime).
> 
> Requirements:
> 
> 1. **Scope**
> - Work only within the `/js` folder and its subpackages (e.g.,
`js/web`, `js/node`, `js/common`, etc.).
>    - Do not modify files outside `/js`.
> 
> 2. **Dependency updates**
> - Run `npm audit fix` (and, if necessary to fully resolve
high/critical issues while staying non-breaking, `npm audit fix --force`
on specific subpackages) to address security vulnerabilities.
> - Prefer minimal, non-breaking version bumps (patch and minor) that
satisfy `npm audit` while keeping semver ranges sensible.
> - If any **major** upgrades are required to clear vulnerabilities,
handle them cautiously:
> - Apply the upgrade only if tests still pass and typings/build setup
remain compatible.
> - If a major bump would require code changes or creates breaking
behavior, **do not** apply it; instead, leave a TODO comment in the PR
description summarizing which packages remain vulnerable and why.
> 
> 3. **Validation**
> - Run the existing JS-related checks that the repo supports from
`/js`, such as:
>      - `npm test` or package-specific test scripts.
> - Any documented lint/build/test commands for JS packages (e.g., `npm
run build`, `npm run lint`) where applicable.
> - Ensure the updated lockfiles (if present) are consistent, and the
project installs cleanly with `npm ci` (or the repo's documented install
command) in the `/js` area.
> 
> 4. **Files to update**
> - Update `package.json` and lockfiles under `/js` (e.g.,
`package-lock.json`, `npm-shrinkwrap.json`, or workspace-specific lock
files) to reflect the audited dependency tree.
> - Do not manually edit `node_modules`; rely on `npm` to manage
dependencies and only commit manifest/lockfile changes.
> 
> 5. **Repository conventions**
> - Follow this repo's existing conventions for formatting, commit
messages, and JS tooling.
> - Keep the diff focused on the dependency and lockfile updates plus
any absolutely necessary code tweaks to maintain compatibility.
> 
> 6. **Pull request description**
>    - In the PR body, include:
> - A short summary: that `npm audit fix` was run in `/js` to address
dependency vulnerabilities.
> - A bullet list of notable dependency changes (especially any major
version bumps), with packages and old/new versions.
>      - A brief testing summary (commands run and their results).
> - A note about any remaining vulnerabilities that could not be fixed
without breaking changes (if applicable), including the affected
packages and advisories if available.
> 
> The goal is a clean, minimal PR that improves the security posture of
the JS packages under `/js` in `microsoft/onnxruntime` without
introducing breaking changes.


</details>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fs-eire <7679871+fs-eire@users.noreply.github.com>

* [webgpu] Optimize InstanceNormalization by removing redundant transpose (microsoft#26626)

### Description
<!-- Describe your changes. -->

This PR optimizes `InstanceNormalization` by removing redundant
transpose.

Given that the `NCHW` implementation of `InstanceNormalization` is more
efficient, we don't need to wrap it in `Transpose` nodes to run it in
`NHWC`. This lets us elide the redundant transposes and improves
performance.

Testing on Lunar Lake shows a roughly 60% performance improvement in
`InstanceNormalization` operations.

#### `InstanceNormalization` OP benchmark
The input tensor shape: `(1,32,1048576)`
The scale tensor shape: `(32)`
The B tensor shape: `(32)`

| time cost (ms) | baseline | opt | diff |
| ---------------- | -------- | ---- | ---- |
| Lunar Lake | 82.6 | 34.2 | 58% |

#### Model benchmark
| time cost (ms) | baseline | opt | diff |
| ---------------- | -------- | ---- | ---- |
| sd-turbo-vae-decoder-fp16-demo | 2437.6 | 1835.9 | 25% |
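
The `diff` column in both tables appears to be the relative reduction in time cost, `(baseline - opt) / baseline`. That reading is an assumption, since the PR does not state the formula. A short Python check using the table values:

```python
def relative_reduction(baseline_ms, optimized_ms):
    """Fraction of the baseline time removed by the optimization."""
    return (baseline_ms - optimized_ms) / baseline_ms


# Values copied from the two benchmark tables.
op_diff = relative_reduction(82.6, 34.2)
model_diff = relative_reduction(2437.6, 1835.9)
print(f"op: {op_diff:.1%}, model: {model_diff:.1%}")  # op: 58.6%, model: 24.7%
```

These figures are in line with the 58% and 25% reported in the tables.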

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Please see above

* [webgpu] refactor a few "context" classes (microsoft#26602)

### Description

This PR refactors a few "context" classes to make them clearer and to
support new features.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>

* Bump actions/checkout from 5 to 6 (microsoft#26641)

Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to
6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>v6-beta by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://github.com/actions/checkout/pull/2298">actions/checkout#2298</a></li>
<li>update readme/changelog for v6 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://github.com/actions/checkout/pull/2311">actions/checkout#2311</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5.0.0...v6.0.0">https://github.com/actions/checkout/compare/v5.0.0...v6.0.0</a></p>
<h2>v6-beta</h2>
<h2>What's Changed</h2>
<p>Updated persist-credentials to store the credentials under
<code>$RUNNER_TEMP</code> instead of directly in the local git
config.</p>
<p>This requires a minimum Actions Runner version of <a
href="https://github.com/actions/runner/releases/tag/v2.329.0">v2.329.0</a>
to access the persisted credentials for <a
href="https://docs.github.com/en/actions/tutorials/use-containerized-services/create-a-docker-container-action">Docker
container action</a> scenarios.</p>
<h2>v5.0.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5...v5.0.1">https://github.com/actions/checkout/compare/v5...v5.0.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>V6.0.0</h2>
<ul>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
</ul>
<h2>V5.0.1</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<h2>V5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>V4.3.1</h2>
<ul>
<li>Port v6 cleanup to v4 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://github.com/actions/checkout/pull/2305">actions/checkout#2305</a></li>
</ul>
<h2>V4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<h2>v4.1.5</h2>
<ul>
<li>Update NPM dependencies by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://github.com/actions/checkout/pull/1703">actions/checkout#1703</a></li>
<li>Bump github/codeql-action from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://github.com/actions/checkout/pull/1694">actions/checkout#1694</a></li>
<li>Bump actions/setup-node from 1 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://github.com/actions/checkout/pull/1696">actions/checkout#1696</a></li>
<li>Bump actions/upload-artifact from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://github.com/actions/checkout/pull/1695">actions/checkout#1695</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/actions/checkout/commit/1af3b93b6815bc44a9784bd300feb67ff0d1eeb3"><code>1af3b93</code></a>
update readme/changelog for v6 (<a
href="https://github.com/actions/checkout/issues/2311">#2311</a>)</li>
<li><a
href="https://github.com/actions/checkout/commit/71cf2267d89c5cb81562390fa70a37fa40b1305e"><code>71cf226</code></a>
v6-beta (<a
href="https://github.com/actions/checkout/issues/2298">#2298</a>)</li>
<li><a
href="https://github.com/actions/checkout/commit/069c6959146423d11cd0184e6accf28f9d45f06e"><code>069c695</code></a>
Persist creds to a separate file (<a
href="https://github.com/actions/checkout/issues/2286">#2286</a>)</li>
<li><a
href="https://github.com/actions/checkout/commit/ff7abcd0c3c05ccf6adc123a8cd1fd4fb30fb493"><code>ff7abcd</code></a>
Update README to include Node.js 24 support details and requirements (<a
href="https://github.com/actions/checkout/issues/2248">#2248</a>)</li>
<li>See full diff in <a
href="https://github.com/actions/checkout/compare/v5...v6">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add LogEvaluationStart for ReplayGraph (microsoft#26645)

### Description
<!-- Describe your changes. -->

Add LogEvaluationStart for ReplayGraph to match the existing LogEvaluationStop.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

This allows the run time to be captured correctly when using ETW.

Co-authored-by: hualxie <hualxie@microsoft.com>

* add LogCompileModel to mark the session usage (microsoft#26646)

### Description
<!-- Describe your changes. -->

Add LogCompileModel to mark the session usage as Compile, because that
session will not be used for inference.
We could also use it to log compile-model parameters if needed.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

We are building a profiling tool for WinML and we want to differentiate
between Compile sessions and inference sessions.

I think there are two ways to do this, but I don't know which is better:

microsoft#26646
microsoft#26647

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* [webgpu] Fix bug introduced by RoE (microsoft#26661)

Fix a bug introduced by microsoft#26563, which accidentally used the wrong
condition and produced incorrect results in graph capture mode.

* [QNN-EP] Enable verbose and artifacts saving in onnxruntime_provider_test.exe (microsoft#26396)

### Description
<!-- Describe your changes. -->
- The change allows users to better debug unit tests by adding the
following environment variables:
    - `QNN_DUMP_ONNX`: Dump input onnx model
    - `QNN_DUMP_JSON`: Dump json qnn graph with provider_option `dump_json_qnn_graph`
    - `QNN_DUMP_DLC`: Dump dlc with provider_option `qnn_ir_backend_path`
    - `QNN_VERBOSE`: Use the log level `ORT_LOGGING_LEVEL_VERBOSE`
- Developers can use the environment variables above to save the
artifacts of QNN-EP testcases to a directory named with
`<TestSuite>_<TestName>`
    ```
    .
    ├── QnnCPUBackendTests_BatchNorm2D_fp32             # RunQnnModelTest
    │   ├── dumped_f32_model.onnx                   # float32 ONNX model
    │   ├── QNNExecutionProvider_QNN_XXXX_X_X.dlc
    │   └── QNNExecutionProvider_QNN_XXXX_X_X.json
    ├── QnnHTPBackendTests_BatchNorm_FP16               # TestFp16ModelAccuracy
    │   ├── dumped_f16_model.onnx                   # float16 ONNX model
    │   ├── dumped_f32_model.onnx                   # float32 ONNX model
    │   ├── QNNExecutionProvider_QNN_XXXX_X_X.dlc
    │   └── QNNExecutionProvider_QNN_XXXX_X_X.json
    └── QnnHTPBackendTests_BatchNorm2D_U8U8S32          # TestQDQModelAccuracy
        ├── dumped_f32_model.onnx                   # float32 ONNX model
        ├── dumped_qdq_model.onnx                   # QDQ ONNX model
        ├── QNNExecutionProvider_QNN_XXXX_X_X.dlc
        └── QNNExecutionProvider_QNN_XXXX_X_X.json

    # All artifact files are placed under the current working directory from
    # which the test binary is invoked.
    ```
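
The naming scheme above can be sketched in a few lines of Python. This is an illustration only: the test harness itself is C++, the helper names are hypothetical, and treating any non-empty value as "enabled" is an assumption about how the variables are interpreted.

```python
import os

# Variable names taken from the list in the description.
QNN_DEBUG_VARS = ("QNN_DUMP_ONNX", "QNN_DUMP_JSON", "QNN_DUMP_DLC", "QNN_VERBOSE")


def enabled_debug_options(environ):
    """Return the debug variables set to a non-empty value (assumed semantics)."""
    return [name for name in QNN_DEBUG_VARS if environ.get(name)]


def artifact_dir(test_suite, test_name, cwd):
    """Build the <TestSuite>_<TestName> directory path under the working directory."""
    return os.path.join(cwd, f"{test_suite}_{test_name}")


if __name__ == "__main__":
    print(enabled_debug_options(os.environ))
    print(artifact_dir("QnnCPUBackendTests", "BatchNorm2D_fp32", os.getcwd()))
```

Passing the environment as an argument rather than reading `os.environ` inside the helper keeps the sketch easy to test.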

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
- The JSON QNN graph and DLC files help backend developers debug
performance and accuracy issues.
- By comparing the ONNX model with the JSON QNN graph/DLC, we can locate
issues in graph manipulation.

* [webgpu] Use multiplication instead of pow if exponent is 2 (microsoft#26667)

### Description
More accurately compute Pow(2.0) on WebGPU EP.

Reproduction script:
```py
from onnx import helper, TensorProto
import onnxruntime as ort
import numpy as np

# 1. Create the ONNX model
# Define input and output
input_info = helper.make_tensor_value_info('X', TensorProto.FLOAT, [1, 1])
output_info = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [1, 1])

# Create a constant tensor for the exponent (2.0)
exponent_tensor = helper.make_tensor('exponent', TensorProto.FLOAT, [], [2.0])
exponent_node = helper.make_node('Constant', [], ['exponent_out'], value=exponent_tensor)

# Create the Pow node
# Pow takes two inputs: Base (X) and Power (exponent_out)
pow_node = helper.make_node(
    'Pow',
    inputs=['X', 'exponent_out'],
    outputs=['Y'],
    name='PowNode'
)

# Create the graph
graph_def = helper.make_graph(
    [exponent_node, pow_node],
    'test-model',
    [input_info],
    [output_info]
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='onnx-example')
opset = model_def.opset_import[0]
opset.version = 13 # Ensure opset version supports the operations

# 2. Convert model to string (bytes)
model_str = model_def.SerializeToString()

# 3. Prepare input data
np.random.seed(0)
input_data = np.array([[-2e3]], dtype=np.float32)

# 4. Run on CPUExecutionProvider
sess_cpu = ort.InferenceSession(model_str, providers=['CPUExecutionProvider'])
res_cpu = sess_cpu.run(['Y'], {'X': input_data})[0]
print("CPU Result:", res_cpu)

# 5. Run on WebGpuExecutionProvider
sess_webgpu = ort.InferenceSession(model_str, providers=['WebGpuExecutionProvider'])
res_webgpu = sess_webgpu.run(['Y'], {'X': input_data})[0]
print("WebGPU Result:", res_webgpu)

# Compare results
diff = np.abs(res_cpu - res_webgpu)
max_diff = diff.max().item()
assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
print("Results match!")
```

currently produces
```
CPU Result: [[4.e+06]]
WebGPU Result: [[3.999999e+06]]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[1], line 56
     54 diff = np.abs(res_cpu - res_webgpu)
     55 max_diff = diff.max().item()
---> 56 assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
     57 print("Results match!")

AssertionError: Results do not match within tolerance! Max diff: 1.0
```

but with this PR:
```
CPU Result: [[4.e+06]]
WebGPU Result: [[4.e+06]]
Results match!
```

### Motivation and Context

The inaccuracy leads to downstream issues for certain models, especially
those that compute pow(x, 2) on large values.
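
The following sketch illustrates why a plain multiplication can be more accurate than a generic `pow`. It assumes, without confirmation from this PR, that a GPU `pow` may be evaluated as `exp2(y * log2(x))` in float32; the helper names `to_f32`, `pow_via_exp2_log2`, and `square_via_mul` are hypothetical and only emulate float32 rounding with the standard library.

```python
import math
import struct


def to_f32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def pow_via_exp2_log2(base: float, exponent: float) -> float:
    """Hypothetical float32 pow built from exp2/log2 (an assumption, not the real kernel).

    The sign of the base is ignored, which is only valid for even exponents.
    """
    log_base = to_f32(math.log2(abs(base)))  # rounding error enters here
    scaled = to_f32(exponent * log_base)
    return to_f32(2.0 ** scaled)  # the error in the exponent is amplified here


def square_via_mul(base: float) -> float:
    """Square a value with a single float32 multiplication."""
    base32 = to_f32(base)
    return to_f32(base32 * base32)


x = -2e3
print("exp2/log2:", pow_via_exp2_log2(x, 2.0))
print("x * x:    ", square_via_mul(x))
```

Because 2000 * 2000 is an integer below 2^24, the multiplication is exact in float32, while the `exp2`/`log2` path depends on how the rounded logarithm propagates through the exponential.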

cc @guschmue

* Avoid creation of temporary protobuf object (microsoft#26681)

### Description
While profiling session creation time for large graphs (large in number
of nodes, not in tensor size), we noticed that the creation and
subsequent destruction of temporary protobuf objects was the major
hotspot. This PR avoids creating them.

Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>

* Use `std::string_view` directly as key to `absl::flat_hash_map::find` (microsoft#26682)

### Description
Use `std::string_view` directly as key in `find` method of
`flat_hash_map`. This part of the absl documentation may provide further
insights:
https://abseil.io/docs/cpp/guides/container#heterogeneous-lookup


### Motivation and Context
We noticed this when profiling the session creation of large models (in
terms of the number of nodes).

Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>

* [webgpu] Convert i32 to u32 in uniforms (microsoft#26676)

In debug mode, the following error is reported:
`webgpu_context.cc:257 Run Uniform variable[5]
(head_size) data type mismatch in program
"SplitPackedQKVWithRotaryEmbeddingAndCopyKV", Expected: u32, Actual:
i32`. The issue does not occur in release mode.

This PR converts the uniform from i32 to u32 to avoid the mismatch.

* [webgpu] Fix BatchNormalization ShapeInferenceError for 2D inputs (microsoft#26659)

### Description

Test model (happens with any 2D inputs):
[2191__visual_projection_visual_projection.1_BatchNormalization.onnx.zip](https://github.com/user-attachments/files/23758390/2191__visual_projection_visual_projection.1_BatchNormalization.onnx.zip)


Command:
```
python -c "import onnxruntime as ort; ort.InferenceSession('2191__visual_projection_visual_projection.1_BatchNormalization.onnx', providers=['WebGpuExecutionProvider'])"
```

Before (failure):
```
Op (BatchNormalization) [ShapeInferenceError] Tensor must have at least 3 dimensions to convert between channels first and channels last.
```

After (success):
```
(nothing, meaning success)
```

### Motivation and Context

This fixes BatchNormalization for 2D inputs on WebGPU, matching the
behavior of the CPU version.

cc @guschmue

* Clear cuda error on unsupported CudaMemPool test (microsoft#26629)

### Description
The CudaMemPool test checks whether memory pools are supported in the
current environment. When they are not, the resulting CUDA error must be
cleared so that it does not affect subsequent tests.


### Motivation and Context
Potential test failure.

* [QNN-EP] Include detailed error message in the returned status (microsoft#26546)

### Description
The original error message only shows: "Failed to setup QNN input
tensors for graph: <graph_name>"
This change adds more detailed error information by logging the failure
reason from
[SetupTensors](https://github.com/microsoft/onnxruntime/blob/ea55c160a36d658eae61a4c7aeda6cb55dd54dec/onnxruntime/core/providers/qnn/builder/qnn_model.cc#L386),
making it easier to debug issues.


### Motivation and Context
User requires detailed error logging for the ORT online context binary
generation.

* add support for int32_t in webgpu / slice (microsoft#26693)

fix for microsoft#26690

* [webgpu] Remove `global_id` and `workgroup_id` in gemm_utils.cc (microsoft#26662)

### Description
This patch replaces `global_id` and `workgroup_id` with
`logical_global_id` and `logical_workgroup_id` which are computed from
`workgroup_idx` and the dispatch workgroup sizes set in
`ProgramBase::SetDispatchGroupSize()`.



### Motivation and Context
We shouldn't use `global_id` or `workgroup_id` directly because the
dispatch workgroup sizes may be normalized in
`ProgramManager::NormalizeDispatchGroupSize()`.
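
As a rough illustration of the idea, the sketch below decomposes a flat workgroup index into logical IDs using the dispatch sizes the program asked for. The function names, the x-fastest ordering, and the tuple layout are assumptions made for this example; the actual shader code generated in `gemm_utils.cc` may differ.

```python
def logical_workgroup_id(workgroup_idx, dispatch_size):
    """Decompose a flat workgroup index into (x, y, z) for the logical dispatch size.

    Assumes x varies fastest, then y, then z (a hypothetical ordering).
    """
    size_x, size_y, size_z = dispatch_size
    if not 0 <= workgroup_idx < size_x * size_y * size_z:
        raise ValueError("workgroup_idx is outside the logical dispatch")
    x = workgroup_idx % size_x
    y = (workgroup_idx // size_x) % size_y
    z = workgroup_idx // (size_x * size_y)
    return (x, y, z)


def logical_global_id(workgroup_idx, dispatch_size, workgroup_size, local_id):
    """Combine the logical workgroup ID with the invocation's local ID."""
    group = logical_workgroup_id(workgroup_idx, dispatch_size)
    return tuple(g * s + l for g, s, l in zip(group, workgroup_size, local_id))


# The logical IDs depend only on the flat index and the requested sizes,
# so they stay the same even if the physical dispatch is reshaped.
print(logical_workgroup_id(23, (4, 3, 2)))
print(logical_global_id(23, (4, 3, 2), (8, 8, 1), (1, 2, 0)))
```

The point of the design is that the flat index is stable under any reshaping of the physical dispatch, whereas the built-in `workgroup_id` reflects whatever sizes were actually dispatched.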

* [webgpu] Correct definition of large numbers, fixes softmax(max_negative_number) in float32 (microsoft#26670)

### Description

The most negative finite float32 value is
`-3.40282346638528e+38`, according to IEEE 754, but it was
incorrectly hard-coded inline as the truncated literal `-3.402823e+38f`.

```py
>>> import numpy as np
>>> np.finfo(np.float32).min
np.float32(-3.4028235e+38)
>>> np.finfo(np.float32).min.item()
-3.4028234663852886e+38
```

For this reason, values less than this threshold were handled
incorrectly. While this may seem like a small/irrelevant detail, it's
essential in attention masking, where we do in fact use this value,
leading to large numerical errors down the line.
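
A small standard-library sketch shows that the two literals round to different float32 values, and what a numerically stable softmax is expected to return for the masked input. The helpers `to_f32` and `stable_softmax` are hypothetical names for this example and do not reproduce the WebGPU shader.

```python
import math
import struct


def to_f32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Literals taken from the description above, rounded to float32.
F32_LOWEST = to_f32(-3.40282346638528e+38)
F32_TRUNCATED = to_f32(-3.402823e+38)


def stable_softmax(values):
    """Softmax that subtracts the maximum first, so exp() never overflows."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# The truncated literal is a different, less negative float32 value, so the
# real minimum falls below any threshold defined with it.
print(F32_LOWEST < F32_TRUNCATED)
print(stable_softmax([F32_LOWEST, F32_LOWEST]))
```

When every input equals the same value, subtracting the maximum leaves zeros, so the expected result is a uniform distribution, which matches the CPU output in the reproduction.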


Reproduction:
```py
from onnx import helper, TensorProto
import onnxruntime as ort
import numpy as np

# 1. Create the ONNX model
# Define input and output
input_shape = [1, 2]
input_info = helper.make_tensor_value_info('X', TensorProto.FLOAT, input_shape)
output_info = helper.make_tensor_value_info('Y', TensorProto.FLOAT, input_shape)

# Create the Softmax node
# Softmax takes one input: X
softmax_node = helper.make_node(
    'Softmax',
    inputs=['X'],
    outputs=['Y'],
    name='SoftmaxNode',
    axis=-1 # Default axis is -1, usually applied to the last dimension
)

# Create the graph
graph_def = helper.make_graph(
    [softmax_node],
    'test-model',
    [input_info],
    [output_info]
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='onnx-example')
opset = model_def.opset_import[0]
opset.version = 13 # Ensure opset version supports the operations

# 2. Convert model to string (bytes)
model_str = model_def.SerializeToString()

# 3. Prepare input data
np.random.seed(0)
input_data = np.array(
[[-3.40282346638528e+38, -3.40282346638528e+38]]
# [[-3.4028234663852886e+38, -3.4028234663852886e+38]]
).astype(np.float32)
print(input_data.tolist())

# 4. Run on CPUExecutionProvider
sess_cpu = ort.InferenceSession(model_str, providers=['CPUExecutionProvider'])
res_cpu = sess_cpu.run(['Y'], {'X': input_data})[0]
print("CPU Result:", res_cpu)

# 5. Run on WebGpuExecutionProvider
sess_webgpu = ort.InferenceSession(model_str, providers=['WebGpuExecutionProvider'])
res_webgpu = sess_webgpu.run(['Y'], {'X': input_data})[0]
print("WebGPU Result:", res_webgpu)

# Compare results
diff = np.abs(res_cpu - res_webgpu)
max_diff = diff.max().item()
print(diff)
print(f"Max diff: {max_diff}")
assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
print("Results match!")
```

Before:
```
[[-3.4028234663852886e+38, -3.4028234663852886e+38]]
CPU Result: [[0.5 0.5]]
WebGPU Result: [[0. 0.]]
[[0.5 0.5]]
Max diff: 0.5
AssertionError: Results do not match within tolerance! Max diff: 0.5
```

After:
```
[[-3.4028234663852886e+38, -3.4028234663852886e+38]]
CPU Result: [[0.5 0.5]]
WebGPU Result: [[0.5 0.5]]
[[0. 0.]]
Max diff: 0.0
Results match!
```

cc @guschmue

* [TRT/TRT RTX EP] Fix bug for missing outputs in the returning ComputeCapability/IndexedSubGraph (microsoft#26444)

### Description
In TRT EP's `GetCapability()`, there are cases where `GetSubGraph()`
does not add the graph's outputs to the
`ComputeCapability/IndexedSubGraph` returned to ORT.

The issue comes from the following code:
````c++
...
if (node->GetOutputEdgesCount() > node->OutputDefs().size()) {
 ... // execute here
} else {
  ...
          if (graph_output_names.find(output->Name()) != graph_output_names.end()) {
            graph_outputs_to_add[output] = output_order; // missing this
          }
}
````

Update TRT RTX EP as well.

### Motivation and Context
microsoft#25373

* [ROCM] Remove docker, contrib ops, ci scripts related to ROCM EP (microsoft#26697)

### Description

This is a follow-up to microsoft#25181
that removes ROCM EP related files to avoid confusion.

Documents will be updated later.

### Motivation and Context

microsoft#26692

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fs-eire <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Wenqin Yang <wenqin.yang@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: qti-hungjuiw <hungjuiw@qti.qualcomm.com>
Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Christian Bourjau <cbourjau@users.noreply.github.com>
Co-authored-by: Xiaofei Han <xiaofeihan@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: chunghow-qti <chunghow@qti.qualcomm.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Jiawei Shao <jiawei.shao@intel.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Rohanjames1997 pushed a commit to Rohanjames1997/onnxruntime that referenced this pull request Dec 4, 2025
### Description

This PR refactors a few "context" classes to make it clearer and support
new features.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Sumit2318 pushed a commit that referenced this pull request Jan 6, 2026