Conversation

@ishwar-raut1
Contributor

Description

Add vendor id to OrtEpFactory.

Motivation and Context

Bring the EP in line with Microsoft's EP compatibility rules.

Copilot AI left a comment

Pull Request Overview

This PR adds vendor ID support to the ONNX Runtime TensorRT RTX execution provider factory to achieve EP compatibility with Microsoft rules. The changes primarily involve adding a new GetVendorId method implementation and updating function signatures to include noexcept specifications.

  • Adds GetVendorIdImpl method to return NVIDIA's vendor ID
  • Updates function signatures to include noexcept specifications for consistency
  • Initializes ort_version_supported field in the constructor
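The three changes listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: `MockEpFactory` is a hypothetical stand-in for the C-style `OrtEpFactory` base struct, `MockNvFactory` stands in for `NvTensorRtRtxEpFactory`, and `kMockApiVersion` is a placeholder for `ORT_API_VERSION`. The value `0x10DE` is NVIDIA's PCI vendor ID; whether the PR returns exactly that value is an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for the C-style OrtEpFactory base struct. The real
// struct is declared in the ONNX Runtime headers and has more members.
struct MockEpFactory {
  std::uint32_t ort_version_supported;
  std::uint32_t (*GetVendorId)(const MockEpFactory* this_ptr);
};

constexpr std::uint32_t kMockApiVersion = 23;         // placeholder for ORT_API_VERSION
constexpr std::uint32_t kNvidiaPciVendorId = 0x10DE;  // NVIDIA's PCI vendor ID

// Hypothetical stand-in for NvTensorRtRtxEpFactory.
struct MockNvFactory : MockEpFactory {
  MockNvFactory() : MockEpFactory{} {
    // Tell the runtime which API version this factory was built against.
    ort_version_supported = kMockApiVersion;
    // Wire the C-style callback to the static implementation below.
    GetVendorId = GetVendorIdImpl;
  }

  // noexcept: the callback is invoked across a C ABI boundary, so it must
  // not let an exception escape.
  static std::uint32_t GetVendorIdImpl(const MockEpFactory* this_ptr) noexcept {
    const auto* factory = static_cast<const MockNvFactory*>(this_ptr);
    return factory->vendor_id;
  }

  const std::uint32_t vendor_id{kNvidiaPciVendorId};
};
```

A caller that only sees the base struct would invoke `factory->GetVendorId(factory)` and read `factory->ort_version_supported` without knowing the concrete type.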


const OrtApi& ort_api;
const std::string ep_name;
const std::string ep_name{kNvTensorRTRTXExecutionProvider};

Copilot AI Jul 18, 2025

The ep_name member initialization has changed from being set via constructor parameter to a hardcoded constant. This removes flexibility and may break existing code that relies on different EP names for different configurations, contradicting the comment on lines 172-173 that states 'Each unique factory configuration must have a unique name.'
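For context on this comment: in C++, a default member initializer is only used when the constructor's mem-initializer list does not mention that member. The sketch below illustrates the language rule with a hypothetical `FactoryWithDefault` type and a made-up `kDefaultEpName` constant; it does not show how the PR's final code is written, only that a default such as `ep_name{kNvTensorRTRTXExecutionProvider}` would not by itself prevent a constructor from setting a different name.

```cpp
#include <string>

constexpr const char* kDefaultEpName = "DefaultEpName";  // hypothetical constant

struct FactoryWithDefault {
  // No mem-initializer for ep_name: the default member initializer applies.
  FactoryWithDefault() = default;

  // ep_name appears in the mem-initializer list: the default is ignored and
  // the constructor argument is used instead.
  explicit FactoryWithDefault(const char* name) : ep_name{name} {}

  const std::string ep_name{kDefaultEpName};
};
```

Whether the reviewer's concern applies therefore depends on whether the constructor still passes the name through, which the excerpt here does not settle.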

Copilot uses AI. Check for mistakes.
static const char* ORT_API_CALL GetVersionImpl(const OrtEpFactory* /*this_ptr*/) noexcept {
static uint32_t GetVendorIdImpl(const OrtEpFactory* this_ptr) noexcept {
const auto* factory = static_cast<const NvTensorRtRtxEpFactory*>(this_ptr);
return factory->vendor_id;

Copilot AI Jul 18, 2025

The code references factory->vendor_id but this member variable is not defined in the visible class definition. This will cause a compilation error.

const char* ep_name,
OrtHardwareDeviceType hw_type)
: ort_api{ort_api_in}, ep_name{ep_name}, ort_hw_device_type{hw_type} {
ort_version_supported = ORT_API_VERSION;

Copilot AI Jul 18, 2025

The code assigns to ort_version_supported but this member variable is not defined in the visible class definition. This will cause a compilation error.

HectorSVC and others added 2 commits July 18, 2025 11:04
… to process EPContext node for ep_cache_context with bytes stream (microsoft#25389)

The existing ReadOpAttr and CreateOpAttr APIs for the string type always assume there is a '\0' at the end. This prevents EPs from embedding the context binary byte buffer in, or reading it from, the EPContext node's ep_cache_context attribute.
Update the custom op API ReadOpAttr for the string type to avoid adding '\0' at the end.
Update the CreateOpAttr API to construct the string with an explicit length.
Keep the strings type processing as it is for now.
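
The underlying issue can be illustrated with plain `std::string`, independent of the ONNX Runtime API. The helpers `FromBytes` and `FromCString` below are hypothetical names used only for this sketch: constructing a string with an explicit length keeps embedded '\0' bytes, while treating the buffer as a C string stops at the first one.

```cpp
#include <cstddef>
#include <string>

// Builds a string from a byte buffer with an explicit length, so embedded
// '\0' bytes are kept as ordinary data.
inline std::string FromBytes(const char* data, std::size_t len) {
  return std::string(data, len);
}

// Builds a string by scanning for the first '\0', as C-string style APIs do.
// Any bytes after that terminator are lost.
inline std::string FromCString(const char* data) {
  return std::string(data);
}
```

For a buffer holding the bytes `'a', '\0', 'b'`, `FromBytes(buf, 3)` keeps all three bytes, whereas `FromCString(buf)` yields a one-byte string. A binary context blob would be truncated in the same way.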
)

Noticing intermittent timeouts around the 4hr mark but pipeline is
showing no errors. Increases timeout from 240min (4hr) to 270min (4.5hr)
nieubank
nieubank previously approved these changes Jul 18, 2025
dependabot bot added 2 commits July 18, 2025 13:06
)

Bumps [on-headers](https://github.com/jshttp/on-headers) and
[compression](https://github.com/expressjs/compression). These
dependencies needed to be updated together.
Updates `on-headers` from 1.0.2 to 1.1.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jshttp/on-headers/releases">on-headers's
releases</a>.</em></p>
<blockquote>
<h2>1.1.0</h2>
<h2>Important</h2>
<ul>
<li>Fix <a
href="https://www.cve.org/CVERecord?id=CVE-2025-7339">CVE-2025-7339</a>
(<a
href="https://github.com/jshttp/on-headers/security/advisories/GHSA-76c9-3jph-rj3q">GHSA-76c9-3jph-rj3q</a>)</li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>Migrate CI pipeline to GitHub actions by <a
href="https://github.com/carpasse"><code>@​carpasse</code></a> in <a
href="https://github.com/jshttp/on-headers/pull/12">jshttp/on-headers#12</a></li>
<li>fix README.md badges by <a
href="https://github.com/carpasse"><code>@​carpasse</code></a> in <a
href="https://github.com/jshttp/on-headers/pull/13">jshttp/on-headers#13</a></li>
<li>add OSSF scorecard action by <a
href="https://github.com/carpasse"><code>@​carpasse</code></a> in <a
href="https://github.com/jshttp/on-headers/pull/14">jshttp/on-headers#14</a></li>
<li>fix: use <code>ubuntu-latest</code> as ci runner by <a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
in <a
href="https://github.com/jshttp/on-headers/pull/19">jshttp/on-headers#19</a></li>
<li>ci: apply OSSF Scorecard security best practices by <a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
in <a
href="https://github.com/jshttp/on-headers/pull/20">jshttp/on-headers#20</a></li>
<li>👷 add upstream change detection by <a
href="https://github.com/ctcpip"><code>@​ctcpip</code></a> in <a
href="https://github.com/jshttp/on-headers/pull/31">jshttp/on-headers#31</a></li>
<li>✨ add script to update known hashes by <a
href="https://github.com/ctcpip"><code>@​ctcpip</code></a> in <a
href="https://github.com/jshttp/on-headers/pull/32">jshttp/on-headers#32</a></li>
<li>💚 update CI - add newer node versions by <a
href="https://github.com/ctcpip"><code>@​ctcpip</code></a> in <a
href="https://github.com/jshttp/on-headers/pull/33">jshttp/on-headers#33</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/carpasse"><code>@​carpasse</code></a>
made their first contribution in <a
href="https://github.com/jshttp/on-headers/pull/12">jshttp/on-headers#12</a></li>
<li><a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
made their first contribution in <a
href="https://github.com/jshttp/on-headers/pull/19">jshttp/on-headers#19</a></li>
<li><a href="https://github.com/ctcpip"><code>@​ctcpip</code></a> made
their first contribution in <a
href="https://github.com/jshttp/on-headers/pull/31">jshttp/on-headers#31</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/jshttp/on-headers/compare/v1.0.2...v1.1.0">https://github.com/jshttp/on-headers/compare/v1.0.2...v1.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jshttp/on-headers/blob/master/HISTORY.md">on-headers's
changelog</a>.</em></p>
<blockquote>
<h1>1.1.0 / 2025-07-17</h1>
<ul>
<li>Fix <a
href="https://www.cve.org/CVERecord?id=CVE-2025-7339">CVE-2025-7339</a>
(<a
href="https://github.com/jshttp/on-headers/security/advisories/GHSA-76c9-3jph-rj3q">GHSA-76c9-3jph-rj3q</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/jshttp/on-headers/commit/4b017af88f5375bbdf3ad2ee732d2c122e4f52b0"><code>4b017af</code></a>
1.1.0</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/b636f2d08e6c1e0a784b53a13cd61e05c09bb118"><code>b636f2d</code></a>
♻️ refactor header array code</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/3e2c2d46c3e9592f6a1c3a3a1dbe622401f95d39"><code>3e2c2d4</code></a>
✨ ignore falsy header keys, matching node behavior</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/172eb41b99a5a290b27a2c43fe602ca33aa1c8ce"><code>172eb41</code></a>
✨ support duplicate headers</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/c6e384908c9c6127d18831d16ab0bd96e1231867"><code>c6e3849</code></a>
🔒️ fix array handling</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/6893518341bb4e5363285df086b3158302d3b216"><code>6893518</code></a>
💚 update CI - add newer node versions</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/56a345d82b51a0dcb8d09f061f87b1fd1dc4c01e"><code>56a345d</code></a>
✨ add script to update known hashes</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/175ab217155d525371a5416ff059f895a3a532a6"><code>175ab21</code></a>
👷 add upstream change detection (<a
href="https://github.com/jshttp/on-headers/issues/31">#31</a>)</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/ce0b2c8fcd313d38d3534fb731050dc16e105bf6"><code>ce0b2c8</code></a>
ci: apply OSSF Scorecard security best practices (<a
href="https://github.com/jshttp/on-headers/issues/20">#20</a>)</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/1a38c543e75cd06217b449531de10b1758e35299"><code>1a38c54</code></a>
fix: use <code>ubuntu-latest</code> as ci runner (<a
href="https://github.com/jshttp/on-headers/issues/19">#19</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/jshttp/on-headers/compare/v1.0.2...v1.1.0">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a
href="https://www.npmjs.com/~ulisesgascon">ulisesgascon</a>, a new
releaser for on-headers since your current version.</p>
</details>
<br />

Updates `compression` from 1.8.0 to 1.8.1
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/expressjs/compression/releases">compression's
releases</a>.</em></p>
<blockquote>
<h2>v1.8.1</h2>
<h2>What's Changed</h2>
<ul>
<li>fix(docs): update multiple links from http to https by <a
href="https://github.com/Phillip9587"><code>@​Phillip9587</code></a> in
<a
href="https://github.com/expressjs/compression/pull/222">expressjs/compression#222</a></li>
<li>ci: add dependabot for github actions by <a
href="https://github.com/bjohansebas"><code>@​bjohansebas</code></a> in
<a
href="https://github.com/expressjs/compression/pull/207">expressjs/compression#207</a></li>
<li>build(deps): bump github/codeql-action from 2.23.2 to 3.28.15 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/expressjs/compression/pull/228">expressjs/compression#228</a></li>
<li>build(deps): bump ossf/scorecard-action from 2.3.1 to 2.4.1 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/expressjs/compression/pull/229">expressjs/compression#229</a></li>
<li>build(deps-dev): bump eslint-plugin-import from 2.26.0 to 2.31.0 by
<a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/expressjs/compression/pull/230">expressjs/compression#230</a></li>
<li>build(deps-dev): bump supertest from 6.2.3 to 6.3.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/expressjs/compression/pull/231">expressjs/compression#231</a></li>
<li>[StepSecurity] ci: Harden GitHub Actions by <a
href="https://github.com/step-security-bot"><code>@​step-security-bot</code></a>
in <a
href="https://github.com/expressjs/compression/pull/235">expressjs/compression#235</a></li>
<li>build(deps): bump github/codeql-action from 3.28.15 to 3.29.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/expressjs/compression/pull/243">expressjs/compression#243</a></li>
<li>build(deps): bump actions/upload-artifact from 4.3.1 to 4.6.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/expressjs/compression/pull/239">expressjs/compression#239</a></li>
<li>build(deps): bump ossf/scorecard-action from 2.4.1 to 2.4.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/expressjs/compression/pull/240">expressjs/compression#240</a></li>
<li>build(deps): bump actions/checkout from 4.1.1 to 4.2.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/expressjs/compression/pull/241">expressjs/compression#241</a></li>
<li>build(deps-dev): bump eslint-plugin-import from 2.31.0 to 2.32.0 by
<a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/expressjs/compression/pull/244">expressjs/compression#244</a></li>
<li>deps: on-headers@1.1.0 by <a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
in <a
href="https://github.com/expressjs/compression/pull/246">expressjs/compression#246</a></li>
<li>Release: 1.8.1 by <a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
in <a
href="https://github.com/expressjs/compression/pull/247">expressjs/compression#247</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
made their first contribution in <a
href="https://github.com/expressjs/compression/pull/228">expressjs/compression#228</a></li>
<li><a
href="https://github.com/step-security-bot"><code>@​step-security-bot</code></a>
made their first contribution in <a
href="https://github.com/expressjs/compression/pull/235">expressjs/compression#235</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/expressjs/compression/compare/1.8.0...v1.8.1">https://github.com/expressjs/compression/compare/1.8.0...v1.8.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/expressjs/compression/blob/master/HISTORY.md">compression's
changelog</a>.</em></p>
<blockquote>
<h1>1.8.1 / 2025-07-17</h1>
<ul>
<li>deps: on-headers@~1.1.0
<ul>
<li>Fix <a
href="https://www.cve.org/CVERecord?id=CVE-2025-7339">CVE-2025-7339</a>
(<a
href="https://github.com/expressjs/on-headers/security/advisories/GHSA-76c9-3jph-rj3q">GHSA-76c9-3jph-rj3q</a>)</li>
</ul>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/expressjs/compression/commit/83a0c45fe190f4fcb8b515c18065db9cb9029dd1"><code>83a0c45</code></a>
1.8.1</li>
<li><a
href="https://github.com/expressjs/compression/commit/ce62713129f4b33eac4b833e1722410091646395"><code>ce62713</code></a>
deps: on-headers@1.1.0 (<a
href="https://github.com/expressjs/compression/issues/246">#246</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/f4acb23985fa345318d34d4a96acf555a883efeb"><code>f4acb23</code></a>
build(deps-dev): bump eslint-plugin-import from 2.31.0 to 2.32.0 (<a
href="https://github.com/expressjs/compression/issues/244">#244</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/6eaebe63f2ecac191d402c570bde140488435c4c"><code>6eaebe6</code></a>
build(deps): bump actions/checkout from 4.1.1 to 4.2.2 (<a
href="https://github.com/expressjs/compression/issues/241">#241</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/37e062312fd270f84b5f50f7c6f88312609633f5"><code>37e0623</code></a>
build(deps): bump ossf/scorecard-action from 2.4.1 to 2.4.2 (<a
href="https://github.com/expressjs/compression/issues/240">#240</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/bc436b26283c2f85a9711085dd0e4a580de50ba7"><code>bc436b2</code></a>
build(deps): bump actions/upload-artifact from 4.3.1 to 4.6.2 (<a
href="https://github.com/expressjs/compression/issues/239">#239</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/2f9f5726751ecf12f7c46a9d1493bcd1966e09a7"><code>2f9f572</code></a>
build(deps): bump github/codeql-action from 3.28.15 to 3.29.2 (<a
href="https://github.com/expressjs/compression/issues/243">#243</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/5f13b148d2a1a2daaa8647e03592214bb240bf18"><code>5f13b14</code></a>
[StepSecurity] ci: Harden GitHub Actions (<a
href="https://github.com/expressjs/compression/issues/235">#235</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/76e094548125afbf8089a482d5982dc96c7ce398"><code>76e0945</code></a>
build(deps-dev): bump supertest from 6.2.3 to 6.3.4 (<a
href="https://github.com/expressjs/compression/issues/231">#231</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/ae6ee809dc0cb40febaf2a5bff298465bd5a207f"><code>ae6ee80</code></a>
build(deps-dev): bump eslint-plugin-import from 2.26.0 to 2.31.0 (<a
href="https://github.com/expressjs/compression/issues/230">#230</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/expressjs/compression/compare/1.8.0...v1.8.1">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ts/transformers-test (microsoft#25429)

Bumps [transformers](https://github.com/huggingface/transformers) from
4.48.0 to 4.52.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h2>Patch release v4.51.3</h2>
<p>A mix of bugs were fixed in this patch; very exceptionally, we
diverge from semantic versioning to merge GLM-4 in this patch
release.</p>
<ul>
<li>Handle torch ver in flexattn (<a
href="https://github.com/huggingface/transformers/issues/37400">#37400</a>)</li>
<li>handle torch version edge cases (<a
href="https://github.com/huggingface/transformers/issues/37399">#37399</a>)</li>
<li>Add glm4 (<a
href="https://github.com/huggingface/transformers/issues/37388">#37388</a>)</li>
</ul>
<h1>Patch Release 4.51.2</h1>
<p>This is another round of bug fixes, but they are a lot more minor and
outputs were not really affected!</p>
<ul>
<li>Fix Llama4 offset (<a
href="https://github.com/huggingface/transformers/issues/37414">#37414</a>)
by <a
href="https://github.com/Cyrilvallez"><code>@​Cyrilvallez</code></a></li>
<li>Attention Quantization with FBGemm &amp; TP (<a
href="https://github.com/huggingface/transformers/issues/37384">#37384</a>)
by <a
href="https://github.com/MekkCyber"><code>@​MekkCyber</code></a></li>
<li>use rms_norm_eps for the L2Norm for Llama4 (<a
href="https://github.com/huggingface/transformers/issues/37418">#37418</a>)
by <a
href="https://github.com/danielhanchen"><code>@​danielhanchen</code></a></li>
<li>mark llama4 as not supported with fa2 (<a
href="https://github.com/huggingface/transformers/issues/37416">#37416</a>)
by <a
href="https://github.com/winglian"><code>@​winglian</code></a></li>
</ul>
<h1>Patch release v4.51.1</h1>
<p>Since the release of Llama 4, we have fixed a few issues that we are
now releasing in patch v4.51.1</p>
<ul>
<li>Fixing flex attention for torch=2.6.0 (<a
href="https://github.com/huggingface/transformers/issues/37285">#37285</a>)</li>
<li>more fixes for post-training llama4 (<a
href="https://github.com/huggingface/transformers/issues/37329">#37329</a>)</li>
<li>Remove HQQ from caching allocator warmup (<a
href="https://github.com/huggingface/transformers/issues/37347">#37347</a>)</li>
<li>fix derived berts _init_weights (<a
href="https://github.com/huggingface/transformers/issues/37341">#37341</a>)</li>
<li>Fix init empty weights without accelerate (<a
href="https://github.com/huggingface/transformers/issues/37337">#37337</a>)</li>
<li>Fix deepspeed with quantization (<a
href="https://github.com/huggingface/transformers/issues/37324">#37324</a>)</li>
<li>fix llama4 training (<a
href="https://github.com/huggingface/transformers/issues/37319">#37319</a>)</li>
<li>fix flex attn when optional args aren't passed (<a
href="https://github.com/huggingface/transformers/issues/37327">#37327</a>)</li>
<li>Multiple llama4 fixe (<a
href="https://github.com/huggingface/transformers/issues/37353">#37353</a>)</li>
</ul>
<p>Thanks all for your patience</p>
<h2>v4.51.0: Llama 4, Phi4-Multimodal, DeepSeek-v3, Qwen3</h2>
<h2>New Model Additions</h2>
<h3>Llama 4</h3>
<p><img
src="https://github.com/user-attachments/assets/d613b292-94b0-4902-9dc7-2d00693222e4"
alt="image" /></p>
<p>Llama 4, developed by Meta, introduces a new auto-regressive
Mixture-of-Experts (MoE) architecture. This generation includes two
models:</p>
<ul>
<li>The highly capable Llama 4 Maverick with 17B active parameters out
of ~400B total, with 128 experts.</li>
<li>The efficient Llama 4 Scout also has 17B active parameters out of
~109B total, using just 16 experts.</li>
</ul>
<p>Both models leverage early fusion for native multimodality, enabling
them to process text and image inputs. Maverick and Scout are both
trained on up to 40 trillion tokens on data encompassing 200 languages
(with specific fine-tuning support for 12 languages including Arabic,
Spanish, German, and Hindi).</p>
<p>For deployment, Llama 4 Scout is designed for accessibility, fitting
on a single server-grade GPU via on-the-fly 4-bit or 8-bit quantization,
while Maverick is available in BF16 and FP8 formats. These models are
released under the custom Llama 4 Community License Agreement, available
on the model repositories</p>
<p>Getting started with Llama 4 using transformers is straightforward.
Make sure you have transformers v4.51.0 or later installed:</p>
<pre><code>pip install -U transformers[hf_xet]
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/huggingface/transformers/commit/945727948c1143a10ac6f7d811aa58bb0d126b5b"><code>9457279</code></a>
Release: v4.52.1</li>
<li><a
href="https://github.com/huggingface/transformers/commit/eaa301673a0a7a1a8c5d3f11c046d1592a7ae16b"><code>eaa3016</code></a>
Revert parallelism temporarily (<a
href="https://github.com/huggingface/transformers/issues/38240">#38240</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/b5f494632c0fff2527dd3140423408644a9b0076"><code>b5f4946</code></a>
Protect ParallelInterface</li>
<li><a
href="https://github.com/huggingface/transformers/commit/113424bcd53b92600f77d82f48add0a60fb41556"><code>113424b</code></a>
Release: v4.52.0</li>
<li><a
href="https://github.com/huggingface/transformers/commit/f834d368f6a21ed54188d9c96fbb9013b1d2c75f"><code>f834d36</code></a>
[gemma3] fix bidirectional attention mask (<a
href="https://github.com/huggingface/transformers/issues/38080">#38080</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/2edb0e4b4dda8172d5628ca7497a4125f28bf6fc"><code>2edb0e4</code></a>
[mllama] fix loading and inference (<a
href="https://github.com/huggingface/transformers/issues/38223">#38223</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/390f153469dfdc793e7a9c7eb4822ea76f4f796a"><code>390f153</code></a>
Add padding-free to bamba (<a
href="https://github.com/huggingface/transformers/issues/35861">#35861</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/2a79471318a9b7b16706f3bb5cd833c7e81919a6"><code>2a79471</code></a>
Fixing Bitnet after use_rms_norm introduction (<a
href="https://github.com/huggingface/transformers/issues/38229">#38229</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/9661896083c9d983341afa45cc4b84af01706e72"><code>9661896</code></a>
Enable Quantize KV Cache for Mistral Model (<a
href="https://github.com/huggingface/transformers/issues/35042">#35042</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/1c2f36b480e02c9027d2523746d34e27b39e01a4"><code>1c2f36b</code></a>
parallelism goes brrr (<a
href="https://github.com/huggingface/transformers/issues/37877">#37877</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.48.0...v4.52.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.48.0&new-version=4.52.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@jywu-msft changed the title from "Iraut/vendor id impl" to "[NV RTX EP] Iraut/vendor id impl" on Jul 18, 2025
@jywu-msft added the "ep:NvRTX" (NV RTX execution provider) label on Jul 18, 2025
guschmue and others added 5 commits July 18, 2025 15:44
add webgpu support for GatherBlockQuantized
### Description
Plugin EP data transfer and Stream support.

Add the ability for a plugin EP to provide an IDataTransfer
implementation and an OrtSyncStream implementation to do async data copy
outside of an inference session.

Example usage added for CUDA EP.

Caveat: Support for providing the OrtSyncStream from the data copy to
Session.Run will be a follow up PR. For the CUDA EP we can pass in the
native cudaStream_t from the OrtSyncStream used for the data copy to the
Run via CUDA EP provider options.

### Motivation and Context
…soft#25446)

### Description
Set compute capability only on Turing arch


### Motivation and Context
Setting the native compute capability was causing a regression in
performance.

@gaugarg-nv @ishwar-raut1 @ankan-ban
@jywu-msft
Member

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows x64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

keshavv27 and others added 4 commits July 18, 2025 22:40
…V EP Unit Tests (microsoft#25323)

### Description
Remove fast_gelu operator from the base model created in NV TRT RTX EP
unit tests.



### Motivation and Context
The operator was added in the model to partition the model into
subgraphs which can be assigned to NV TRT RTX EP and CUDA EP, which
supports fast_gelu. But CUDA EP is not built when building ORT with NV
TRT RTX EP hence the unit tests fail with unsupported op error.

@ishwar-raut1 @ankan-ban
### Description

The error is:
```
..2025-07-17 11:21:36.861835596 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running main_graph_11957213504832792607_0 node. Name:'CANNExecutionProvider_main_graph_11957213504832792607_0_0' Status Message: ~/code/onnxruntime/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. tensor.cc:57 CalculateTensorStorageSize Tensor shape.Size() must be >= 0

 [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running main_graph_11957213504832792607_0 node. Name:'CANNExecutionProvider_main_graph_11957213504832792607_0_0' Status Message: ~/code/onnxruntime/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. tensor.cc:57 CalculateTensorStorageSize Tensor shape.Size() must be >= 0
```
### Description
Add default logger to CreateEpFactories so a plugin EP can log errors
outside of an inference session.


### Motivation and Context
…icrosoft#25411)

### Description
- Adds documentation to state that the data pointer for an `OrtValue`
owned by an `OrtGraph` is stable during the lifetime of the `OrtSession`
that owns the `OrtGraph`.
- Adds documentation to the ort_graph_to_proto.h utils to show how to
create a `onnx::GraphProto` with external initializers that actually
point to in-memory data (same approach used internally within ORT).



### Motivation and Context
Clarification of usage of new graph apis.
@jywu-msft
Member

can you resolve merge conflict with #25456 ?

aparmp-quic and others added 5 commits July 21, 2025 00:17
…#25444)

Enable Conv Op and ConvTranspose Op with "auto_pad" param set as VALID

### Description
QNN_EP rejects the Conv Op and ConvTranspose Op on HTP if "auto_pad" is
"VALID", even though this configuration is supported on HTP.



### Motivation and Context
To enable Conv and ConvTranspose op with auto_pad as "VALID" running on
NPU and prevent them from falling back to CPU.
### Description

optimize search for nodejs in CMake.

### Motivation and Context

The default behavior of CMake's `find_program()` is to search `/bin/`
folder before `$PATH`. This may cause a very old Node.js to be used.
…icrosoft#25407)

### Description
The `QnnEpFactory` implementation currently initializes the underlying
provider by passing the `backend_type` configuration as `htp`, causing
the provider to find the appropriate backend-library, and load it
relative to the OnnxRuntime library. But if EP's are distributed
separately from the OnnxRuntime library - a major benefit of the EP ABI
- then the backend-library may-well not be relative to the OnnxRuntime.
Having the `QnnEpFactory` implementation look for its associated runtime
relative to _itself_ would allow the implementation to bring its own
runtime - and that's what this PR enables.

If the `QnnEpFactory` implementation is co-located with the OnnxRuntime
library, then this is consistent with the existing behavior, but an
`QnnEpFactory` implementation that is shipped 'out-of-band' will use a
backend-relative to itself.

WinML has been using a version of this fix, and this PR is 'upstreaming'
the change.

### Motivation and Context
To support out-of-band distribution of EPs, which the EP ABI work
enables, EPs should look for their dependencies relative to the EP
library rather than the OnnxRuntime library.
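
The lookup described above amounts to resolving the backend library against the directory that contains the EP library. A minimal sketch with `std::filesystem` follows; `ResolveBackendPath` and both of its arguments are hypothetical and not taken from the QNN EP code. How an EP discovers its own library path is platform-specific and is not shown here.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Returns the expected location of a backend library that ships in the same
// directory as the EP library itself. Both arguments are example inputs.
inline fs::path ResolveBackendPath(const fs::path& ep_library_path,
                                   const std::string& backend_file_name) {
  // parent_path() drops the EP library's file name and keeps its directory.
  return ep_library_path.parent_path() / backend_file_name;
}
```

When the EP library sits next to the OnnxRuntime library, this yields the same directory as a runtime-relative lookup, which matches the backward-compatibility point made in the description.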

---------

Co-authored-by: George Wu <[email protected]>
@jywu-msft
Member

something seems off with the merge?

@ishwar-raut1
Contributor Author

Something went wrong with this request; closing this.

@ishwar-raut1
Contributor Author

Add vendor id to OrtEpFactory.
Created a new pull request: #25478

@snnn
Contributor

snnn commented Jul 25, 2025

Hi there! We haven't cut the release branch for this version yet, so I'm removing the release:1.23.0 label for now to keep things tidy. Thanks so much for your contribution! We'll make sure this gets included when the release is prepared. 🤖

@ishwar-raut1 deleted the iraut/VendorIdImpl branch on August 18, 2025 09:07
Labels

ep:NvRTX NV RTX execution provider