[NV RTX EP] Iraut/vendor id impl #25449
@chilo-ms @jywu-msft @gedoensmax @ankan-ban @gaugarg-nv to review
Pull Request Overview
This PR adds vendor ID support to the ONNX Runtime TensorRT RTX execution provider factory to achieve EP compatibility with Microsoft rules. The changes primarily involve adding a new GetVendorId method implementation and updating function signatures to include noexcept specifications.
- Adds `GetVendorIdImpl` method to return NVIDIA's vendor ID
- Updates function signatures to include `noexcept` specifications for consistency
- Initializes `ort_version_supported` field in the constructor
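As a rough illustration of the pattern this overview describes, the sketch below wires a static `noexcept` callback into a C-style function table and recovers the derived factory with `static_cast`. `FakeEpFactory`, `DemoRtxFactory`, and `QueryVendorId` are hypothetical stand-ins invented for this example, not the ONNX Runtime types, and the version value is a placeholder. The only real-world constant is 0x10DE, NVIDIA's PCI vendor ID.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a C-style factory interface. The real
// OrtEpFactory has more members; this only mirrors the idea.
struct FakeEpFactory {
  uint32_t ort_version_supported;
  uint32_t (*GetVendorId)(const FakeEpFactory* this_ptr);
};

// Hypothetical derived factory that stores the vendor ID as a member.
struct DemoRtxFactory : FakeEpFactory {
  DemoRtxFactory() : FakeEpFactory{} {
    ort_version_supported = 1;      // placeholder, not ORT_API_VERSION
    GetVendorId = GetVendorIdImpl;  // install the callback in the table
  }

  // Static and noexcept so it can sit behind a C function pointer and
  // never lets an exception cross the ABI boundary.
  static uint32_t GetVendorIdImpl(const FakeEpFactory* this_ptr) noexcept {
    const auto* factory = static_cast<const DemoRtxFactory*>(this_ptr);
    return factory->vendor_id;
  }

  const uint32_t vendor_id{0x10DE};  // NVIDIA's PCI vendor ID
};

// Caller side: only the base interface is visible.
inline uint32_t QueryVendorId(const FakeEpFactory& factory) {
  return factory.GetVendorId(&factory);
}
```

A caller holding only the base pointer would construct a `DemoRtxFactory` and call `QueryVendorId(factory)`. The inherited `ort_version_supported` field is assigned in the constructor body because it belongs to the base struct rather than the derived class.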
const OrtApi& ort_api;
const std::string ep_name;
const std::string ep_name{kNvTensorRTRTXExecutionProvider};
Copilot AI, Jul 18, 2025
The ep_name member initialization has changed from being set via constructor parameter to a hardcoded constant. This removes flexibility and may break existing code that relies on different EP names for different configurations, contradicting the comment on lines 172-173 that states 'Each unique factory configuration must have a unique name.'
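Whether the new default removes flexibility depends on whether a constructor still initializes `ep_name` explicitly. In C++, a default member initializer is only used when the constructor's mem-initializer list does not mention that member. The sketch below shows that rule in isolation; `NamedFactory` and `kDefaultEpName` are hypothetical names for this example and say nothing about the final state of the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical constant standing in for the hardcoded EP name.
constexpr const char* kDefaultEpName = "DemoDefaultEp";

struct NamedFactory {
  // No mem-initializer for ep_name: the default member initializer applies.
  NamedFactory() = default;

  // A mem-initializer for ep_name replaces the default member initializer.
  explicit NamedFactory(const char* name) : ep_name{name} {}

  const std::string ep_name{kDefaultEpName};
};
```

With this layout, `NamedFactory{}` ends up named `DemoDefaultEp`, while `NamedFactory{"CustomEp"}` keeps the caller's name, so a default value and per-configuration names can coexist.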
static const char* ORT_API_CALL GetVersionImpl(const OrtEpFactory* /*this_ptr*/) noexcept {
static uint32_t GetVendorIdImpl(const OrtEpFactory* this_ptr) noexcept {
  const auto* factory = static_cast<const NvTensorRtRtxEpFactory*>(this_ptr);
  return factory->vendor_id;
Copilot AI, Jul 18, 2025
The code references factory->vendor_id but this member variable is not defined in the visible class definition. This will cause a compilation error.
const char* ep_name,
OrtHardwareDeviceType hw_type)
    : ort_api{ort_api_in}, ep_name{ep_name}, ort_hw_device_type{hw_type} {
  ort_version_supported = ORT_API_VERSION;
Copilot AI, Jul 18, 2025
The code assigns to ort_version_supported but this member variable is not defined in the visible class definition. This will cause a compilation error.
… to process EPContext node for ep_cache_context with bytes stream (microsoft#25389) The existing ReadOpAttr and CreateOpAttr APIs for the string type always assume there is a '\0' at the end. This prevents EPs from embedding or reading the context binary byte buffer in the EPContext node's ep_cache_context attribute. Update the custom op API ReadOpAttr for the string type to avoid adding a '\0' at the end. Update the CreateOpAttr API to construct the string with an explicit length. Keep the strings type processing as it is for now.
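The problem with assuming a trailing '\0' can be seen with plain `std::string`: building a string from a C string stops at the first NUL byte, while building it from a pointer plus length keeps the whole buffer. The helpers below are hypothetical and only illustrate that difference; they are not the ReadOpAttr or CreateOpAttr APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: treats the data as a NUL-terminated C string,
// so anything after the first '\0' is lost.
inline std::string FromCString(const char* data) {
  return std::string(data);
}

// Hypothetical helper: copies exactly `len` bytes, embedded '\0' included,
// which is what a binary context blob needs.
inline std::string FromBytes(const char* data, std::size_t len) {
  return std::string(data, len);
}
```

For a six-byte buffer such as `{'E', 'P', '\0', 'c', 't', 'x'}`, the first helper yields a two-character string and the second preserves all six bytes.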
) Bumps [on-headers](https://github.com/jshttp/on-headers) and [compression](https://github.com/expressjs/compression). These dependencies needed to be updated together.
- Updates `on-headers` from 1.0.2 to 1.1.0. The 1.1.0 release (2025-07-17) fixes [CVE-2025-7339](https://www.cve.org/CVERecord?id=CVE-2025-7339) ([GHSA-76c9-3jph-rj3q](https://github.com/jshttp/on-headers/security/advisories/GHSA-76c9-3jph-rj3q)).
- Updates `compression` from 1.8.0 to 1.8.1. The 1.8.1 release (2025-07-17) depends on on-headers@~1.1.0.
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ts/transformers-test (microsoft#25429) Bumps [transformers](https://github.com/huggingface/transformers) from 4.48.0 to 4.52.1.
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
add webgpu support for GatherBlockQuantized
### Description
Plugin EP data transfer and Stream support. Add the ability for a plugin EP to provide an IDataTransfer implementation and an OrtSyncStream implementation to do async data copy outside of an inference session. Example usage added for CUDA EP.
Caveat: Support for providing the OrtSyncStream from the data copy to Session.Run will be a follow-up PR. For the CUDA EP we can pass in the native cudaStream_t from the OrtSyncStream used for the data copy to the Run via CUDA EP provider options.
…soft#25446)
### Description
Set compute capability only on Turing arch
### Motivation and Context
Setting the native compute capability was causing a regression in performance. @gaugarg-nv @ishwar-raut1 @ankan-ban
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows x64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
…V EP Unit Tests (microsoft#25323)
### Description
Remove the fast_gelu operator from the base model created in the NV TRT RTX EP unit tests.
### Motivation and Context
The operator was added to the model to partition it into subgraphs that can be assigned to NV TRT RTX EP and to CUDA EP, which supports fast_gelu. However, CUDA EP is not built when building ORT with NV TRT RTX EP, so the unit tests fail with an unsupported op error. @ishwar-raut1 @ankan-ban
### Description
The error is:
```
..2025-07-17 11:21:36.861835596 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running main_graph_11957213504832792607_0 node. Name:'CANNExecutionProvider_main_graph_11957213504832792607_0_0' Status Message: ~/code/onnxruntime/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. tensor.cc:57 CalculateTensorStorageSize Tensor shape.Size() must be >= 0
[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running main_graph_11957213504832792607_0 node. Name:'CANNExecutionProvider_main_graph_11957213504832792607_0_0' Status Message: ~/code/onnxruntime/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. tensor.cc:57 CalculateTensorStorageSize Tensor shape.Size() must be >= 0
```
### Description
Add default logger to CreateEpFactories so a plugin EP can log errors outside of an inference session.
…icrosoft#25411)
### Description
- Adds documentation to state that the data pointer for an `OrtValue` owned by an `OrtGraph` is stable during the lifetime of the `OrtSession` that owns the `OrtGraph`.
- Adds documentation to the ort_graph_to_proto.h utils to show how to create an `onnx::GraphProto` with external initializers that actually point to in-memory data (same approach used internally within ORT).
### Motivation and Context
Clarification of usage of the new graph APIs.
Can you resolve the merge conflict with #25456?
…#25444) Enable Conv Op and ConvTranspose Op with "auto_pad" param set as VALID
### Description
QNN_EP rejects the Conv Op and ConvTranspose Op on HTP if "auto_pad" is "VALID". This configuration is supported on HTP.
### Motivation and Context
To enable the Conv and ConvTranspose ops with auto_pad set to "VALID" to run on the NPU and prevent them from falling back to CPU.
### Description
Optimize search for nodejs in CMake.
### Motivation and Context
The default behavior of CMake's `find_program()` is to search `/bin/` folder before `$PATH`. This may cause a very old Node.js to be used.
…icrosoft#25407)
### Description
The `QnnEpFactory` implementation currently initializes the underlying provider by passing the `backend_type` configuration as `htp`, causing the provider to find the appropriate backend library and load it relative to the OnnxRuntime library. But if EPs are distributed separately from the OnnxRuntime library (a major benefit of the EP ABI), then the backend library may well not be located relative to the OnnxRuntime library. Having the `QnnEpFactory` implementation look for its associated runtime relative to _itself_ would allow the implementation to bring its own runtime, and that is what this PR enables. If the `QnnEpFactory` implementation is co-located with the OnnxRuntime library, this is consistent with the existing behavior, but a `QnnEpFactory` implementation that is shipped 'out-of-band' will use a backend relative to itself. WinML has been using a version of this fix, and this PR is 'upstreaming' the change.
### Motivation and Context
To support out-of-band distribution of EPs, enabled by the EP ABI work, EPs should accommodate finding dependencies relative to the EP library, and not the OnnxRuntime library.
Co-authored-by: George Wu
Something seems off with the merge?
Something went wrong with this request; closing this.
Hi there! We haven't cut the release branch for this version yet, so I'm removing the |
Description
Add vendor id to OrtEpFactory.
Motivation and Context
Have EP compatibility as per MSFT rules