[NV TRT RTX EP] Leverage ORT allocator for workspace allocations #25564
Conversation
@jywu-msft This PR is a blocker to running LLMs with the TRT-RTX EP. Can you please merge this?

@jywu-msft We would also like to have this for WinML GA. Could you please help cherry-pick it into the right branch?
Pull Request Overview
This PR leverages the ORT allocator for workspace allocations in the NVIDIA TensorRT RTX execution provider, significantly reducing memory usage for models with wide dynamic shape ranges. The change removes the previous context memory sharing mechanism and replaces it with dynamic allocation using ORT's allocator infrastructure.
Key changes include:
- Removal of the `context_memory_sharing_enable` configuration option and related infrastructure
- Implementation of dynamic context memory allocation using ORT's allocator with per-context memory management
- Addition of utility functions to detect dynamic shapes in TensorRT tensors (see the sketch below)
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `nv_basic_test.cc` | Updated test configuration and corrected the model filename for the AutoEP test |
| `nv_execution_provider_utils.h` | Added utility functions for detecting dynamic shapes in TensorRT tensors |
| `nv_execution_provider_info.h` | Removed the `context_memory_sharing_enable` configuration option |
| `nv_execution_provider.h` | Updated `OutputAllocator` to use the ORT allocator and modified state structures for dynamic memory management |
| `nv_execution_provider.cc` | Implemented dynamic context memory allocation logic and removed the static memory-sharing code |
```cpp
class OutputAllocator : public nvinfer1::IOutputAllocator {
 public:
  OutputAllocator() = delete;
  OutputAllocator(OrtAllocator* allocator) : alloc_(allocator) {};
```
Copilot AI commented on Aug 2, 2025:
The semicolon after the closing brace is unnecessary for constructor definitions. Remove the semicolon.
```diff
- OutputAllocator(OrtAllocator* allocator) : alloc_(allocator) {};
+ OutputAllocator(OrtAllocator* allocator) : alloc_(allocator) {}
```
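For context, an `IOutputAllocator` that defers to the ORT allocator might look roughly like the sketch below. The class name, grow-only policy, and member names are assumptions for illustration, not the PR's implementation; `OrtAllocator` is the C-API struct whose `Alloc`/`Free` members are function pointers taking the allocator itself as the first argument.

```cpp
#include <NvInfer.h>
#include <onnxruntime_c_api.h>

// Sketch only: TensorRT output allocations delegated to an OrtAllocator.
class OrtBackedOutputAllocator : public nvinfer1::IOutputAllocator {
 public:
  explicit OrtBackedOutputAllocator(OrtAllocator* allocator) : alloc_(allocator) {}

  void* reallocateOutput(char const* /*tensor_name*/, void* current_memory,
                         uint64_t size, uint64_t /*alignment*/) noexcept override {
    if (size > allocated_size_) {
      if (current_memory != nullptr) {
        alloc_->Free(alloc_, current_memory);  // release the undersized buffer
      }
      buffer_ = alloc_->Alloc(alloc_, size);   // grow-only: never shrink
      allocated_size_ = size;
    }
    return buffer_;
  }

  void notifyShape(char const* /*tensor_name*/,
                   nvinfer1::Dims const& /*dims*/) noexcept override {}

 private:
  OrtAllocator* alloc_{nullptr};
  void* buffer_{nullptr};
  uint64_t allocated_size_{0};
};
```

Routing output buffers through the ORT allocator like this lets the EP benefit from ORT's arena and device-memory accounting instead of raw `cudaMalloc` calls.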
```cpp
if (trt_state->context_memory_size != mem_size) {
  LOGS_DEFAULT(INFO) << "[NvTensorRTRTX EP] A new context memory was allocated with size " << mem_size;
  trt_state->context_memory = IAllocator::MakeUniquePtrFromOrtAllocator<void>(alloc, mem_size, false /*use_reserve*/);
  // trt_state->context_memory = IAllocator::MakeUniquePtr<void>(alloc, mem_size, false /*use_reserve*/, stream);
```
Copilot AI commented on Aug 2, 2025:
This commented-out line should be removed as it appears to be leftover debug/alternative implementation code.
```diff
- // trt_state->context_memory = IAllocator::MakeUniquePtr<void>(alloc, mem_size, false /*use_reserve*/, stream);
```
I want to keep this as a TODO for an improvement coming soon that uses `AllocOnStream`.
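Putting the pieces together, the surrounding pattern in the snippet above pairs a per-shape size query with TensorRT's user-managed device memory. A minimal sketch, assuming TensorRT 10-style APIs (`updateDeviceMemorySizeForShapes`, `setDeviceMemoryV2`); the state struct and function are hypothetical stand-ins for the names in the snippet:

```cpp
#include <NvInfer.h>

// Hypothetical stand-in for the trt_state fields used in the snippet above.
struct ContextMemoryState {
  IAllocatorUniquePtr<void> context_memory;
  size_t context_memory_size = 0;
};

// Grow the workspace only when the requirement for the current input shapes
// changes, allocating through the ORT allocator.
void EnsureContextMemory(ContextMemoryState& state,
                         nvinfer1::IExecutionContext& context,
                         OrtAllocator* alloc) {
  const int64_t mem_size = context.updateDeviceMemorySizeForShapes();
  if (state.context_memory_size != static_cast<size_t>(mem_size)) {
    state.context_memory =
        IAllocator::MakeUniquePtrFromOrtAllocator<void>(alloc, mem_size, false /*use_reserve*/);
    state.context_memory_size = mem_size;
  }
  // Hand the buffer to the execution context before enqueueing.
  context.setDeviceMemoryV2(state.context_memory.get(), mem_size);
}
```

The TODO would presumably swap the allocation call for a stream-ordered variant built on `AllocOnStream`, so the buffer is allocated in stream order rather than synchronously.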
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
The Windows runners seem to be stuck in the setup phase.

Restarted them.
skottmckay left a comment
@jywu-msft Since this has been somewhat delayed and has held us back from opening more branches that build on top of these changes, we came up with a cumulative merge branch: #25656.

Could you help update the PR description for that cumulative merge branch? I will review it.
@chilo-ms I updated the description and left some more comments.
Closing this PR since it's duplicated in #25656.
Description
This leverages the OrtAllocator for the intermediate workspace required to execute the TRT engine. With this change we are able to significantly reduce memory usage for models with wide dynamic shape ranges, as seen with ORT GenAI, since the workspace is sized for the shapes actually in use rather than held at a fixed size. For example, an LLM whose sequence length may range from 1 to several thousand tokens would otherwise pay for the worst case on every run.
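As a rough illustration of the scenario, a session that registers the EP by name might look like the sketch below. The provider name string and model path are assumptions for the sketch; check the EP's actual registration string.

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env;
  Ort::SessionOptions so;
  // Provider name assumed here for illustration.
  so.AppendExecutionProvider("NvTensorRtRtx", {});
  // With this PR, the engine workspace is sized per inference from the actual
  // input shapes instead of a static allocation across the dynamic range.
  Ort::Session session(env, ORT_TSTR("model.onnx"), so);
  return 0;
}
```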
@jywu-msft @chilo-ms From our side, reviews on this are done.