Conversation

@gedoensmax
Contributor

Description

This leverages the OrtAllocator for intermediate workspace required to execute the TRT engine. With this change we are able to significantly reduce memory usage for models with wide dynamic shape ranges as seen on ORT GenAI.
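For context, the mechanism works roughly as follows (a hedged sketch, not the PR's exact code): TensorRT lets you create an execution context without device memory (createExecutionContextWithoutDeviceMemory) and attach the transient workspace before enqueue, so that buffer can be serviced by the OrtAllocator and reallocated only when the required size changes. The helper name and wiring below are illustrative assumptions; newer TensorRT releases also offer V2 variants of these calls.

#include <NvInfer.h>
#include "onnxruntime_c_api.h"

// Hypothetical helper, not the PR's actual code: bind an OrtAllocator-backed
// workspace to a TRT execution context created without device memory.
void BindContextMemory(nvinfer1::ICudaEngine& engine,
                       nvinfer1::IExecutionContext& context,
                       OrtAllocator* allocator,
                       void*& workspace, size_t& workspace_size) {
  const size_t required = engine.getDeviceMemorySize();
  if (required != workspace_size) {
    if (workspace != nullptr) {
      allocator->Free(allocator, workspace);  // return the old buffer to ORT
    }
    workspace = allocator->Alloc(allocator, required);  // OrtAllocator C API
    workspace_size = required;
  }
  context.setDeviceMemory(workspace);  // engine executes out of this buffer
}

Because the workspace comes from ORT's allocator rather than a static allocation, the worst-case buffer presumably no longer stays pinned for the engine's whole lifetime, which is where the savings on wide dynamic shape ranges would come from.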

@jywu-msft @chilo-ms from our side, reviews on this are done.

@gaugarg-nv
Contributor

@jywu-msft This PR is a blocker to run LLMs with TRT-RTX EP. Can you please merge this?

@gaugarg-nv
Contributor

@jywu-msft We would also like to have this for WinML GA. Could you please help cherry-pick it in the right branch?

@jywu-msft jywu-msft added the ep:NvRTX NV RTX execution provider label Jul 31, 2025
@jywu-msft jywu-msft requested review from chilo-ms and Copilot August 2, 2025 07:18
Contributor

Copilot AI left a comment


Pull Request Overview

This PR leverages the ORT allocator for workspace allocations in the NVIDIA TensorRT RTX execution provider, significantly reducing memory usage for models with wide dynamic shape ranges. The change removes the previous context memory sharing mechanism and replaces it with dynamic allocation using ORT's allocator infrastructure.

Key changes include:

  • Removal of the context_memory_sharing_enable configuration option and related infrastructure
  • Implementation of dynamic context memory allocation using ORT allocator with per-context memory management
  • Addition of utility functions to detect dynamic shapes in TensorRT tensors (see the sketch after this list)
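For illustration, a minimal version of such a dynamic-shape check might look like this; TensorRT encodes dynamic dimensions as -1 in nvinfer1::Dims, and the actual name and signature in nv_execution_provider_utils.h may differ:

#include <NvInfer.h>

// Hedged sketch of a dynamic-shape utility; the PR's real helper may differ.
inline bool TensorHasDynamicShape(const nvinfer1::ICudaEngine& engine,
                                  const char* tensor_name) {
  const nvinfer1::Dims dims = engine.getTensorShape(tensor_name);
  for (int32_t i = 0; i < dims.nbDims; ++i) {
    if (dims.d[i] == -1) {  // -1 marks a dimension left dynamic at build time
      return true;
    }
  }
  return false;
}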

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File summaries:

  • nv_basic_test.cc: Updated test configuration and corrected the model filename for the AutoEP test
  • nv_execution_provider_utils.h: Added utility functions for detecting dynamic shapes in TensorRT tensors
  • nv_execution_provider_info.h: Removed the context_memory_sharing_enable configuration option
  • nv_execution_provider.h: Updated OutputAllocator to use the ORT allocator and modified state structures for dynamic memory management
  • nv_execution_provider.cc: Implemented dynamic context memory allocation logic and removed the static memory-sharing code

class OutputAllocator : public nvinfer1::IOutputAllocator {
 public:
  OutputAllocator() = delete;
  OutputAllocator(OrtAllocator* allocator) : alloc_(allocator) {};

Copilot AI Aug 2, 2025


The semicolon after the closing brace is unnecessary for constructor definitions. Remove the semicolon.

Suggested change
OutputAllocator(OrtAllocator* allocator) : alloc_(allocator) {};
OutputAllocator(OrtAllocator* allocator) : alloc_(allocator) {}

if (trt_state->context_memory_size != mem_size) {
  LOGS_DEFAULT(INFO) << "[NvTensorRTRTX EP] A new context memory was allocated with size " << mem_size;
  trt_state->context_memory = IAllocator::MakeUniquePtrFromOrtAllocator<void>(alloc, mem_size, false /*use_reserve*/);
  // trt_state->context_memory = IAllocator::MakeUniquePtr<void>(alloc, mem_size, false /*use_reserve*/, stream);

Copilot AI Aug 2, 2025


This commented-out line should be removed as it appears to be leftover debug/alternative implementation code.

Suggested change
// trt_state->context_memory = IAllocator::MakeUniquePtr<void>(alloc, mem_size, false /*use_reserve*/, stream);

Contributor Author


I want to keep this as a TODO for an upcoming improvement that uses AllocOnStream.
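For context, the commented-out line above hints at what that follow-up would look like: ORT's IAllocator::MakeUniquePtr has an overload taking a Stream*, so the workspace could be allocated stream-ordered on the compute stream rather than synchronously. A hedged sketch, assuming an AllocatorPtr (alloc_ptr) and the compute Stream* (stream) are available at this point:

// Hedged sketch of the stream-ordered TODO; alloc_ptr and stream are assumed.
if (trt_state->context_memory_size != mem_size) {
  trt_state->context_memory =
      IAllocator::MakeUniquePtr<void>(alloc_ptr, mem_size,
                                      false /*use_reserve*/, stream);
  trt_state->context_memory_size = mem_size;
}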

@jywu-msft jywu-msft requested a review from HectorSVC August 2, 2025 07:19
@jywu-msft
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@gedoensmax
Contributor Author

The Windows runners seem to be stuck in the setup phase.

@jywu-msft
Member

> The Windows runners seem to be stuck in the setup phase.

Restarted them.

Contributor

@skottmckay skottmckay left a comment


:shipit:

@gedoensmax
Contributor Author

@jywu-msft Since this has been somewhat delayed and has kept us from opening more branches that build on top of these changes, we came up with a cumulative merge branch: #25656.
This PR is now duplicated in that branch; how do you want us to proceed?

@chilo-ms
Contributor

chilo-ms commented Aug 5, 2025

> @jywu-msft Since this has been somewhat delayed and has kept us from opening more branches that build on top of these changes, we came up with a cumulative merge branch: #25656. This PR is now duplicated in that branch; how do you want us to proceed?

Could you help update the PR description for that cumulative merge branch? I will review it.

@gedoensmax
Contributor Author

@chilo-ms I updated the description and left some more comments.

@chilo-ms
Contributor

chilo-ms commented Aug 6, 2025

Closing this PR since it's duplicated in #25656.
