
Fix for Cuda Graph in pre-compiled path #27477

Merged
chilo-ms merged 1 commit into microsoft:main from umangb-09:pr_27329
Mar 3, 2026

Conversation

@umangb-09
Contributor

Description

setCudaGraphStrategy(kWHOLE_GRAPH_CAPTURE) was present in the dynamic engine build path (CreateNodeComputeInfoFromGraph) but missing from the precompiled/AOT engine path (CreateNodeComputeInfoFromEPContext). Because TRT RTX defaults the CUDA Graph strategy to kDISABLED, CUDA Graph capture never occurred when loading precompiled engines. This PR applies the same setCudaGraphStrategy call (guarded by the existing TRT_MAJOR_RTX >= 1.3 version check) to the precompiled path so that it matches the dynamic path's behavior.
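
To illustrate the gap described above, here is a minimal, self-contained sketch. It does not use the real TensorRT-RTX headers or the actual execution provider code: MockRuntimeConfig, ApplyWholeGraphCapture, DynamicPathConfig, PrecompiledPathConfig, and the with_fix flag are all hypothetical stand-ins. The only details taken from this PR are the strategy names and the kDISABLED default. The version guard is omitted for brevity.

```cpp
#include <cassert>

// Hypothetical stand-in for the TRT RTX strategy enum (names taken from the PR text).
enum class CudaGraphStrategy { kDISABLED, kWHOLE_GRAPH_CAPTURE };

// Hypothetical stand-in for a runtime config object; NOT the real TensorRT-RTX type.
struct MockRuntimeConfig {
  // Mirrors the default described in the PR: capture is off unless requested.
  CudaGraphStrategy strategy = CudaGraphStrategy::kDISABLED;
  void setCudaGraphStrategy(CudaGraphStrategy s) { strategy = s; }
};

// Shared helper so both engine paths request the same strategy.
void ApplyWholeGraphCapture(MockRuntimeConfig* config) {
  assert(config != nullptr);
  config->setCudaGraphStrategy(CudaGraphStrategy::kWHOLE_GRAPH_CAPTURE);
}

// Models the dynamic build path, which already made the call.
MockRuntimeConfig DynamicPathConfig() {
  MockRuntimeConfig config;
  ApplyWholeGraphCapture(&config);
  return config;
}

// Models the precompiled path; with_fix = false reproduces the old behavior.
MockRuntimeConfig PrecompiledPathConfig(bool with_fix) {
  MockRuntimeConfig config;
  if (with_fix) {
    ApplyWholeGraphCapture(&config);
  }
  return config;
}
```

In this model, PrecompiledPathConfig(false) keeps the kDISABLED default, which corresponds to the reported behavior, while PrecompiledPathConfig(true) ends up with the same strategy as DynamicPathConfig().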

Motivation and Context

Fixes #27329. Users reported that cudaGraphLaunch was not occurring when using precompiled (AOT-built) TensorRT-RTX engines, which resulted in individual kernel launches and unnecessary CPU overhead instead of batched graph execution.

@umangb-09
Contributor Author

@chilo-ms help review this. Thanks!

@chilo-ms
Contributor

chilo-ms commented Mar 2, 2026

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@chilo-ms chilo-ms merged commit 06bbcd8 into microsoft:main Mar 3, 2026
87 of 90 checks passed

Development

Successfully merging this pull request may close these issues.

TensorRT-RTX EP: Enabling CUDA Graph seems ineffective when running a session from a precompiled engine

2 participants