Conversation

@gedoensmax
Contributor

Description

TRT supports BFloat16 and ORT does as well.
In addition, `setup.py` was missing a copy for the NVTRT EP, and the TRT EP can only be built against the packaged parser with TRT RTX.
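For context, a minimal sketch of how the new BF16 path could be exercised through the public C/C++ API. The option key `trt_bf16_enable` is taken from the test parameter quoted later in this thread; everything else here is illustrative, not code from this PR:

```cpp
// Sketch only: enable BF16 on the TRT EP via the V2 provider options.
// The key "trt_bf16_enable" comes from the CI failure quoted below.
#include <onnxruntime_cxx_api.h>

int main() {
  const auto& api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2* trt_options = nullptr;
  Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));
  const char* keys[] = {"trt_bf16_enable"};
  const char* values[] = {"1"};
  Ort::ThrowOnError(api.UpdateTensorRTProviderOptions(trt_options, keys, values, 1));
  Ort::SessionOptions session_options;
  session_options.AppendExecutionProvider_TensorRT_V2(*trt_options);
  api.ReleaseTensorRTProviderOptions(trt_options);
  return 0;
}
```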

@gedoensmax
Contributor Author

I noticed some other things on the path for CUDA device bindings when ORT is compiled without the CUDA EP and with only the CUDA EP interface enabled. I will convert this to a draft and finish up tomorrow.

@gedoensmax gedoensmax marked this pull request as draft May 13, 2025 15:09
@gedoensmax
Contributor Author

We will resort to relying on CUDA EP for device bindings for the time being.

@gedoensmax
Contributor Author

@chilo-ms can you help review this?

@gedoensmax gedoensmax marked this pull request as ready for review May 15, 2025 11:57
@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chilo-ms
Contributor

chilo-ms commented May 16, 2025

Please also help add bf16 to the Python binding for the TRT EP,
similar to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/onnxruntime_pybind_state.cc#L619
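A hedged sketch of the kind of parsing that request points at, modeled on how the existing boolean TRT options (e.g. `trt_fp16_enable`) are handled in `onnxruntime_pybind_state.cc`. `ParseTrtBf16Enable` is a hypothetical helper; the merged diff may look different:

```cpp
// Hypothetical helper, not the merged diff: parse "trt_bf16_enable" the
// way onnxruntime_pybind_state.cc parses its other boolean TRT options.
#include <stdexcept>
#include <string>

void ParseTrtBf16Enable(const std::string& key, const std::string& value, bool& bf16_enable) {
  if (key != "trt_bf16_enable") return;
  if (value == "True" || value == "true") {
    bf16_enable = true;
  } else if (value == "False" || value == "false") {
    bf16_enable = false;
  } else {
    throw std::runtime_error(
        "[ERROR] [TensorRT] The value for the key 'trt_bf16_enable' should be 'True' or 'False'.");
  }
}
```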

@gedoensmax
Contributor Author

@chilo-ms do you mind if I just delete the entire thing and replace it with

```cpp
} else if (provider_name_ == onnxruntime::kTensorrtExecutionProvider) {
#ifdef USE_TENSORRT
  const auto& api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2* tensorrt_options;
  Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&tensorrt_options));
  std::unique_ptr<OrtTensorRTProviderOptionsV2, decltype(api.ReleaseTensorRTProviderOptions)> rel_trt_options(
      tensorrt_options, api.ReleaseTensorRTProviderOptions);
  std::vector<const char*> option_keys, option_values;
  // used to keep all option keys and value strings alive
  std::list<std::string> buffer;
#ifdef _MSC_VER
  std::string ov_string = ToUTF8String(performance_test_config.run_config.ep_runtime_config_string);
#else
  std::string ov_string = performance_test_config.run_config.ep_runtime_config_string;
#endif
  ParseSessionConfigs(ov_string, provider_options);
  for (const auto& provider_option : provider_options) {
    option_keys.push_back(provider_option.first.c_str());
    option_values.push_back(provider_option.second.c_str());
  }
  Ort::Status status(api.UpdateTensorRTProviderOptions(tensorrt_options,
                                                       option_keys.data(), option_values.data(), option_keys.size()));
  if (!status.IsOK()) {
    OrtAllocator* allocator;
    char* options;
    Ort::ThrowOnError(api.GetAllocatorWithDefaultOptions(&allocator));
    Ort::ThrowOnError(api.GetTensorRTProviderOptionsAsString(tensorrt_options, allocator, &options));
    ORT_THROW("[ERROR] [TensorRT] Configuring the CUDA options failed with message: ", status.GetErrorMessage(),
              "\nSupported options are:\n", options);
  }
  session_options.AppendExecutionProvider_TensorRT_V2(*tensorrt_options);
```
?
Is the type checking and explicit warning really necessary at the pybind level?

@chilo-ms
Contributor

chilo-ms commented May 20, 2025

> @chilo-ms do you mind if I just delete the entire thing and replace it with [the snippet quoted above]? Is the type checking and explicit warning really necessary at the pybind level?

Agreed that the type checking and explicit warning are not necessary.
Your suggestion would add ORT C API dependencies that this pybind file doesn't have today, and would need some effort to test.
Another option is to refactor the code along the lines of what the CUDA EP did, but that requires the TRT EP to expose some functions like TensorRTExecutionProviderInfo__FromProviderOptions.

Either way, I think we can do this in another PR?
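For reference, a rough sketch of the shape that refactor could take; the conversion function name comes from the comment above, while the struct fields, casing, and exact signature are assumptions:

```cpp
// Assumed shape of the CUDA-EP-style refactor: the EP converts a
// string-keyed ProviderOptions map into its typed info struct.
#include <string>
#include <unordered_map>

using ProviderOptions = std::unordered_map<std::string, std::string>;

struct TensorrtExecutionProviderInfo {
  bool bf16_enable{false};
  static TensorrtExecutionProviderInfo FromProviderOptions(const ProviderOptions& options) {
    TensorrtExecutionProviderInfo info{};
    if (auto it = options.find("trt_bf16_enable"); it != options.end()) {
      info.bf16_enable = (it->second == "1" || it->second == "true");
    }
    return info;
  }
};
```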

@gedoensmax
Contributor Author

@chilo-ms changes are done. Let me know if there is something else for this API.

@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chilo-ms chilo-ms merged commit 3a20910 into microsoft:main May 23, 2025
82 checks passed
@snnn
Contributor

snnn commented May 31, 2025

We got the following error when running the tests on T4 machines:
[ FAILED ] CApiTensorRTTest/CApiTensorRTTest.TestConfigureTensorRTProviderOptions/5, where GetParam() = "trt_bf16_enable=1"

C++ exception with description "TensorRT EP failed to create engine from network for fused node: TensorrtExecutionProvider_TRTKernel_graph_mul test_9904508914055400613_0_0" thrown in the test body.
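For what it's worth, T4 is SM 7.5 (Turing), while TensorRT's BF16 kernels require SM 8.0 (Ampere) or newer, so a hardware gate along these lines would avoid building a BF16 engine there. A minimal sketch, not code from the eventual fix:

```cpp
// Minimal sketch: report whether trt_bf16_enable is usable on this GPU.
// Hypothetical helper; the actual fix landed separately (see below).
#include <cuda_runtime_api.h>

bool DeviceSupportsBf16(int device_id) {
  cudaDeviceProp prop{};
  if (cudaGetDeviceProperties(&prop, device_id) != cudaSuccess) return false;
  return prop.major >= 8;  // BF16 tensor ops need SM 8.0+ (Ampere)
}
```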

@chilo-ms
Contributor

> We got the following error when running the tests on T4 machines: [ FAILED ] CApiTensorRTTest/CApiTensorRTTest.TestConfigureTensorRTProviderOptions/5, where GetParam() = "trt_bf16_enable=1"
>
> C++ exception with description "TensorRT EP failed to create engine from network for fused node: TensorrtExecutionProvider_TRTKernel_graph_mul test_9904508914055400613_0_0" thrown in the test body.

Addressed in this PR: #24915

Comment on lines +37 to +40
} else if constexpr (std::is_same<T, BFloat16>::value) {
dtype_name = "fp16";
} else if constexpr (std::is_same<T, MLFloat16>::value) {
dtype_name = "bf16";
Contributor

Was this missed in the original review? It seems we are setting the BFloat16 type name to "fp16" rather than "bf16", and vice versa.

Contributor Author

Yes, it seems so. If you want to use bfloat16, I would recommend a strongly typed ONNX model rather than this global flag.

quic-ankus pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Nov 25, 2025
### Description

TRT supports BFloat16 and ORT does as well.
In addition, `setup.py` was missing a copy for the NVTRT EP, and the TRT EP can only be built against the packaged parser with TRT RTX.