Conversation

@gedoensmax
Contributor

Description

TRT supports BFloat16 and ORT does as well.
In addition, `setup.py` was missing a copy for the NVTRT EP, and the TRT EP can only be built against the packaged parser with TRT RTX.
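For context, a minimal sketch of how the new BF16 path could be exercised through the public C/C++ API. The option key `trt_bf16_enable` is taken from the test parameter quoted later in this thread; everything else here is illustrative, not code from this PR:

```cpp
// Sketch only: enable BF16 on the TRT EP via the V2 provider options.
// The key "trt_bf16_enable" comes from the CI failure quoted below.
#include <onnxruntime_cxx_api.h>

int main() {
  const auto& api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2* trt_options = nullptr;
  Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));
  const char* keys[] = {"trt_bf16_enable"};
  const char* values[] = {"1"};
  Ort::ThrowOnError(api.UpdateTensorRTProviderOptions(trt_options, keys, values, 1));
  Ort::SessionOptions session_options;
  session_options.AppendExecutionProvider_TensorRT_V2(*trt_options);
  api.ReleaseTensorRTProviderOptions(trt_options);
  return 0;
}
```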

@gedoensmax
Contributor Author

I noticed some other things on the path for CUDA device bindings when ORT is compiled without the CUDA EP and with only the CUDA EP interface enabled. I will convert this to a draft and finish up tomorrow.

@gedoensmax gedoensmax marked this pull request as draft May 13, 2025 15:09
@gedoensmax
Contributor Author

We will resort to relying on CUDA EP for device bindings for the time being.

@gedoensmax
Contributor Author

@chilo-ms can you help review this?

@gedoensmax gedoensmax marked this pull request as ready for review May 15, 2025 11:57
@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chilo-ms
Contributor

chilo-ms commented May 16, 2025

Please also help add bf16 to the Python binding for the TRT EP,
similar to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/onnxruntime_pybind_state.cc#L619
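A hedged sketch of the kind of parsing that request points at, modeled on how the existing boolean TRT options (e.g. `trt_fp16_enable`) are handled in `onnxruntime_pybind_state.cc`. `ParseTrtBf16Enable` is a hypothetical helper; the merged diff may look different:

```cpp
// Hypothetical helper, not the merged diff: parse "trt_bf16_enable" the
// way onnxruntime_pybind_state.cc parses its other boolean TRT options.
#include <stdexcept>
#include <string>

void ParseTrtBf16Enable(const std::string& key, const std::string& value, bool& bf16_enable) {
  if (key != "trt_bf16_enable") return;
  if (value == "True" || value == "true") {
    bf16_enable = true;
  } else if (value == "False" || value == "false") {
    bf16_enable = false;
  } else {
    throw std::runtime_error(
        "[ERROR] [TensorRT] The value for the key 'trt_bf16_enable' should be 'True' or 'False'.");
  }
}
```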

@gedoensmax
Contributor Author

@chilo-ms do you mind if I just delete the entire thing and replace it with

```cpp
} else if (provider_name_ == onnxruntime::kTensorrtExecutionProvider) {
#ifdef USE_TENSORRT
  const auto& api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2* tensorrt_options;
  Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&tensorrt_options));
  std::unique_ptr<OrtTensorRTProviderOptionsV2, decltype(api.ReleaseTensorRTProviderOptions)> rel_trt_options(
      tensorrt_options, api.ReleaseTensorRTProviderOptions);
  std::vector<const char*> option_keys, option_values;
  // used to keep all option keys and value strings alive
  std::list<std::string> buffer;
#ifdef _MSC_VER
  std::string ov_string = ToUTF8String(performance_test_config.run_config.ep_runtime_config_string);
#else
  std::string ov_string = performance_test_config.run_config.ep_runtime_config_string;
#endif
  ParseSessionConfigs(ov_string, provider_options);
  for (const auto& provider_option : provider_options) {
    option_keys.push_back(provider_option.first.c_str());
    option_values.push_back(provider_option.second.c_str());
  }
  Ort::Status status(api.UpdateTensorRTProviderOptions(tensorrt_options,
                                                       option_keys.data(), option_values.data(), option_keys.size()));
  if (!status.IsOK()) {
    OrtAllocator* allocator;
    char* options;
    Ort::ThrowOnError(api.GetAllocatorWithDefaultOptions(&allocator));
    Ort::ThrowOnError(api.GetTensorRTProviderOptionsAsString(tensorrt_options, allocator, &options));
    ORT_THROW("[ERROR] [TensorRT] Configuring the CUDA options failed with message: ", status.GetErrorMessage(),
              "\nSupported options are:\n", options);
  }
  session_options.AppendExecutionProvider_TensorRT_V2(*tensorrt_options);
```
?
Is the type checking and explicit warning really necessary at the pybind level?

@chilo-ms
Contributor

chilo-ms commented May 20, 2025

> @chilo-ms do you mind if I just delete the entire thing and replace it with [the snippet quoted above]? Is the type checking and explicit warning really necessary at the pybind level?

Agreed that the type checking and explicit warning are not necessary.
Your suggestion would add ORT C API dependencies that this pybind file doesn't have today, and would need some effort to test.
Another option is to refactor the code along the lines of what the CUDA EP did, but that requires the TRT EP to expose some functions like TensorRTExecutionProviderInfo__FromProviderOptions.

Either way, I think we can do this in another PR?
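For reference, a rough sketch of the shape that refactor could take; the conversion function name comes from the comment above, while the struct fields, casing, and exact signature are assumptions:

```cpp
// Assumed shape of the CUDA-EP-style refactor: the EP converts a
// string-keyed ProviderOptions map into its typed info struct.
#include <string>
#include <unordered_map>

using ProviderOptions = std::unordered_map<std::string, std::string>;

struct TensorrtExecutionProviderInfo {
  bool bf16_enable{false};
  static TensorrtExecutionProviderInfo FromProviderOptions(const ProviderOptions& options) {
    TensorrtExecutionProviderInfo info{};
    if (auto it = options.find("trt_bf16_enable"); it != options.end()) {
      info.bf16_enable = (it->second == "1" || it->second == "true");
    }
    return info;
  }
};
```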

@gedoensmax
Contributor Author

@chilo-ms changes are done. Let me know if there is something else for this API.

@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chilo-ms chilo-ms merged commit 3a20910 into microsoft:main May 23, 2025
82 checks passed
@snnn
Contributor

snnn commented May 31, 2025

We got the following error when running the tests on T4 machines:
[ FAILED ] CApiTensorRTTest/CApiTensorRTTest.TestConfigureTensorRTProviderOptions/5, where GetParam() = "trt_bf16_enable=1"

C++ exception with description "TensorRT EP failed to create engine from network for fused node: TensorrtExecutionProvider_TRTKernel_graph_mul test_9904508914055400613_0_0" thrown in the test body.
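For what it's worth, T4 is SM 7.5 (Turing), while TensorRT's BF16 kernels require SM 8.0 (Ampere) or newer, so a hardware gate along these lines would avoid building a BF16 engine there. A minimal sketch, not code from the eventual fix:

```cpp
// Minimal sketch: report whether trt_bf16_enable is usable on this GPU.
// Hypothetical helper; the actual fix landed separately (see below).
#include <cuda_runtime_api.h>

bool DeviceSupportsBf16(int device_id) {
  cudaDeviceProp prop{};
  if (cudaGetDeviceProperties(&prop, device_id) != cudaSuccess) return false;
  return prop.major >= 8;  // BF16 tensor ops need SM 8.0+ (Ampere)
}
```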

@chilo-ms
Contributor

> We got the following error when running the tests on T4 machines: [ FAILED ] CApiTensorRTTest/CApiTensorRTTest.TestConfigureTensorRTProviderOptions/5, where GetParam() = "trt_bf16_enable=1"
>
> C++ exception with description "TensorRT EP failed to create engine from network for fused node: TensorrtExecutionProvider_TRTKernel_graph_mul test_9904508914055400613_0_0" thrown in the test body.

Addressed in this PR: #24915

Comment on lines +37 to +40
} else if constexpr (std::is_same<T, BFloat16>::value) {
dtype_name = "fp16";
} else if constexpr (std::is_same<T, MLFloat16>::value) {
dtype_name = "bf16";
Contributor

Was this missed in the original review? It seems we are setting the BFloat16 type name to "fp16" rather than "bf16", and vice versa.

Contributor Author

Yes, it seems so. If you want to use bfloat16, I would recommend a strongly typed ONNX model rather than this global flag.

quic-ankus pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Nov 25, 2025
### Description

TRT supports BFloat16 and ORT does as well.
In addition, `setup.py` was missing a copy for the NVTRT EP, and the TRT EP can only be built against the packaged parser with TRT RTX.