Only add support for CUDA devices in GetSupportedDevices #4
Conversation
auto& factory = *static_cast<TensorrtExecutionProviderFactory*>(this_ptr);
*data_transfer = factory.data_transfer_impl.get();

auto data_transfer_impl = std::make_unique<TRTEpDataTransfer>(static_cast<const ApiPtrs&>(factory));
This leaks because ReleaseImpl() is now commented out.
Remove the comment and add back delete static_cast<TRTEpDataTransfer*>(this_ptr);
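The requested fix can be sketched as below. The struct body is a stand-in (the real class has more members), and live_count is instrumentation that exists only for this sketch; the key point is that the static Release callback casts the void* back to the concrete type and deletes it, matching the make_unique allocation whose ownership was handed out.

```cpp
#include <cassert>

// Sketch only: minimal stand-in for the data-transfer object whose Release
// callback the review asks to restore. live_count is illustration-only.
struct TRTEpDataTransfer {
  static int live_count;
  TRTEpDataTransfer() { ++live_count; }
  ~TRTEpDataTransfer() { --live_count; }

  // The restored ReleaseImpl: cast back to the concrete type and delete,
  // so the object handed across the C boundary is actually freed.
  static void ReleaseImpl(void* this_ptr) {
    delete static_cast<TRTEpDataTransfer*>(this_ptr);
  }
};
int TRTEpDataTransfer::live_count = 0;
```

With this in place, the factory can release ownership of the unique_ptr to the C API and rely on ReleaseImpl for cleanup instead of leaking.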
}

if (num_cuda_devices == 0) {
  Ort::ThrowOnError(ort_api->Logger_LogMessage(default_logger,
This will throw a C++ exception through a C API boundary, which will immediately terminate the process.
In general, every C API implemented in C++ should guard against exceptions that can rip through the C API into a C program.
I changed it to use RETURN_IF_ERROR instead.
We should wrap every C API entry point in try/catch macros.
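One common shape for such macros is sketched below. OrtStatusStub, MakeStatus, and GetSupportedDevicesStub are hypothetical stand-ins so the sketch is self-contained; the real plugin would return an OrtStatus built via ort_api->CreateStatus(ORT_RUNTIME_EXCEPTION, ...).

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical status type standing in for OrtStatus in this sketch.
struct OrtStatusStub { std::string message; };

static OrtStatusStub* MakeStatus(const std::string& msg) {
  return new OrtStatusStub{msg};
}

// Open/close macros bracketing the body of each C API implemented in C++,
// so no exception can rip through the C boundary into a C program.
#define API_IMPL_BEGIN try {
#define API_IMPL_END                            \
  }                                             \
  catch (const std::exception& ex) {            \
    return MakeStatus(ex.what());               \
  }                                             \
  catch (...) {                                 \
    return MakeStatus("unknown exception");     \
  }

// Example entry point using the macros; nullptr means success, mirroring
// the OrtStatus* convention.
OrtStatusStub* GetSupportedDevicesStub(bool simulate_failure) {
  API_IMPL_BEGIN
  if (simulate_failure) throw std::runtime_error("device query failed");
  return nullptr;
  API_IMPL_END
}
```

Wrapping every entry point this way turns any stray exception into an ordinary error status the C caller can inspect.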
// Query CUDA device count once upfront so we can validate assigned ordinals.
int cuda_device_count = 0;
cudaError_t cuda_err = cudaGetDeviceCount(&cuda_device_count);
Here and below: a cudaGetDeviceCount failure is treated as "no devices". This is a rare, catastrophic failure that must be reported; in CreateEpFactories, a cudaGetDeviceCount failure returns ORT_EP_FAIL.
Plugin creation can still fail on systems without a usable CUDA runtime, which conflicts with the stated PR intent of graceful enumeration when CUDA devices are unavailable.
The no-device case was improved, but the error-path semantics remain inconsistent with that design intent.
I updated the code in CreateEpFactories and GetSupportedDevicesImpl; they are consistent now and only log a warning message if no CUDA devices are available.
There are two situations: 1) no devices, 2) the CUDA API fails. The latter must error out, not just log a warning.
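The two-case handling can be sketched as follows. The Outcome enum and FakeCudaError are stand-ins so the sketch runs without a CUDA runtime; the real code would branch on the cudaError_t returned by cudaGetDeviceCount and build statuses/log messages through the ORT API.

```cpp
#include <cassert>

// Illustration-only outcome of processing the device-count query.
enum class Outcome { kOk, kWarnNoDevices, kError };

// Stand-in for cudaError_t values in this sketch.
enum FakeCudaError { kFakeSuccess = 0, kFakeInitError = 3 };

Outcome HandleDeviceCount(FakeCudaError err, int device_count) {
  if (err != kFakeSuccess) {
    // Case 2: the CUDA API itself failed -- catastrophic, surface an error
    // status (e.g. ORT_EP_FAIL) instead of pretending there are no devices.
    return Outcome::kError;
  }
  if (device_count == 0) {
    // Case 1: runtime is healthy but no devices -- log a warning and
    // enumerate nothing, so plugin creation still succeeds gracefully.
    return Outcome::kWarnNoDevices;
  }
  return Outcome::kOk;
}
```

This keeps CreateEpFactories and GetSupportedDevicesImpl consistent while preserving the distinction between "nothing to enumerate" and "the runtime is broken".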
// CUDA uses contiguous ordinals for CUDA-visible NVIDIA devices. Build that
// mapping from the filtered hardware-device list instead of relying on the
// ORT hardware device id, which is not guaranteed to be a CUDA ordinal.
int current_device_id = cuda_device_index++;
The current code still assigns CUDA ordinals by filtered enumeration order, not from a direct CUDA ordinal provided by hardware-device metadata.
If the ORT hardware enumeration order diverges from the CUDA-visible ordinal order, the allocator/memory-info association can still mismatch.
Partially addressed (a bounds check was added at tensorrt_provider_factory.cc:169), but the deeper ordering-assumption concern remains.
Good catch!
The device ordering assumption is a concern.
To address this, I use cudaDeviceGetByPCIBusId to get the CUDA device ordinal from the PCI bus ID.
ORT currently doesn't have the PCI bus ID as device metadata on Windows, and I created a PR for it.
The plugin CUDA EP has the same ordering assumption issue, and I address it as well.
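The bus-ID-based lookup can be sketched like this. The real code would call cudaDeviceGetByPCIBusId(&ordinal, pci_bus_id) against the live CUDA runtime; the table here is a hypothetical stand-in (bus IDs invented for illustration) so the sketch runs without CUDA.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Sketch: resolve a CUDA ordinal from a device's PCI bus ID instead of from
// filtered enumeration order. A stub table stands in for
// cudaDeviceGetByPCIBusId in this illustration.
std::optional<int> OrdinalFromPciBusId(const std::string& pci_bus_id) {
  static const std::map<std::string, int> kVisibleDevices = {
      {"0000:65:00.0", 0},  // hypothetical bus IDs for illustration
      {"0000:b3:00.0", 1},
  };
  auto it = kVisibleDevices.find(pci_bus_id);
  if (it == kVisibleDevices.end()) return std::nullopt;  // not CUDA-visible
  return it->second;
}
```

Because the ordinal comes from the device's own identity rather than enumeration order, the allocator/memory-info association no longer depends on ORT and CUDA enumerating devices in the same order.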
}

// Manual init for the C++ API
Ort::InitApi(ort_api);
Ort::InitApi(ort_api) should be one of the first things done. E.g., Ort::ThrowOnError() requires it, but it is not yet initialized at that point.
return ort_api->CreateStatus(ORT_RUNTIME_EXCEPTION, err_msg.c_str());
}

try {
This C API is still not guarding against exceptions. This try/catch is too narrow.
cuda_pinned_memory_infos[device_id] = MemoryInfoUniquePtr(mem_info, ort_api.ReleaseMemoryInfo);
}
const OrtMemoryInfo* TensorrtExecutionProviderFactory::GetMemoryInfoByOrdinal(int cuda_ordinal, bool is_pinned) {
Thread about 6-space indentation in new blocks
Reviewer concern: inconsistent block indentation vs surrounding function style.
Current state: still mildly inconsistent in helper blocks that use 4-space body indentation where nearby functions mostly use 2-space body indentation.
Examples:
tensorrt_provider_factory.cc:98
tensorrt_provider_factory.cc:99
tensorrt_provider_factory.cc:103
tensorrt_provider_factory.cc:420
tensorrt_provider_factory.cc:421
Do you run lintrunner?
ReleaseAllocator = ReleaseAllocatorImpl;

CreateDataTransfer = CreateDataTransferImpl;
IsStreamAware = IsStreamAwareImpl;
Extra whitespace-only artifact still present
};
}

OrtStatus* ORT_API_CALL TensorrtExecutionProviderFactory::GetSupportedDevicesImpl(
This function still allows exception propagation. Please review all C API entry points.
} catch (const std::exception& ex) {
  // Best-effort: ReleaseEpFactory shouldn't normally throw, but guard the C boundary.
  (void)ex;
} catch (...) {
This catches all exceptions but still returns success.
Risk: a teardown failure would be hidden from the caller, making troubleshooting harder.
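One way to keep the boundary guard while still reporting the failure is sketched below. OrtStatusStub and ReleaseGuarded are hypothetical stand-ins; the real code would build an OrtStatus with ort_api->CreateStatus(ORT_RUNTIME_EXCEPTION, ...) inside the catch blocks instead of swallowing the exception and returning success.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical status type standing in for OrtStatus in this sketch.
struct OrtStatusStub { std::string message; };

// Guard the C boundary during teardown, but surface any failure to the
// caller as an error status rather than silently returning success.
OrtStatusStub* ReleaseGuarded(void (*do_release)()) {
  try {
    do_release();
    return nullptr;  // success
  } catch (const std::exception& ex) {
    return new OrtStatusStub{ex.what()};
  } catch (...) {
    return new OrtStatusStub{"unknown exception during release"};
  }
}
```

The caller can then log or propagate the status, so teardown failures remain diagnosable instead of disappearing.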
This PR mainly has several changes: