
Only add support for CUDA devices in GetSupportedDevices#4

Merged
chilo-ms merged 12 commits into main from chi/update on Apr 15, 2026

Conversation

@chilo-ms (Collaborator) commented Apr 1, 2026

This PR mainly makes the following changes:

  1. In GetSupportedDevicesImpl, check the hardware device's vendor ID before claiming support for that device.
  2. CUDA assigns contiguous ordinals to CUDA-visible NVIDIA devices. Following the plugin CUDA EP, this implementation holds a device cache and uses the CUDA device id when creating OrtMemoryInfo.
  3. Add a TensorRT builder placeholder for test scenarios.
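A minimal sketch of the vendor-ID filter described in item 1. The struct and helper names here are hypothetical stand-ins, not the actual ORT plugin API; the real code reads equivalent fields from the OrtHardwareDevice instances passed to GetSupportedDevicesImpl:

```cpp
#include <cstdint>

// Hypothetical stand-in for an ORT hardware device record; the real code
// queries vendor id and device type via the ORT C API.
struct HardwareDeviceInfo {
  uint32_t vendor_id;
  bool is_gpu;
};

constexpr uint32_t kNvidiaVendorId = 0x10DE;  // NVIDIA's PCI vendor ID

// Claim support only for NVIDIA GPUs; skip every other enumerated device.
bool IsSupportedByTensorRtEp(const HardwareDeviceInfo& device) {
  return device.is_gpu && device.vendor_id == kNvidiaVendorId;
}
```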

Comment thread src/tensorrt_provider_factory.cc Outdated
Comment thread src/tensorrt_provider_factory.cc Outdated
Comment thread src/tensorrt_provider_factory.cc Outdated
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@yuslepukhin

// delete static_cast<TRTEpDataTransfer*>(this_ptr);

All instances of DataTransfer now leak.


Refers to: src/tensorrt_execution_provider_data_transfer.cc:114 in 41ea68c.

auto& factory = *static_cast<TensorrtExecutionProviderFactory*>(this_ptr);
*data_transfer = factory.data_transfer_impl.get();

auto data_transfer_impl = std::make_unique<TRTEpDataTransfer>(static_cast<const ApiPtrs&>(factory));

This leaks because ReleaseImpl() is now commented out.

Collaborator Author

Removed the comment and added back `delete static_cast<TRTEpDataTransfer*>(this_ptr);`.
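For reference, the restored release hook looks roughly like this. TRTEpDataTransfer is stubbed here with a live-instance counter purely to make the ownership visible; the real type lives in the EP sources:

```cpp
// Stub standing in for the real TRTEpDataTransfer, instrumented so the
// effect of the delete is observable.
struct TRTEpDataTransfer {
  static int live_count;
  TRTEpDataTransfer() { ++live_count; }
  ~TRTEpDataTransfer() { --live_count; }
};
int TRTEpDataTransfer::live_count = 0;

// Mirrors the restored ReleaseImpl: cast the opaque pointer back to the
// concrete type and delete it so instances no longer leak.
void ReleaseImpl(void* this_ptr) {
  delete static_cast<TRTEpDataTransfer*>(this_ptr);
}
```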

Comment thread src/tensorrt_provider_factory.cc Outdated
}

if (num_cuda_devices == 0) {
Ort::ThrowOnError(ort_api->Logger_LogMessage(default_logger,

This will throw a C++ exception through a C API boundary, which will immediately terminate the process.
In general, every C API implemented in C++ should guard against exceptions that could rip through the C API into a C program.
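The guard pattern being asked for is roughly the following. Status and CreateStatus are stubs standing in for OrtStatus* and ort_api->CreateStatus; the macro names are illustrative:

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Stub status type standing in for OrtStatus* from the ORT C API.
struct Status { int code; std::string msg; };
Status* CreateStatus(int code, const char* msg) { return new Status{code, msg}; }

// Guard the C boundary: translate any C++ exception into a status object
// instead of letting it unwind through the C caller.
#define API_IMPL_BEGIN try {
#define API_IMPL_END                                                 \
  } catch (const std::exception& ex) {                               \
    return CreateStatus(6 /* ORT_RUNTIME_EXCEPTION */, ex.what());   \
  } catch (...) {                                                    \
    return CreateStatus(6, "unknown exception");                     \
  }

// Example C API entry point: returns nullptr on success, a status on error.
Status* SomeCApiEntryPoint(bool fail) {
  API_IMPL_BEGIN
  if (fail) throw std::runtime_error("boom");
  return nullptr;  // success
  API_IMPL_END
}
```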

Collaborator Author

I changed it to use RETURN_IF_ERROR instead.


Should wrap every C API in try/catch macros.

Collaborator Author

addressed


// Query CUDA device count once upfront so we can validate assigned ordinals.
int cuda_device_count = 0;
cudaError_t cuda_err = cudaGetDeviceCount(&cuda_device_count);

Here and below: a cudaGetDeviceCount failure is treated as no devices. This is a rare, catastrophic failure which must be reported. In CreateEpFactories, a cudaGetDeviceCount failure returns ORT_EP_FAIL.
Plugin creation can therefore still fail on systems without a usable CUDA runtime, which conflicts with the stated PR intent of graceful enumeration behavior when CUDA devices are unavailable.
The no-device case was improved, but the error-path semantics remain inconsistent with that design intent.

Collaborator Author

I updated the code in CreateEpFactories and GetSupportedDevicesImpl; they are consistent now and only log a warning message if no CUDA devices are available.


There are two situations: 1) no devices; 2) the CUDA API fails. The latter must error out, not just log a warning.
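A minimal sketch of the distinction being asked for, with the device query stubbed out (the real code calls cudaGetDeviceCount and the names here are hypothetical):

```cpp
#include <string>

enum class EnumResult { kOk, kNoDevices, kApiError };

// Stub for cudaGetDeviceCount: returns false when the CUDA API itself fails.
bool QueryDeviceCount(int* count, bool simulate_api_failure) {
  if (simulate_api_failure) return false;
  *count = 0;  // pretend no CUDA devices are visible
  return true;
}

// No devices -> warn and report an empty list; API failure -> hard error.
EnumResult EnumerateCudaDevices(bool simulate_api_failure, std::string* log) {
  int count = 0;
  if (!QueryDeviceCount(&count, simulate_api_failure)) {
    return EnumResult::kApiError;  // must surface as an error, not a warning
  }
  if (count == 0) {
    *log = "warning: no CUDA devices available";
    return EnumResult::kNoDevices;  // graceful: claim no devices, don't fail
  }
  return EnumResult::kOk;
}
```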

Collaborator Author

addressed.

Comment thread src/tensorrt_provider_factory.cc Outdated
// CUDA uses contiguous ordinals for CUDA-visible NVIDIA devices. Build that
// mapping from the filtered hardware-device list instead of relying on the
// ORT hardware device id, which is not guaranteed to be a CUDA ordinal.
int current_device_id = cuda_device_index++;

The current code still assigns CUDA ordinals by filtered enumeration order, not from a direct CUDA ordinal provided by hardware-device metadata.

If the ORT hardware enumeration order diverges from the CUDA-visible ordinal order, the allocator/memory-info association can still mismatch.

Partially addressed (a bounds check was added at tensorrt_provider_factory.cc:169), but the deeper ordering-assumption concern remains.

Collaborator Author

Good catch!
The device ordering assumption is a concern.

To address this, I use cudaDeviceGetByPCIBusId to get the CUDA device ordinal from the PCI bus ID.
ORT currently doesn't have the PCI bus ID as device metadata on Windows, and I created a PR for it.
The plugin CUDA EP has the same ordering-assumption issue, and I addressed it as well.
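The real fix calls the CUDA runtime's cudaDeviceGetByPCIBusId(&ordinal, pci_bus_id), which resolves a PCI bus id string to a CUDA device ordinal. A stubbed sketch of the lookup, with the resolution table mocked since no CUDA runtime is assumed here:

```cpp
#include <map>
#include <string>

// Stub lookup standing in for cudaDeviceGetByPCIBusId: resolves a PCI bus id
// string (e.g. "0000:65:00.0") to a CUDA device ordinal, or -1 if the device
// is not CUDA-visible.
int GetCudaOrdinalByPciBusId(const std::string& pci_bus_id,
                             const std::map<std::string, int>& table) {
  auto it = table.find(pci_bus_id);
  return it != table.end() ? it->second : -1;
}
```

Keying on the PCI bus id removes the assumption that ORT's hardware enumeration order matches CUDA's ordinal order.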

}

// Manual init for the C++ API
Ort::InitApi(ort_api);

Ort::InitApi(ort_api) should be one of the first things to do. E.g. Ort::ThrowOnError() requires it, but at this point it has not yet been initialized.
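A stubbed illustration of the ordering requirement: helpers built on the C++ API only work after Ort::InitApi has run, so it should be the first call in the factory entry point. All types here are stand-ins, not the real ONNX Runtime headers:

```cpp
// Stand-in for the ORT C API function table.
struct OrtApi {};

namespace Ort {
const OrtApi* g_api = nullptr;

// Stand-in for Ort::InitApi: stores the API pointer for later C++ helpers.
void InitApi(const OrtApi* api) { g_api = api; }

// Stand-in for any C++ helper (e.g. ThrowOnError) that needs InitApi first.
bool ApiReady() { return g_api != nullptr; }
}  // namespace Ort

bool CreateEpFactories(const OrtApi* ort_api) {
  Ort::InitApi(ort_api);  // must run first...
  return Ort::ApiReady(); // ...so later C++ API helpers can rely on it
}
```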

Collaborator Author

changed.

return ort_api->CreateStatus(ORT_RUNTIME_EXCEPTION, err_msg.c_str());
}

try {

This C API is still not guarding against exceptions. This try/catch is too narrow.

Collaborator Author

addressed.


cuda_pinned_memory_infos[device_id] = MemoryInfoUniquePtr(mem_info, ort_api.ReleaseMemoryInfo);
}
const OrtMemoryInfo* TensorrtExecutionProviderFactory::GetMemoryInfoByOrdinal(int cuda_ordinal, bool is_pinned) {

Thread about 6-space indentation in new blocks
Reviewer concern: inconsistent block indentation vs surrounding function style.
Current state: still mildly inconsistent in helper blocks that use 4-space body indentation where nearby functions mostly use 2-space body indentation.
Examples:
tensorrt_provider_factory.cc:98
tensorrt_provider_factory.cc:99
tensorrt_provider_factory.cc:103
tensorrt_provider_factory.cc:420
tensorrt_provider_factory.cc:421

Do you run lintrunner?

ReleaseAllocator = ReleaseAllocatorImpl;

CreateDataTransfer = CreateDataTransferImpl;
IsStreamAware = IsStreamAwareImpl;

An extra whitespace-only line is still present.

};
}

OrtStatus* ORT_API_CALL TensorrtExecutionProviderFactory::GetSupportedDevicesImpl(

This function still allows exception propagation. Please review all C API entry points.

} catch (const std::exception& ex) {
// Best-effort: ReleaseEpFactory shouldn't normally throw, but guard the C boundary.
(void)ex;
} catch (...) {

This catches all exceptions but still returns success.
Risk: a teardown failure would be hidden from the caller, making troubleshooting harder.

@yuslepukhin left a comment

LGTM

5 participants