Add PCI bus ID as device metadata on Windows #28066
Conversation
tianleiwu
left a comment
Summary
Well-motivated fix that addresses a real correctness issue with CUDA device ordinal mapping when OS and CUDA enumeration orders diverge. The PCI bus ID approach is the right solution.
One high-priority concern about the fallback path producing duplicate ordinals when CUDA_VISIBLE_DEVICES restricts GPU visibility.
Positives
- PCI bus ID resolution via `cudaDeviceGetByPCIBusId` is the correct, stable way to map physical GPUs to CUDA ordinals
- Good defensive checks on metadata null, empty string, CUDA API failure, and ordinal range validation
- Clean parsing of `SPDRP_LOCATION_INFORMATION` with correct decimal-to-hex conversion
- Excellent test coverage, especially the `PciBusIdRoundTrip` format consistency test
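For context, the decimal-to-hex conversion mentioned above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function name and the exact `SPDRP_LOCATION_INFORMATION` string layout are assumptions, but the key point holds, since Windows reports decimal bus/device/function values while CUDA expects a hex-formatted `DDDD:BB:DD.F` string.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: parse a Windows SPDRP_LOCATION_INFORMATION string such
// as "PCI bus 101, device 0, function 0" (decimal) and format it into the
// hex "0000:BB:DD.F" form that cudaDeviceGetByPCIBusId expects.
std::string FormatPciBusId(const std::string& location_info) {
  unsigned int bus = 0, device = 0, function = 0;
  if (std::sscanf(location_info.c_str(), "PCI bus %u, device %u, function %u",
                  &bus, &device, &function) != 3) {
    return {};  // unrecognized format
  }
  char buf[16];
  // Domain is assumed to be 0000 (single-segment PCI), matching the PR.
  std::snprintf(buf, sizeof(buf), "0000:%02x:%02x.%x", bus, device, function);
  return buf;
}
```

Note that decimal bus 101 becomes hex `65`, so getting the base conversion wrong would silently map to the wrong physical GPU.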
Concerns
| # | Severity | Component | Issue |
|---|---|---|---|
| 1 | High | cuda_ep_factory.cc | Fallback ordinal collision when PCI bus ID present but CUDA resolution fails |
| 2 | Suggestion | device_discovery.cc | PCI domain hardcoded to 0000 |
| 3 | Nitpick | test file | Missing trailing newline |
See inline comments for details.
```cpp
// Try to resolve the CUDA ordinal from pci_bus_id metadata if available.
// This is more reliable than counter-based ordinal assignment because it is
// not affected by enumeration order, CUDA_VISIBLE_DEVICES remapping, or
// mixed-vendor GPU configurations.
int current_device_id = -1;
const OrtKeyValuePairs* metadata = factory->ort_api_.HardwareDevice_Metadata(&device);
if (metadata != nullptr) {
  const char* pci_bus_id = factory->ort_api_.GetKeyValue(metadata, "pci_bus_id");
  if (pci_bus_id != nullptr && pci_bus_id[0] != '\0') {
    int resolved_ordinal = -1;
    cudaError_t err = cudaDeviceGetByPCIBusId(&resolved_ordinal, pci_bus_id);
    if (err == cudaSuccess && resolved_ordinal >= 0 && resolved_ordinal < cuda_device_count) {
      current_device_id = resolved_ordinal;
    }
  }
}

// Fallback: if pci_bus_id was not available, use counter-based ordinal assignment.
if (current_device_id < 0) {
  current_device_id = cuda_device_index++;
}

// Validate the assigned ordinal is within the range of CUDA-visible devices.
```
High — Fallback ordinal collision with CUDA_VISIBLE_DEVICES
When pci_bus_id is present but cudaDeviceGetByPCIBusId fails (e.g., device hidden by CUDA_VISIBLE_DEVICES), the code falls back to cuda_device_index++. This counter-based ordinal can collide with a PCI-resolved ordinal from another visible device.
Scenario (`CUDA_VISIBLE_DEVICES=1`, two NVIDIA GPUs):
- `cuda_device_count` = 1 (one visible device, remapped to ordinal 0)
- GPU0 (PCI 01:00.0): `cudaDeviceGetByPCIBusId` → `cudaErrorInvalidDevice` → fallback → ordinal 0
- GPU1 (PCI 65:00.0): `cudaDeviceGetByPCIBusId` → ordinal 0 (the visible device)
- Result: both GPUs assigned ordinal 0
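The collision can be reproduced with a small CUDA-free simulation. Everything here is illustrative: `ResolveOrdinal` is a stub standing in for `cudaDeviceGetByPCIBusId` under `CUDA_VISIBLE_DEVICES=1`, and `AssignWithFallback` mimics the counter-based fallback logic under review, not the PR's literal code.

```cpp
#include <map>
#include <string>
#include <vector>

// Stub for cudaDeviceGetByPCIBusId under CUDA_VISIBLE_DEVICES=1: only the
// second physical GPU (bus 0x65) is visible, remapped to ordinal 0.
bool ResolveOrdinal(const std::string& pci_bus_id, int& ordinal) {
  static const std::map<std::string, int> visible = {{"0000:65:00.0", 0}};
  auto it = visible.find(pci_bus_id);
  if (it == visible.end()) return false;  // analogous to cudaErrorInvalidDevice
  ordinal = it->second;
  return true;
}

// Mimics the reviewed logic: fall back to a counter when resolution fails.
std::vector<int> AssignWithFallback(const std::vector<std::string>& gpus) {
  std::vector<int> ordinals;
  int counter = 0;
  for (const auto& id : gpus) {
    int ordinal = -1;
    if (!ResolveOrdinal(id, ordinal)) ordinal = counter++;  // collision source
    ordinals.push_back(ordinal);
  }
  return ordinals;
}
```

Calling `AssignWithFallback({"0000:01:00.0", "0000:65:00.0"})` yields `{0, 0}`: the hidden GPU takes ordinal 0 from the counter, and the visible GPU resolves to ordinal 0 as well.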
Suggested fix: When pci_bus_id IS present but CUDA resolution fails, skip the device (it's not CUDA-visible) rather than falling back to the counter:
```cpp
if (pci_bus_id != nullptr && pci_bus_id[0] != '\0') {
  int resolved_ordinal = -1;
  cudaError_t err = cudaDeviceGetByPCIBusId(&resolved_ordinal, pci_bus_id);
  if (err == cudaSuccess && resolved_ordinal >= 0 && resolved_ordinal < cuda_device_count) {
    current_device_id = resolved_ordinal;
  } else {
    // Device has a PCI bus ID but is not visible to CUDA
    // (e.g., filtered by CUDA_VISIBLE_DEVICES). Skip it.
    continue;
  }
}
```

```cpp
  }
}

// Generate telemetry event to log the GPU and NPU driver name and version.
```
Suggestion — PCI domain hardcoded to 0000
The domain is hardcoded to 0000 in the format string. While this is correct for the vast majority of systems, multi-segment PCI configurations (non-zero domain) exist in some enterprise server hardware. The Linux implementation extracts the actual domain from sysfs.
Consider adding a brief comment noting this limitation (e.g., // Domain is 0000 for standard single-segment PCI; multi-segment configs are not supported), or exploring DEVPKEY_Device_LocationPaths which may contain the full domain.
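If the domain were ever extracted, only the format string would need to widen, since `cudaDeviceGetByPCIBusId` accepts the extended `domain:bus:device.function` form. A hedged sketch (the helper name is hypothetical, not from the PR):

```cpp
#include <cstdio>
#include <string>

// Hypothetical domain-aware variant of the formatting: CUDA's documented
// PCI bus ID string accepts "DDDD:BB:DD.F", so a non-zero domain (e.g. from
// a multi-segment enterprise system) slots in directly.
std::string FormatPciBusIdWithDomain(unsigned domain, unsigned bus,
                                     unsigned device, unsigned function) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%04x:%02x:%02x.%x",
                domain, bus, device, function);
  return buf;
}
```

For domain 0 this produces exactly the same strings as the current hardcoded `0000:` prefix, so the change would be backward compatible.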
```cpp
}  // namespace test
}  // namespace onnxruntime

#endif  // defined(ORT_UNIT_TEST_HAS_CUDA_PLUGIN_EP)
```
Nitpick: Missing trailing newline at end of file.
Description
This PR mainly has two changes:
- Add the PCI bus ID as device metadata during device discovery on Windows.
- Use `cudaDeviceGetByPCIBusId` with the provided PCI bus ID to get the CUDA device ordinal as the device id (it will later be used for OrtMemoryInfo creation). Even though ORT hardware enumeration order diverges from CUDA-visible ordinal order, the plugin CUDA EP makes sure the allocator/memory-info matches the device.

Motivation and Context
The current code in the plugin CUDA EP, `int current_device_id = cuda_device_index++;`, assumes that the Nth NVIDIA GPU in ORT's hardware enumeration order corresponds to CUDA ordinal N. This is fragile because: