fix(api/libnvml): fix process info support for NVIDIA R535 driver (CUDA 12.2+) #79
Issue Type
Description
Starting with the NVIDIA R510 driver, new version 3 APIs were added for `nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses`. However, these version 3 functions still used the version 2 struct type as their argument type.

Recently, the NVIDIA R535 driver was released. Its version 3 APIs now use a new version 3 struct type without a version bump. Callers that still pass an array of version 2 structs therefore trigger invalid memory accesses and get wrong results.
The two struct types have different sizes, so the driver and the caller disagree on the stride of the returned array.
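A minimal ctypes sketch of the size mismatch, assuming the version 3 struct is the version 2 layout plus one trailing `unsigned long long` field (named `usedGpuCcProtectedMemory` here, following the R535 `nvml.h`; treat the exact layout as an assumption). The class names and `read_pids_with_wrong_stride` are hypothetical. To stay memory-safe, the demo allocates a v3-sized buffer and only reads it back with the smaller v2 stride. A real caller would allocate a buffer of only `n * sizeof(v2)` bytes, and the driver would write past its end.

```python
import ctypes


# Hypothetical ctypes mirror of the version 2 process-info struct.
class ProcessInfoV2(ctypes.Structure):
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
        ("gpuInstanceId", ctypes.c_uint),
        ("computeInstanceId", ctypes.c_uint),
    ]


# Hypothetical ctypes mirror of the version 3 struct: same prefix as v2,
# plus one trailing 64-bit field (layout assumed from the R535 headers).
class ProcessInfoV3(ctypes.Structure):
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
        ("gpuInstanceId", ctypes.c_uint),
        ("computeInstanceId", ctypes.c_uint),
        ("usedGpuCcProtectedMemory", ctypes.c_ulonglong),
    ]


def read_pids_with_wrong_stride(pids):
    """Simulate a driver that writes v3 records which the caller reads as v2."""
    n = len(pids)
    # The "driver" fills v3 records (zero-initialized by ctypes).
    driver_buf = (ProcessInfoV3 * n)()
    for i, pid in enumerate(pids):
        driver_buf[i].pid = pid
        driver_buf[i].usedGpuMemory = (i + 1) << 20
    # The caller reinterprets the same bytes with the smaller v2 stride,
    # so every record after the first starts at the wrong offset.
    caller_view = (ProcessInfoV2 * n).from_buffer(driver_buf)
    return [record.pid for record in caller_view]


if __name__ == "__main__":
    print(ctypes.sizeof(ProcessInfoV2), ctypes.sizeof(ProcessInfoV3))  # 24 32 on x86-64
    print(read_pids_with_wrong_stride([101, 202, 303]))  # [101, 0, 0] on x86-64
```

Only the first record survives. Every later "pid" is read from the middle of a neighbouring v3 record.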
This PR adds a helper function that determines the API version and the struct version of `nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses` on the first API call.

Motivation and Context
Fixes #75
Fixes #76
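The version-detection helper from the Description could look roughly like the hypothetical sketch below. It is not this PR's actual code. It assumes a simple heuristic: pick the newest available entry point, and treat R535+ drivers as writing the version 3 struct through `_v3`. The real helper may instead probe the API at runtime. All function names are illustrative. Symbol lookup and the driver version are passed in so the logic can run without a GPU.

```python
from typing import Callable, Optional, Tuple

# Hypothetical base name; the Graphics and MPSCompute variants follow the same scheme.
ENTRY_POINT = "nvmlDeviceGetComputeRunningProcesses"


def determine_running_processes_version(
    has_symbol: Callable[[str], bool], driver_version: str
) -> Tuple[str, int]:
    """Return (function-name suffix, struct version); heuristic is an assumption."""
    major = int(driver_version.split(".")[0])
    if has_symbol(ENTRY_POINT + "_v3"):
        # Assumption: R535+ fills the v3 struct through the _v3 entry point,
        # while R510-R530 drivers still fill the v2 struct.
        return "_v3", (3 if major >= 535 else 2)
    if has_symbol(ENTRY_POINT + "_v2"):
        return "_v2", 2
    return "", 1


_cached_version: Optional[Tuple[str, int]] = None


def running_processes_version(
    has_symbol: Callable[[str], bool], driver_version: str
) -> Tuple[str, int]:
    """Determine the versions on the first call and reuse the result afterwards."""
    global _cached_version
    if _cached_version is None:
        _cached_version = determine_running_processes_version(has_symbol, driver_version)
    return _cached_version
```

With a real library, one could pass `lambda name: hasattr(lib, name)` for `lib = ctypes.CDLL("libnvidia-ml.so.1")`, together with the driver version string from `pynvml.nvmlSystemGetDriverVersion()` (decoded to `str` on older pynvml releases). The result is cached because neither the loaded library nor the driver changes during a process.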