[Enhancement] Backward compatible NVML Python bindings #29
Labels
enhancement
New feature or request
pynvml
Something related to the `nvidia-ml-py` package
upstream
Something upstream related
Runtime Environment
- Python version: 3.9.13
- NVIDIA driver (NVML) version: 470.129.06
- `nvitop` version or commit: v0.7.1
- `nvidia-ml-py` version: 11.450.51
- Locale: en_US.UTF-8
Context
The official NVML Python bindings (PyPI package `nvidia-ml-py`) do not guarantee backward compatibility across NVIDIA driver versions. For example, NVML added `nvmlDeviceGetComputeRunningProcesses_v2` and `nvmlDeviceGetGraphicsRunningProcesses_v2` in the CUDA 11.x drivers (R450+). However, `nvidia-ml-py` unconditionally calls the latest versioned function from the unversioned function. This causes an `NVMLError_FunctionNotFound` error on CUDA 10.x drivers (e.g. R430).

The `v3` versions of the `nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses` functions now ship with the R510+ drivers, e.g. in `nvidia-ml-py==11.515.48`.

The `v2` version of `c_nvmlMemory_v2_t` is appearing on the horizon (not found in the R510 driver yet). This causes issue #13.

Possible Solutions
1. Determine the best dependency version of `nvidia-ml-py` during installation.
   This requires the user to install the NVIDIA driver first, which may not be the case on a freshly installed system. Besides, it is hard to express this driver dependency in the package metadata.
2. Wait for the PyPI package `nvidia-ml-py` to become backward compatible.
   The package `NVIDIA/go-nvml` offers backward-compatible APIs. I posted this on the NVIDIA developer forums ("[PyPI/nvidia-ml-py] Issue Reports for `nvidia-ml-py`") but have not received an official response yet.
3. Vendor `nvidia-ml-py` in `nvitop`. (Note: `nvidia-ml-py` is released under the BSD License.)
   This requires bumping the vendored version and making a minor release of `nvitop` each time a new version of `nvidia-ml-py` comes out.
4. Automatically patch the `pynvml` module when the first call to a versioned API fails. This can be achieved by manipulating the module's `__dict__` attribute or the `module.__class__` attribute.
   The goal of this solution is not to make fully backward-compatible Python bindings. That may be out of the scope of `nvitop`, e.g. `ExcludedDeviceInfo -> BlacklistDeviceInfo`. Also, note that this solution may cause performance issues because of a much deeper call stack.