feat(core/libnvml): add compatibility layers for NVML Python bindings #30
Not sure when the v2 version of the memory info API will be available. In `pynvml.py`:

```python
# Excerpt from pynvml.py (_PrintableStructure, _nvmlGetFunctionPointer and
# _nvmlCheckReturn are internals of that module).
class c_nvmlMemory_t(_PrintableStructure):
    _fields_ = [
        ('total', c_ulonglong),
        ('free', c_ulonglong),
        ('used', c_ulonglong),
    ]
    _fmt_ = {'<default>': "%d B"}

class c_nvmlMemory_v2_t(_PrintableStructure):
    _fields_ = [
        ('version', c_uint),
        ('total', c_ulonglong),
        ('reserved', c_ulonglong),
        ('free', c_ulonglong),
        ('used', c_ulonglong),
    ]
    _fmt_ = {'<default>': "%d B"}

nvmlMemory_v2 = 0x02000028

def nvmlDeviceGetMemoryInfo(handle, version=None):
    if not version:
        # v1 API: unversioned struct and symbol
        c_memory = c_nvmlMemory_t()
        fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo")
    else:
        # v2 API: the version tag is written into the struct before the call
        c_memory = c_nvmlMemory_v2_t()
        c_memory.version = version
        fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo_v2")
    ret = fn(handle, byref(c_memory))
    _nvmlCheckReturn(ret)
    return c_memory
```

See also:
I got:

```
In [1]: import cupy as cp
In [2]: x = cp.zeros((1,))
In [3]: from nvitop import *
In [4]: d = Device(0)
In [5]: str(libnvml.nvmlDeviceGetMemoryInfo(d.handle))
Out[5]: 'c_nvmlMemory_t(total: 8589934592 B, free: 4304117760 B, used: 4285816832 B)'
In [6]: str(libnvml.nvmlDeviceGetMemoryInfo(d.handle, version=2))
╭──────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ <ipython-input-6-c179ba6931d1>:1 in <cell line: 1> │
│ │
│ /home/PanXuehai/Projects/nvitop/venv/lib/python3.9/site-packages/pynvml.py:2301 in nvmlDeviceGetMemoryInfo │
│ │
│ 2298 │ │ c_memory.version = version │
│ 2299 │ │ fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo_v2") │
│ 2300 │ ret = fn(handle, byref(c_memory)) │
│ ❱ 2301 │ _nvmlCheckReturn(ret) │
│ 2302 │ return c_memory │
│ 2303 │
│ 2304 def nvmlDeviceGetBAR1MemoryInfo(handle): │
│ │
│ /home/PanXuehai/Projects/nvitop/venv/lib/python3.9/site-packages/pynvml.py:795 in _nvmlCheckReturn │
│ │
│ 792 │
│ 793 def _nvmlCheckReturn(ret): │
│ 794 │ if (ret != NVML_SUCCESS): │
│ ❱ 795 │ │ raise NVMLError(ret) │
│ 796 │ return ret │
│ 797 │
│ 798 ## Function access ## │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
NVMLError_Unknown: Unknown Error
In [7]: Device.driver_version()
Out[7]: '516.59'
```

Same error on the Windows host:

```
In [1]: import cupy as cp
In [2]: x = cp.zeros((1,))
In [3]: from nvitop import *
In [4]: d = Device(0)
In [5]: str(libnvml.nvmlDeviceGetMemoryInfo(d.handle))
Out[5]: 'c_nvmlMemory_t(total: 8589934592 B, free: 4412493824 B, used: 4177440768 B)'
In [6]: str(libnvml.nvmlDeviceGetMemoryInfo(d.handle, version=2))
┌─────────────────────────────── Traceback (most recent call last) ────────────────────────────────┐
│ <ipython-input-6-c179ba6931d1>:1 in <cell line: 1> │
│ │
│ C:\Tools\Python3\lib\site-packages\pynvml.py:2301 in nvmlDeviceGetMemoryInfo │
│ │
│ 2298 │ │ c_memory.version = version │
│ 2299 │ │ fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo_v2") │
│ 2300 │ ret = fn(handle, byref(c_memory)) │
│ > 2301 │ _nvmlCheckReturn(ret) │
│ 2302 │ return c_memory │
│ 2303 │
│ 2304 def nvmlDeviceGetBAR1MemoryInfo(handle): │
│ │
│ C:\Tools\Python3\lib\site-packages\pynvml.py:795 in _nvmlCheckReturn │
│ │
│ 792 │
│ 793 def _nvmlCheckReturn(ret): │
│ 794 │ if (ret != NVML_SUCCESS): │
│ > 795 │ │ raise NVMLError(ret) │
│ 796 │ return ret │
│ 797 │
│ 798 ## Function access ## │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
NVMLError_Unknown: Unknown Error
```
Waiting for a new driver release that supports the v2 memory info API.
Signed-off-by: Xuehai Pan <[email protected]>
I upgraded the NVIDIA driver to
Update: The correct API call is `pynvml.nvmlDeviceGetMemoryInfo(handle, version=pynvml.nvmlMemory_v2)` rather than `pynvml.nvmlDeviceGetMemoryInfo(handle, version=2)`, where `pynvml.nvmlMemory_v2 = 33554472 = ctypes.sizeof(pynvml.c_nvmlMemory_v2_t) | 2 << 24`.
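A minimal sketch of how that version tag is composed, using plain `ctypes` so it runs without `pynvml` or a GPU. `MemoryV2` is a hypothetical local mirror of the `c_nvmlMemory_v2_t` layout quoted earlier in this thread, `struct_version` is a hypothetical helper, and the 40-byte size assumes a typical 64-bit platform where `c_ulonglong` is 8-byte aligned (so 4 bytes of padding follow `version`).

```python
import ctypes


# Hypothetical mirror of pynvml.c_nvmlMemory_v2_t (same field order and types).
class MemoryV2(ctypes.Structure):
    _fields_ = [
        ("version", ctypes.c_uint),
        ("total", ctypes.c_ulonglong),
        ("reserved", ctypes.c_ulonglong),
        ("free", ctypes.c_ulonglong),
        ("used", ctypes.c_ulonglong),
    ]


def struct_version(struct_type, version):
    """Pack a version tag: struct size in the low bits, version number from bit 24 up."""
    return ctypes.sizeof(struct_type) | (version << 24)


NVML_MEMORY_V2 = struct_version(MemoryV2, 2)
print(ctypes.sizeof(MemoryV2), NVML_MEMORY_V2, hex(NVML_MEMORY_V2))
```

A bare `version=2` leaves the size bits at zero, so it does not describe the struct layout at all; that is consistent with the `NVMLError_Unknown` in the tracebacks, although the driver's exact validation logic is not visible from Python.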
…e for memory info version 2 APIs Signed-off-by: Xuehai Pan <[email protected]>
Issue Type
Runtime Environment

- Python version: 3.9.13
- NVIDIA driver version: 470.129.06
- `nvitop` version or commit: v0.7.1
- `python-ml-py` version: 11.450.51
- Locale: en_US.UTF-8
Description

Automatically patch the `pynvml` module when the first call to a versioned API fails. This lets us support a broader range of versions of the PyPI package `nvidia-ml-py` as a dependency.

Motivation and Context
See #29 for more details.
Resolves #29
Closes #13
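The idea of patching the module after the first failed call can be sketched as follows. This is a hypothetical illustration, not the code in this PR: the fake module built with `types.SimpleNamespace`, the `FunctionNotFound` exception (standing in for pynvml's `NVMLError_FunctionNotFound`), and the helper `patch_on_first_failure` are all invented for the example so it runs without a GPU.

```python
import types


class FunctionNotFound(Exception):
    """Stand-in for the error raised when the driver lacks a versioned symbol."""


def get_processes_v2(handle):
    # Simulates a newer binding whose symbol the installed driver does not export.
    raise FunctionNotFound("get_running_processes_v2")


def get_processes_v1(handle):
    # Simulates the older API that every driver provides.
    return [f"process-on-{handle}"]


# Hypothetical fake module that exposes the newest API under the public name.
fake_nvml = types.SimpleNamespace(
    get_running_processes=get_processes_v2,
    get_running_processes_v1=get_processes_v1,
)


def patch_on_first_failure(module, name, fallback):
    """Wrap module.<name>; on the first FunctionNotFound, rebind it to `fallback`."""
    original = getattr(module, name)

    def wrapper(*args, **kwargs):
        try:
            return original(*args, **kwargs)
        except FunctionNotFound:
            # Patch the module so later lookups go straight to the fallback.
            setattr(module, name, fallback)
            return fallback(*args, **kwargs)

    setattr(module, name, wrapper)


patch_on_first_failure(fake_nvml, "get_running_processes", fake_nvml.get_running_processes_v1)
print(fake_nvml.get_running_processes(0))  # first call fails over and patches
print(fake_nvml.get_running_processes is get_processes_v1)
```

Only the first call pays for the failed lookup. A caller that grabbed the wrapper before patching still works, because the wrapper catches the error itself and falls back again.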
Testing

Using `nvidia-ml-py == 11.515.48` with the NVIDIA R430 driver (CUDA 10.x):

Result:

The v3 API `nvmlDeviceGetComputeRunningProcesses_v3` falls back to the v2 API `nvmlDeviceGetComputeRunningProcesses_v2` (which could not be found either), then falls back to the v1 API `nvmlDeviceGetComputeRunningProcesses`.
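The fallback order in this test result can be modeled as walking a list of candidate symbols from newest to oldest and taking the first one the driver resolves. In this sketch, `EXPORTED` is a hypothetical stand-in for the symbols an R430-era driver provides, and `lookup` and `resolve_first_available` are invented helpers, not pynvml functions.

```python
class FunctionNotFound(Exception):
    """Stand-in for the error raised when a driver lacks a symbol."""


# Hypothetical: an old driver that exports only the unversioned (v1) symbol.
EXPORTED = {"nvmlDeviceGetComputeRunningProcesses"}

# Candidates ordered newest first, matching the v3 -> v2 -> v1 chain above.
CANDIDATES = [
    "nvmlDeviceGetComputeRunningProcesses_v3",
    "nvmlDeviceGetComputeRunningProcesses_v2",
    "nvmlDeviceGetComputeRunningProcesses",
]


def lookup(symbol):
    """Simulate resolving a function pointer in the driver library."""
    if symbol not in EXPORTED:
        raise FunctionNotFound(symbol)
    return symbol


def resolve_first_available(candidates):
    """Return the first symbol that resolves, plus the ones that were skipped."""
    missing = []
    for symbol in candidates:
        try:
            return lookup(symbol), missing
        except FunctionNotFound:
            missing.append(symbol)
    raise FunctionNotFound(", ".join(candidates))


chosen, skipped = resolve_first_available(CANDIDATES)
print("skipped:", skipped)
print("using:", chosen)
```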