Incorrect memory usage for nvidia driver higher than R510 #141

Closed
huchinlp opened this issue Nov 6, 2022 · 7 comments

@huchinlp

huchinlp commented Nov 6, 2022

After I updated my NVIDIA driver to 515.76, gpustat always shows 308 MB of memory in use, even when no process is using the GPUs.
But nvidia-smi shows no memory in use, so I am very confused...

Here is my environment:
ubuntu 18.04
gpustat 1.0.0
nvidia-driver 515.76
kernel 4.15.0-171-generic

And here are the outputs of nvidia-smi and gpustat:
[screenshot: nvidia-smi output]
[screenshot: gpustat output]

@wookayin
Owner

wookayin commented Nov 8, 2022

Was it fine before you updated the driver? If so, can you please let me know the previous driver version?

This is probably due to a mismatch between the pynvml and NVIDIA driver versions. We pinned nvidia-ml-py at 11.495.46, which is not compatible with very recent drivers. Please see #107 for details.

As a workaround, you can try upgrading nvidia-ml-py to the latest release (pip will complain about the incompatibility with gpustat v1.0.0), e.g. pip install --ignore-installed 'nvidia-ml-py>=11.510.69' (the quotes keep the shell from treating >= as a redirect). With this pynvml version, gpustat will not work with old NVIDIA drivers, but it should be OK in your environment. Can you please let me know how the result differs?

@wookayin wookayin added the pynvml label Nov 8, 2022
@wookayin wookayin added this to the 1.1 milestone Nov 8, 2022
@wookayin

This comment was marked as resolved.

@jinmingyi1998

This comment was marked as duplicate.

@wookayin
Owner

wookayin commented Nov 17, 2022

Can you please try the following (pynvml 11.510.69+ required):

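import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
# Legacy (v1) query
print(pynvml.nvmlDeviceGetMemoryInfo(handle))
# v2 query: pass the struct version constant rather than a bare 2
print(pynvml.nvmlDeviceGetMemoryInfo(handle, version=pynvml.nvmlMemory_v2))
pynvml.nvmlShutdown()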

It looks like nvmlDeviceGetMemoryInfo_v2 was added in driver 510.39.01. A corresponding pynvml breaking change was added in 11.510.69.

Related to XuehaiPan/nvitop#13
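A minimal sketch of how a caller might prefer the v2 query and fall back to the legacy one. The helper name `query_memory` and the idea of passing the module in as `nvml` are hypothetical, chosen only so the logic can be exercised without a GPU; this is not gpustat's actual code. It assumes that an older pynvml lacks `nvmlMemory_v2` and that an older driver makes the v2 call raise `NVMLError`.

```python
def query_memory(nvml, handle):
    """Return memory info, preferring the v2 query when it is available.

    `nvml` is expected to be the pynvml module (or a stand-in with the
    same attributes). Hypothetical helper, for illustration only.
    """
    # Older pynvml releases are assumed not to define nvmlMemory_v2.
    v2_version = getattr(nvml, "nvmlMemory_v2", None)
    # Fall back to a generic Exception if the stand-in has no NVMLError.
    nvml_error = getattr(nvml, "NVMLError", Exception)

    if v2_version is not None:
        try:
            return nvml.nvmlDeviceGetMemoryInfo(handle, version=v2_version)
        except (nvml_error, TypeError):
            # Assumed failure modes: the driver lacks the v2 entry point,
            # or this pynvml does not accept the `version` keyword.
            pass
    return nvml.nvmlDeviceGetMemoryInfo(handle)
```

With a real installation this would be called as `query_memory(pynvml, handle)` after `pynvml.nvmlInit()`.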

@jinmingyi1998

> Can you please try the following:
>
> import pynvml
> pynvml.nvmlInit()
> handle = pynvml.nvmlDeviceGetHandleByIndex(0)
> print(pynvml.nvmlDeviceGetMemoryInfo(handle))
> print(pynvml.nvmlDeviceGetMemoryInfo(handle, version=2))
>
> It looks like nvmlDeviceGetMemoryInfo_v2 was added in driver 510.39.01 or higher. Related to XuehaiPan/nvitop#13

It seems to work.

nvitop patched it in PR XuehaiPan/nvitop#30.

>>> print(pynvml.nvmlDeviceGetMemoryInfo(h,version=pynvml.nvmlMemory_v2))
c_nvmlMemory_v2_t(version: 33554472 B, total: 25769803776 B, reserved: 322633728 B, free: 25447038976 B, used: 131072 B)
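The numbers in this output can be checked by hand, and they line up with the 308 MB that gpustat reported. The sketch below only does arithmetic on the values printed above. The function names are hypothetical, and the reading that the legacy query counts driver-reserved memory as used is an inference consistent with this thread, not something the output itself proves.

```python
MIB = 1024 ** 2

# Values copied from the c_nvmlMemory_v2_t output above (bytes).
VERSION = 33554472
TOTAL = 25769803776
RESERVED = 322633728
FREE = 25447038976
USED = 131072


def decode_struct_version(version):
    """Split the version field into (struct size in bytes, API version).

    Assumes the usual NVML encoding: size | (api_version << 24).
    """
    return version & 0xFFFFFF, version >> 24


def legacy_used_mib(reserved, used):
    """Used memory in MiB if reserved memory is counted as used."""
    return (reserved + used) / MIB


size, api_version = decode_struct_version(VERSION)
print(size, api_version)                 # 40 2
print(RESERVED + FREE + USED == TOTAL)   # True
print(legacy_used_mib(RESERVED, USED))   # 307.8125
```

So the v2 query reports about 0.1 MiB actually allocated, while reserved plus used comes to roughly 308 MiB.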

@wookayin wookayin changed the title from "gpustat shows 300 more memory used than nvidia-smi" to "Incorrect memory usage for nvidia driver higher than R510" Nov 26, 2022
@wookayin
Owner

wookayin commented Nov 26, 2022

I hate NVIDIA breaking existing functions in new versions of drivers. For drivers higher than 510.39.01, perhaps the only way to get the correct memory usage information is to use nvmlDeviceGetMemoryInfo_v2, i.e. use pynvml 11.510.69+. However, older drivers cannot use pynvml>=11.510.69 because it breaks process information (#107).

A consequence is that a different pynvml version must be used depending on the NVIDIA driver version (this is difficult to resolve at installation or build time), and if incompatible versions are found, a proper warning message should be printed. I'm going to relax the pynvml requirement <= 11.495.46 introduced in v1.0 (see #143).
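To make the version dependency concrete, here is a rough sketch of such a runtime check. The function names and message texts are hypothetical, and this is not the code that landed in #143. It assumes 510.39.01 is the driver cutoff for both the memory and the process-information issue; the thread only states that cutoff for `nvmlDeviceGetMemoryInfo_v2`.

```python
import warnings

# Thresholds taken from this thread; using one driver cutoff for both
# problems is an assumption.
V2_DRIVER = (510, 39, 1)
V2_PYNVML = (11, 510, 69)


def parse_version(text):
    """Turn '510.39.01' into (510, 39, 1) for tuple comparison."""
    return tuple(int(part) for part in text.split("."))


def compatibility_warning(driver_version, pynvml_version):
    """Return a warning message for a mismatched pair, or None."""
    new_driver = parse_version(driver_version) >= V2_DRIVER
    new_pynvml = parse_version(pynvml_version) >= V2_PYNVML
    if new_driver and not new_pynvml:
        return ("memory usage may be reported incorrectly; "
                "consider nvidia-ml-py>=11.510.69")
    if new_pynvml and not new_driver:
        return ("process information may be broken; "
                "consider nvidia-ml-py<=11.495.46")
    return None


def warn_if_incompatible(driver_version, pynvml_version):
    """Emit a Python warning when the versions look mismatched."""
    message = compatibility_warning(driver_version, pynvml_version)
    if message is not None:
        warnings.warn(message)
```

For the environment reported in this issue, `compatibility_warning("515.76", "11.495.46")` returns the memory-usage message.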

@wookayin
Owner

wookayin commented Dec 1, 2022

Fixed via #143 (comment). Released in v1.1.

wookayin added a commit that referenced this issue Dec 1, 2022
@wookayin wookayin pinned this issue Apr 10, 2023
@wookayin wookayin unpinned this issue Oct 30, 2023