
GPU monitoring (NVidia) #170

Closed · alexis26 opened this issue Jan 28, 2013 · 95 comments

@alexis26

Hello,

Thanks for this software, which I use and appreciate (almost since the beginning, as I was already following your blog via RSS before that...).

I was wondering whether you think it will (would?) be possible to add GPU information to it.
Since more and more graphics cards have a dedicated processor, I thought it would be interesting to have the same kind of information Glances already provides... and possibly even to see the offloading from one to the other...

Thanks
Alexis

@nicolargo
Owner

Interesting...

I will check whether it is easy to obtain the corresponding system information.

@jrenner
Contributor

jrenner commented Mar 21, 2013

If you are using the ATI proprietary driver on Linux, this command can be used:
aticonfig --adapter=0 --od-getclocks | grep -i GPU
I use the open source driver myself, so I am not sure about that.

@nicolargo
Owner

Postponed to the next major release (v2.0).
Waiting for a "standard" way to access the information...

@herr-biber
Contributor

For Nvidia, one can use

nvidia-smi -q -x

to obtain an XML tree, which can be parsed for the memory usage tag. See

nvidia-smi -q -x | grep '<memory_usage' -A4

There is also an nvidia-ml-py Python package on PyPI.
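
A minimal parsing sketch in Python, assuming the usual nvidia-smi XML layout (tag names such as product_name, memory_usage and fb_memory_usage vary between driver versions, so treat this as illustrative rather than definitive):

import subprocess
import xml.etree.ElementTree as ET

# Query all GPU stats as XML
xml_out = subprocess.check_output(['nvidia-smi', '-q', '-x'])
root = ET.fromstring(xml_out)

for gpu in root.findall('gpu'):
    name = gpu.findtext('product_name')
    # Older drivers expose <memory_usage>, newer ones <fb_memory_usage>
    mem = gpu.find('fb_memory_usage')
    if mem is None:
        mem = gpu.find('memory_usage')
    if mem is not None:
        print(name, mem.findtext('used'), '/', mem.findtext('total'))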

@nicolargo
Owner

nicolargo commented Mar 8, 2014

@nicolargo nicolargo modified the milestones: Next releases, Version 2.0 May 18, 2014
@nicolargo
Owner

Requested as a new feature in the PSutil lib: giampaolo/psutil#526 => REJECTED (they are also waiting for a standard way to grab the stats)

@leinardi

leinardi commented Nov 9, 2014

Hi, any news about this feature?

@nicolargo
Owner

PSutil will not implement this feature because there is no standard way to do it across Nvidia, AMD and Intel GPUs. I will take a second look at existing Python libs to implement it on my side.

If you have any additional information or tips on how to do this, I am your man.

@leinardi

leinardi commented Nov 9, 2014

I see; unfortunately, I do not have any.

PS: Thank you for your awesome tool :-)

@nicolargo
Owner

The best way to proceed is to write a dedicated Python lib with support for:

  • Intel
  • Nvidia
  • ATI

Glances will use this API to retrieve GPU stats.

@nicolargo nicolargo changed the title from "Demande de fct (si possible) : GPU" (feature request, if possible: GPU) to "GPU monitoring" Jan 30, 2015
@Bedotech

Any news? (thanks for the awesome tool!)

@kdbanman
Contributor

kdbanman commented Sep 7, 2016

This feature would make Glances better for sure. It's already amazing, so this would be icing on an already iced cake 🍰

@nicolargo
Owner

It could be possible to do a first version for Nvidia GPUs because a Python lib already exists (https://pypi.python.org/pypi/nvidia-ml-py/). For AMD and Intel, I am looking for an existing Python lib.
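
A quick smoke test with that lib would be something like the following (a sketch only; I cannot run it myself since I do not have any Nvidia card):

# python
>>> import pynvml
>>> pynvml.nvmlInit()
>>> pynvml.nvmlSystemGetDriverVersion()
>>> pynvml.nvmlDeviceGetCount()
>>> pynvml.nvmlShutdown()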

@kdbanman
Contributor

kdbanman commented Sep 7, 2016

@nicolargo Keen observation - I was just doing that! I actually have a half-implemented version that uses that library. The stats collection works, but I just can't quite find all the places to tell Glances about it.

There's the GlancesPlugin to override, the webapp angular plugin and view, the REST API, and some others. I could just grep -nr "cpu" and make my gpu plugin fill all the same holes, but do you have a document that lays out the plugin addition process?
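
For reference, here is roughly the skeleton I am filling in. It is modeled on my reading of the existing plugins (e.g. glances_mem.py), so the method and attribute names below are my interpretation of that code, not an official template:

# glances_gpu.py -- rough sketch of a curses-only GPU plugin
import pynvml

from glances.plugins.glances_plugin import GlancesPlugin


class Plugin(GlancesPlugin):
    """Glances GPU plugin (Nvidia only, via nvidia-ml-py)."""

    def __init__(self, args=None):
        super(Plugin, self).__init__(args=args)
        self.display_curse = True
        pynvml.nvmlInit()
        self.device_handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
                               for i in range(pynvml.nvmlDeviceGetCount())]

    def update(self):
        """Collect one stats dict per GPU."""
        self.stats = []
        for i, handle in enumerate(self.device_handles):
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            self.stats.append({'gpu_id': i,
                               'proc': util.gpu,
                               'mem': util.memory})
        return self.stats

    def msg_curse(self, args=None):
        """Build the list of lines to display in the curses UI."""
        ret = []
        for gpu in self.stats:
            msg = 'GPU{} proc {:>3}% mem {:>3}%'.format(
                gpu['gpu_id'], gpu['proc'], gpu['mem'])
            ret.append(self.curse_add_line(msg))
            ret.append(self.curse_new_line())
        return ret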

@nicolargo
Owner

@kdbanman Happy to hear that you have started to work on this enhancement.

For the moment there is no official documentation / checklist about plugin addition. You can have a look at this pull request, which is a good example: #911

Or, if you want, you can write a first version of the plugin (only the plugin script) and I will create a branch with all the stuff around it to test it on your system (I do not have any Nvidia card...).

What do you think?

@kdbanman
Contributor

kdbanman commented Sep 8, 2016

@nicolargo That PR is exactly what I needed. I can work from there. 3 questions:

  1. Do you want me to branch off develop like that PR did?
  2. If I have questions about that PR specifically, can I ask on that PR's conversation thread?
  3. I can test on an EC2 GPU instance running a RHEL variant, and on a bare metal Ubuntu 14 box. I can't do OS X and Windows. Will that be sufficient?

@drschwabe

When the plugin is ready for testing I would be happy to try it on my GTX970m running Ubuntu 16.

@nicolargo
Owner

@kdbanman My answers:

  1. Yes, all the dev work (enhancements and bug fixes) should be done on a branch off develop. We are currently in RC for version 2.7 of Glances. The current issue will be implemented in the next release (2.8).
  2. If you have any questions regarding that PR, please use the current thread.
  3. Yes! And @drschwabe is welcome to test it on his GTX970m config.

I have another question: where do you think the plugin should be displayed in the UI?

Looking forward to seeing the first implementation!

@kdbanman
Contributor

kdbanman commented Sep 9, 2016

Where do you think that the plugin should be displayed in the UI ?

Good question. The accessible stats people might care about, in rough order of priority:

Top priority:

  • GPU processor utilization % (not sure if there is a distinction between user and system here)
  • VRAM utilization %
  • VRAM total
  • VRAM used
  • VRAM free

Mid priority:

  • VRAM bandwidth
  • process list
  • temperature

I'm really not sure what to show where though:

[screenshot: current Glances curses layout (screenshot-wide.png, 848x500)]

Thoughts?

@nicolargo
Owner

nicolargo commented Sep 9, 2016

@kdbanman : first of all we have to identify the mandatory stats for the GPU (do not forget that Glances aims at providing an "as quick as possible" view of the whole system). From my point of view we need to focus on:

  • GPU identification
  • GPU processor utilization in %
  • Memory utilization in %

Perhaps something like this (width limited to 14 chars and height to 4 lines):

GPU Nvidia 8GB
Processor: 40%
Memory:    75%
IO:       xxxx

The plugin can sit between the existing CPU and MEM plugins, taking priority over the extended MEM stats and extended CPU stats (if space is not available).

For the temperature, it should be added to the existing sensors. I can do it on my side after looking at your code.

Another question: how to manage multiple GPUs? My idea:

2 Nvidia 8GB
Processor: 40%
Memory:    75%
IO:       xxxx

What do you think?

@nicolargo nicolargo modified the milestones: Next releases, Version 2.8 Sep 11, 2016
@kdbanman
Contributor

kdbanman commented Sep 12, 2016

The top priority stats you suggest are perfect. A couple of questions about them:

GPU Nvidia 2GB   <-- Question #1
Processor: 40%
Memory:    75%
IO:   1.1 GB/s   <-- Question #2

  1. Should we identify the GPU by its size (e.g. 2GB), device id, UUID, or model (e.g. GTX 560 Ti)? I prefer identifying by size, because id and UUID are not informative or human readable, and model could be a very long string.
  2. Memory bandwidth is an important stat, but it is not currently supported in virtualized GPU environments like EC2. In those cases, should we not show that line at all, or just show N/A for the value?

For multiple GPUs, we could follow the standard for multiple CPUs, where the % utilization is combined. People are used to that for CPU, but it might be confusing for memory. For example, say we have a 2x GPU setup where both are at 90% processor usage and 75% memory usage:

2x GPU Nvidia 2GB
Processor: 180%   <-- Combined, just like the top command
Memory:    150%   <-- Combined. Confusing?
IO:   2.9 GB/s    <-- Also combined, if available.

  3. What do you think about combining stats like that? The alternative would be to show the average on each line, or to have a 4-line GPU section for each card. A tiny sketch of the combined vs. averaged options follows below.
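
To make the two options concrete, a minimal sketch in plain Python (the per-GPU field names are just an assumption, not the plugin's actual data model):

# Per-GPU stats as the plugin might collect them
gpus = [{'proc': 90.0, 'mem': 75.0},
        {'proc': 90.0, 'mem': 75.0}]

# Option 1: combined, top-style (can exceed 100% with several cards)
combined_proc = sum(g['proc'] for g in gpus)           # 180.0
combined_mem = sum(g['mem'] for g in gpus)             # 150.0

# Option 2: mean over all cards (always stays within 0-100%)
mean_proc = sum(g['proc'] for g in gpus) / len(gpus)   # 90.0
mean_mem = sum(g['mem'] for g in gpus) / len(gpus)     # 75.0

print('combined: {:.0f}% / {:.0f}%'.format(combined_proc, combined_mem))
print('mean:     {:.0f}% / {:.0f}%'.format(mean_proc, mean_mem))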

@denru01

denru01 commented Dec 25, 2016

>>> import pynvml
>>> pynvml.nvmlInit()
>>> d = self.get_device_handles()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'self' is not defined

@nicolargo
Owner

@denru01 Sorry... here it is:

# python
>>> import pynvml
>>> pynvml.nvmlInit()
>>> d = get_device_handles()
>>> print(pynvml.nvmlDeviceGetUtilizationRates(d[0]).memory)
>>> mi = pynvml.nvmlDeviceGetMemoryInfo(d[0])
>>> print(mi.used)
>>> print(mi.total)
>>> exit()

@denru01

denru01 commented Dec 25, 2016

Still getting the same error. I think get_device_handles() is not a function provided by pynvml; it is implemented in the Glances plugin. I guessed what you wanted to get, modified the code accordingly, and the result is below:

>>> import pynvml
>>> pynvml.nvmlInit()
>>> device_handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(0, pynvml.nvmlDeviceGetCount())]
>>> print(pynvml.nvmlDeviceGetUtilizationRates(device_handles[0]).memory)
0
>>> for index, device_handle in enumerate(device_handles):
...     print(pynvml.nvmlDeviceGetUtilizationRates(device_handle).memory)
...     mi = pynvml.nvmlDeviceGetMemoryInfo(device_handle)
...     print(mi.used)
...     print(mi.total)
... 
0
2864709632
12781551616
2
4967104512
12781551616
0
2097152
8507555840
0
546177024
11995578368
0
546177024
11995578368
0
4967104512
12781551616
0
2097152
12781551616
0
2097152
12781551616
0
2097152
12781551616

@nicolargo
Owner

nicolargo commented Dec 26, 2016

Ok, thanks @denru01. Looks like nvmlDeviceGetUtilizationRates did not work as expected on your system...

The last HEAD version of the DEVELOP branch has been modified to only use the nvmlDeviceGetMemoryInfo method.
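
In other words, the memory percentage is now derived from the memory info alone, roughly like this (a sketch, not the exact plugin code):

>>> mi = pynvml.nvmlDeviceGetMemoryInfo(device_handle)
>>> mem_percent = 100.0 * mi.used / mi.total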

Can you try it ?

PS: if you could send us one or more screenshots of Glances running the GPU plugin, that would be very kind.

@asergi
Collaborator

asergi commented Dec 26, 2016

If you have nvidia-ml-py installed, Glances won't start with Python 3:

Traceback (most recent call last):
  File "/home/alessio/venvs/py35-glances/bin/glances", line 11, in <module>
    load_entry_point('Glances', 'console_scripts', 'glances')()
  File "/home/alessio/venvs/py35-glances/src/glances/glances/__init__.py", line 225, in main
    start_standalone(config=config, args=args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/__init__.py", line 105, in start_standalone
    standalone = GlancesStandalone(config=config, args=args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/standalone.py", line 43, in __init__
    self.stats = GlancesStats(config=config, args=args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/stats.py", line 43, in __init__
    self.load_modules(self.args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/stats.py", line 77, in load_modules
    self.load_plugins(args=args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/stats.py", line 96, in load_plugins
    plugin = __import__(os.path.basename(item)[:-3])
  File "/home/alessio/venvs/py35-glances/src/glances/glances/plugins/glances_gpu.py", line 26, in <module>
    import pynvml
  File "/home/alessio/venvs/py35-glances/lib/python3.5/site-packages/pynvml.py", line 1671
    print c_count.value
                ^
SyntaxError: Missing parentheses in call to 'print'

It's almost 2017 and it's a shame we imported a new plugin that depends on a Python 2-only library. -1.

@denru01

denru01 commented Dec 27, 2016

Hi,
It works great now. However, it would be better to reduce the number of digits displayed for the mem value.
[screenshot]

@nicolargo
Owner

nicolargo commented Dec 27, 2016

@denru01 Perfect. The digit issue is corrected in the HEAD version of the DEVELOP branch.

Can you post another screenshot with this latest version ?

@denru01

denru01 commented Dec 28, 2016

Here you go :P
[screenshot]

@nicolargo
Owner

Just improved the display (GPU name) and added plugin documentation: https://github.com/nicolargo/glances/blob/develop/docs/aoa/gpu.rst

@denru01: One last test, please?

@nicolargo
Owner

@notFloran Can you implement the Web UI ?

Here are some examples:

Mono GPU:

curl http://0.0.0.0:61208/api/2/gpu
[{"mem": null, "gpu_id": 0, "proc": 60, "name": "GeForce GTX 560 Ti", "key": "gpu_id"}]
curl http://0.0.0.0:61208/api/2/gpu/views
{"0": {"mem": {"decoration": "DEFAULT"}, "proc": {"decoration": "CAREFUL"}}}

Multi GPU:

curl http://0.0.0.0:61208/api/2/gpu
[{"mem": 48.64645, "gpu_id": 0, "proc": 60.73, "name": "GeForce GTX 560 Ti", "key": "gpu_id"}, {"mem": 70.743, "gpu_id": 1, "proc": 80.28, "name": "GeForce GTX 560 Ti", "key": "gpu_id"}, {"mem": 0, "gpu_id": 2, "proc": 0, "name": "GeForce GTX 560 Ti", "key": "gpu_id"}]
curl http://0.0.0.0:61208/api/2/gpu/views
{"0": {"mem": {"decoration": "OK"}, "proc": {"decoration": "CAREFUL"}}, "1": {"mem": {"decoration": "WARNING"}, "proc": {"decoration": "WARNING"}}, "2": {"mem": {"decoration": "OK"}, "proc": {"decoration": "OK"}}}

Thanks !

@kdbanman
Contributor

@asergi Thank you for reporting that Python 3 crash. The nvidia-ml-py library is supposed to support Python 3; it was apparently ported long ago:

Version 2.285.0

  • ...
  • Ported to support Python 3.0 and Python 2.0 syntax.
  • ...

And we are now at 4.304.3, so I'm quite confident you found a bug in that library! I'm unsure where to report it though.

@denru01

denru01 commented Dec 30, 2016

@nicolargo Do you mean the HEAD of the develop branch? I did not see any difference in the GPU section, at least no GPU names.
I can try as many times as you want, but since I am traveling I may be slow to reply.
Thanks.

@nicolargo
Owner

@denru01 Yes. The last commit on the HEAD of the DEVELOP branch is 8add69f . Check that you see it using the 'git log' command.

@notFloran
Collaborator

@nicolargo can you give me screenshots of the curses interface for the 2 examples?

@denru01

denru01 commented Jan 2, 2017

@nicolargo yes. The version is correct.

@nicolargo
Owner

Here it is:

[{"key": "gpu_id", "mem": None, "proc": 60, "gpu_id": 0, "name": "GeForce GTX 560 Ti"}]

Same interface with and without the 'meangpu' arg:

[screenshot selection_277]

[{"key": "gpu_id", "mem": 10, "proc": 60, "gpu_id": 0, "name": "GeForce GTX 560 Ti"}]

Same interface with and without the 'meangpu' arg:

[screenshot selection_278]

[{"key": "gpu_id", "mem": 48.64645, "proc": 60.73, "gpu_id": 0, "name": "GeForce GTX 560 Ti"},
{"key": "gpu_id", "mem": 70.743, "proc": 80.28, "gpu_id": 1, "name": "GeForce GTX 560 Ti"},
{"key": "gpu_id", "mem": 0, "proc": 0, "gpu_id": 2, "name": "GeForce GTX 560 Ti"}]

Without meangpu (default behavior):

[screenshot selection_279]

With meangpu:

[screenshot selection_280]

[{"key": "gpu_id", "mem": 48.64645, "proc": 60.73, "gpu_id": 0, "name": "GeForce GTX 560 Ti"},
{"key": "gpu_id", "mem": None, "proc": 80.28, "gpu_id": 1, "name": "GeForce GTX 560 Ti"},
{"key": "gpu_id", "mem": 0, "proc": 0, "gpu_id": 2, "name": "ANOTHER GPU"}]

Without meangpu (default behavior):

[screenshot selection_281]

With meangpu:

[screenshot selection_282]

Thanks @notFloran !

@soichih

soichih commented Aug 8, 2019

How can I show the GPU temperature in Glances? Right now I have to run nvidia-smi separately to see the GPU temp.
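
For example, what I currently run on the side (flags from my local nvidia-smi, so other versions may differ):

nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader

If it helps, pynvml also seems to expose the same value via nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU), in case the sensors plugin could pick it up.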

@mkanet

mkanet commented Jun 9, 2020

I had to install this:
pip install glances[gpu]

@d-damien

So this Nvidia thing is proprietary, and that is why end-user packages like gpustat are in Debian Multiverse, correct? Is there no way to talk to Nouveau, the OSS driver?

@nicolargo
Owner

@d-damien I do not think it is related, because the nvidia-ml-py lib used by Glances (and also GPUstat) to grab GPU stats is open source (BSD license).
