
GPU monitoring (NVidia) #170

Closed · alexis26 opened this issue Jan 28, 2013 · 95 comments

@alexis26

Hello,

Thanks for this software, which I use and appreciate (almost since the beginning, as I was already following your blog via RSS before that...).

I was wondering whether you think it will (would?) be possible to add GPU information to it.
Since more and more graphics cards have a dedicated processor, I thought it would be interesting to have the same kind of information Glances already provides... and possibly even to see the offloading from one to the other...

Thanks
Alexis

@nicolargo
Owner

Interesting...

I will check whether it is easy to obtain the corresponding system information.

@jrenner
Contributor

jrenner commented Mar 21, 2013

If you are using the ATI proprietary driver on Linux, this command can be used:
aticonfig --adapter=0 --od-getclocks | grep -i GPU
I use the open source driver myself, so I am not sure about that.

@nicolargo
Owner

Postponed to the next major release (v2.0).
Waiting for a "standard" way to access the information...

@herr-biber
Contributor

For Nvidia, one can use

nvidia-smi -q -x

to obtain an XML tree, which can be parsed for the memory usage tag. See

nvidia-smi -q -x | grep '<memory_usage' -A4

There is also an nvidia-ml-py Python package on PyPI.
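
A minimal parsing sketch in Python, assuming the usual nvidia-smi XML layout (tag names such as product_name, memory_usage and fb_memory_usage vary between driver versions, so treat this as illustrative rather than definitive):

import subprocess
import xml.etree.ElementTree as ET

# Query all GPU stats as XML
xml_out = subprocess.check_output(['nvidia-smi', '-q', '-x'])
root = ET.fromstring(xml_out)

for gpu in root.findall('gpu'):
    name = gpu.findtext('product_name')
    # Older drivers expose <memory_usage>, newer ones <fb_memory_usage>
    mem = gpu.find('fb_memory_usage')
    if mem is None:
        mem = gpu.find('memory_usage')
    if mem is not None:
        print(name, mem.findtext('used'), '/', mem.findtext('total'))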

@nicolargo
Owner

nicolargo commented Mar 8, 2014

@nicolargo nicolargo modified the milestones: Next releases, Version 2.0 May 18, 2014
@nicolargo
Owner

Requested as a new feature in the PSutil lib: giampaolo/psutil#526 => REJECTED (they are also waiting for a standard way to grab the stats)

@leinardi

leinardi commented Nov 9, 2014

Hi, any news about this feature?

@nicolargo
Owner

PSutil will not implement this feature because there is no standard way to do it across Nvidia, AMD and Intel GPUs. I will take a second look at existing Python libs to implement it on my side.

If you have any additional information or tips on how to do this, I am your man.

@leinardi

leinardi commented Nov 9, 2014

I see; unfortunately, I do not have any.

PS: Thank you for your awesome tool :-)

@nicolargo
Owner

The best way to proceed is to write a dedicated Python lib with support for:

  • Intel
  • Nvidia
  • ATI

Glances will use this API to retrieve GPU stats.

@nicolargo nicolargo changed the title from "Demande de fct (si possible) : GPU" (feature request, if possible: GPU) to "GPU monitoring" Jan 30, 2015
@Bedotech

Any news? (thanks for the awesome tool!)

@kdbanman
Contributor

kdbanman commented Sep 7, 2016

This feature would make Glances better for sure. It's already amazing, so this would be icing on an already iced cake 🍰

@nicolargo
Owner

It could be possible to do a first version for Nvidia GPUs because a Python lib already exists (https://pypi.python.org/pypi/nvidia-ml-py/). For AMD and Intel, I am looking for an existing Python lib.
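
A quick smoke test with that lib would be something like the following (a sketch only; I cannot run it myself since I do not have any Nvidia card):

# python
>>> import pynvml
>>> pynvml.nvmlInit()
>>> pynvml.nvmlSystemGetDriverVersion()
>>> pynvml.nvmlDeviceGetCount()
>>> pynvml.nvmlShutdown()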

@kdbanman
Contributor

kdbanman commented Sep 7, 2016

@nicolargo Keen observation - I was just doing that! I actually have a half-implemented version that uses that library. The stats collection works, but I just can't quite find all the places to tell Glances about it.

There's the GlancesPlugin to override, the webapp angular plugin and view, the REST API, and some others. I could just grep -nr "cpu" and make my gpu plugin fill all the same holes, but do you have a document that lays out the plugin addition process?
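
For reference, here is roughly the skeleton I am filling in. It is modeled on my reading of the existing plugins (e.g. glances_mem.py), so the method and attribute names below are my interpretation of that code, not an official template:

# glances_gpu.py -- rough sketch of a curses-only GPU plugin
import pynvml

from glances.plugins.glances_plugin import GlancesPlugin


class Plugin(GlancesPlugin):
    """Glances GPU plugin (Nvidia only, via nvidia-ml-py)."""

    def __init__(self, args=None):
        super(Plugin, self).__init__(args=args)
        self.display_curse = True
        pynvml.nvmlInit()
        self.device_handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
                               for i in range(pynvml.nvmlDeviceGetCount())]

    def update(self):
        """Collect one stats dict per GPU."""
        self.stats = []
        for i, handle in enumerate(self.device_handles):
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            self.stats.append({'gpu_id': i,
                               'proc': util.gpu,
                               'mem': util.memory})
        return self.stats

    def msg_curse(self, args=None):
        """Build the list of lines to display in the curses UI."""
        ret = []
        for gpu in self.stats:
            msg = 'GPU{} proc {:>3}% mem {:>3}%'.format(
                gpu['gpu_id'], gpu['proc'], gpu['mem'])
            ret.append(self.curse_add_line(msg))
            ret.append(self.curse_new_line())
        return ret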

@nicolargo
Owner

@kdbanman Happy to hear that you have started to work on this enhancement.

For the moment there is no official documentation / checklist about plugin addition. You can have a look at this pull request, which is a good example: #911

Or, if you want, you can write a first version of the plugin (only the plugin script) and I will create a branch with all the stuff around it to test it on your system (I do not have any Nvidia card...).

What do you think?

@kdbanman
Contributor

kdbanman commented Sep 8, 2016

@nicolargo That PR is exactly what I needed. I can work from there. 3 questions:

  1. Do you want me to branch off develop like that PR did?
  2. If I have questions about that PR specifically, can I ask on that PR's conversation thread?
  3. I can test on an EC2 GPU instance running a RHEL variant, and on a bare metal Ubuntu 14 box. I can't do OS X and Windows. Will that be sufficient?

@drschwabe

When the plugin is ready for testing I would be happy to try it on my GTX970m running Ubuntu 16.

@nicolargo
Owner

@kdbanman My answers:

  1. Yes, all the dev work (enhancements and bug fixes) should be done on a branch off develop. We are currently in RC for version 2.7 of Glances. The current issue will be implemented in the next release (2.8).
  2. If you have any questions regarding that PR, please use the current thread.
  3. Yes! And @drschwabe is welcome to test it on his GTX970m config.

I have another question: where do you think the plugin should be displayed in the UI?

Looking forward to seeing the first implementation!

@kdbanman
Contributor

kdbanman commented Sep 9, 2016

Where do you think that the plugin should be displayed in the UI ?

Good question. The accessible stats people might care about, in rough order of priority:

Top priority:

  • GPU processor utilization % (not sure if there is a distinction between user and system here)
  • VRAM utilization %
  • VRAM total
  • VRAM used
  • VRAM free

Mid priority:

  • VRAM bandwidth
  • process list
  • temperature

I'm really not sure what to show where though:

[screenshot: current Glances curses layout (screenshot-wide.png, 848x500)]

Thoughts?

@nicolargo
Owner

nicolargo commented Sep 9, 2016

@kdbanman : first of all we have to identify the mandatory stats for the GPU (do not forget that Glances aims at providing an "as quick as possible" view of the whole system). From my point of view we need to focus on:

  • GPU identification
  • GPU processor utilization in %
  • Memory utilization in %

Perhaps something like this (width limited to 14 chars and height to 4 lines):

GPU Nvidia 8GB
Processor: 40%
Memory:    75%
IO:       xxxx

The plugin can sit between the existing CPU and MEM plugins, taking priority over the extended MEM stats and extended CPU stats (if space is not available).

For the temperature, it should be added to the existing sensors. I can do it on my side after looking at your code.

Another question: how to manage multiple GPUs? My idea:

2 Nvidia 8GB
Processor: 40%
Memory:    75%
IO:       xxxx

What do you think?

@nicolargo nicolargo modified the milestones: Next releases, Version 2.8 Sep 11, 2016
@kdbanman
Contributor

kdbanman commented Sep 12, 2016

The top priority stats you suggest are perfect. A couple of questions about them:

GPU Nvidia 2GB   <-- Question #1
Processor: 40%
Memory:    75%
IO:   1.1 GB/s   <-- Question #2

  1. Should we identify the GPU by its size (e.g. 2GB), device id, UUID, or model (e.g. GTX 560 Ti)? I prefer identifying by size, because id and UUID are not informative or human readable, and model could be a very long string.
  2. Memory bandwidth is an important stat, but it is not currently supported in virtualized GPU environments like EC2. In those cases, should we not show that line at all, or just show N/A for the value?

For multiple GPUs, we could follow the standard for multiple CPUs, where the % utilization is combined. People are used to that for CPU, but it might be confusing for memory. For example, say we have a 2x GPU setup where both are at 90% processor usage and 75% memory usage:

2x GPU Nvidia 2GB
Processor: 180%   <-- Combined, just like the top command
Memory:    150%   <-- Combined. Confusing?
IO:   2.9 GB/s    <-- Also combined, if available.

  3. What do you think about combining stats like that? The alternative would be to show the average on each line, or to have a 4-line GPU section for each card. A tiny sketch of the combined vs. averaged options follows below.
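
To make the two options concrete, a minimal sketch in plain Python (the per-GPU field names are just an assumption, not the plugin's actual data model):

# Per-GPU stats as the plugin might collect them
gpus = [{'proc': 90.0, 'mem': 75.0},
        {'proc': 90.0, 'mem': 75.0}]

# Option 1: combined, top-style (can exceed 100% with several cards)
combined_proc = sum(g['proc'] for g in gpus)           # 180.0
combined_mem = sum(g['mem'] for g in gpus)             # 150.0

# Option 2: mean over all cards (always stays within 0-100%)
mean_proc = sum(g['proc'] for g in gpus) / len(gpus)   # 90.0
mean_mem = sum(g['mem'] for g in gpus) / len(gpus)     # 75.0

print('combined: {:.0f}% / {:.0f}%'.format(combined_proc, combined_mem))
print('mean:     {:.0f}% / {:.0f}%'.format(mean_proc, mean_mem))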

@denru01

denru01 commented Dec 25, 2016

>>> import pynvml
>>> pynvml.nvmlInit()
>>> d = self.get_device_handles()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'self' is not defined

@nicolargo
Owner

@denru01 Sorry... here it is:

# python
>>> import pynvml
>>> pynvml.nvmlInit()
>>> d = get_device_handles()
>>> print(pynvml.nvmlDeviceGetUtilizationRates(d[0]).memory)
>>> mi = pynvml.nvmlDeviceGetMemoryInfo(d[0])
>>> print(mi.used)
>>> print(mi.total)
>>> exit()

@denru01

denru01 commented Dec 25, 2016

Still getting the same error. I think get_device_handles() is not a function provided by pynvml; it is implemented in the Glances plugin. I guessed what you wanted to get, modified the code accordingly, and the result is below:

>>> import pynvml
>>> pynvml.nvmlInit()
>>> device_handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(0, pynvml.nvmlDeviceGetCount())]
>>> print(pynvml.nvmlDeviceGetUtilizationRates(device_handles[0]).memory)
0
>>> for index, device_handle in enumerate(device_handles):
...     print(pynvml.nvmlDeviceGetUtilizationRates(device_handle).memory)
...     mi = pynvml.nvmlDeviceGetMemoryInfo(device_handle)
...     print(mi.used)
...     print(mi.total)
... 
0
2864709632
12781551616
2
4967104512
12781551616
0
2097152
8507555840
0
546177024
11995578368
0
546177024
11995578368
0
4967104512
12781551616
0
2097152
12781551616
0
2097152
12781551616
0
2097152
12781551616

@nicolargo
Owner

nicolargo commented Dec 26, 2016

Ok, thanks @denru01. Looks like nvmlDeviceGetUtilizationRates did not work as expected on your system...

The last HEAD version of the DEVELOP branch has been modified to only use the nvmlDeviceGetMemoryInfo method.
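
In other words, the memory percentage is now derived from the memory info alone, roughly like this (a sketch, not the exact plugin code):

>>> mi = pynvml.nvmlDeviceGetMemoryInfo(device_handle)
>>> mem_percent = 100.0 * mi.used / mi.total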

Can you try it ?

PS: if you could send us one or more screenshots of Glances running the GPU plugin, that would be very kind.

@asergi
Collaborator

asergi commented Dec 26, 2016

If you have nvidia-ml-py installed, Glances won't start with Python 3:

Traceback (most recent call last):
  File "/home/alessio/venvs/py35-glances/bin/glances", line 11, in <module>
    load_entry_point('Glances', 'console_scripts', 'glances')()
  File "/home/alessio/venvs/py35-glances/src/glances/glances/__init__.py", line 225, in main
    start_standalone(config=config, args=args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/__init__.py", line 105, in start_standalone
    standalone = GlancesStandalone(config=config, args=args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/standalone.py", line 43, in __init__
    self.stats = GlancesStats(config=config, args=args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/stats.py", line 43, in __init__
    self.load_modules(self.args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/stats.py", line 77, in load_modules
    self.load_plugins(args=args)
  File "/home/alessio/venvs/py35-glances/src/glances/glances/stats.py", line 96, in load_plugins
    plugin = __import__(os.path.basename(item)[:-3])
  File "/home/alessio/venvs/py35-glances/src/glances/glances/plugins/glances_gpu.py", line 26, in <module>
    import pynvml
  File "/home/alessio/venvs/py35-glances/lib/python3.5/site-packages/pynvml.py", line 1671
    print c_count.value
                ^
SyntaxError: Missing parentheses in call to 'print'

It's almost 2017 and it's a shame we imported a new plugin that depends on a Python 2-only library. -1.

@denru01

denru01 commented Dec 27, 2016

Hi,
It works great now. However, it would be better to reduce the number of digits displayed for the mem value.
[screenshot]

@nicolargo
Owner

nicolargo commented Dec 27, 2016

@denru01 Perfect. The digit issue is corrected in the HEAD version of the DEVELOP branch.

Can you post another screenshot with this latest version ?

@denru01

denru01 commented Dec 28, 2016

Here you go :P
[screenshot]

@nicolargo
Owner

Just improved the display (GPU name) and added plugin documentation: https://github.com/nicolargo/glances/blob/develop/docs/aoa/gpu.rst

@denru01: One last test, please?

@nicolargo
Owner

@notFloran Can you implement the Web UI ?

Here are some examples:

Mono GPU:

curl http://0.0.0.0:61208/api/2/gpu
[{"mem": null, "gpu_id": 0, "proc": 60, "name": "GeForce GTX 560 Ti", "key": "gpu_id"}]
curl http://0.0.0.0:61208/api/2/gpu/views
{"0": {"mem": {"decoration": "DEFAULT"}, "proc": {"decoration": "CAREFUL"}}}

Multi GPU:

curl http://0.0.0.0:61208/api/2/gpu
[{"mem": 48.64645, "gpu_id": 0, "proc": 60.73, "name": "GeForce GTX 560 Ti", "key": "gpu_id"}, {"mem": 70.743, "gpu_id": 1, "proc": 80.28, "name": "GeForce GTX 560 Ti", "key": "gpu_id"}, {"mem": 0, "gpu_id": 2, "proc": 0, "name": "GeForce GTX 560 Ti", "key": "gpu_id"}]
curl http://0.0.0.0:61208/api/2/gpu/views
{"0": {"mem": {"decoration": "OK"}, "proc": {"decoration": "CAREFUL"}}, "1": {"mem": {"decoration": "WARNING"}, "proc": {"decoration": "WARNING"}}, "2": {"mem": {"decoration": "OK"}, "proc": {"decoration": "OK"}}}

Thanks !

@kdbanman
Contributor

@asergi Thank you for reporting that Python 3 crash. The nvidia-ml-py library is supposed to support Python 3; it was apparently ported long ago:

Version 2.285.0

  • ...
  • Ported to support Python 3.0 and Python 2.0 syntax.
  • ...

And we are now at 4.304.3, so I'm quite confident you found a bug in that library! I'm unsure where to report it though.

@denru01

denru01 commented Dec 30, 2016

@nicolargo Do you mean the HEAD of the develop branch? I did not see any difference in the GPU section, at least no GPU names.
I can try as many times as you want, but since I am traveling I may be slow to reply.
Thanks.

@nicolargo
Owner

@denru01 Yes. The last commit on the HEAD of the DEVELOP branch is 8add69f . Check that you see it using the 'git log' command.

@notFloran
Collaborator

@nicolargo can you give me screenshots of the curses interface for the 2 examples?

@denru01

denru01 commented Jan 2, 2017

@nicolargo yes. The version is correct.

@nicolargo
Owner

Here it is:

[{"key": "gpu_id", "mem": None, "proc": 60, "gpu_id": 0, "name": "GeForce GTX 560 Ti"}]

Same interface with and without the 'meangpu' arg:

[screenshot selection_277]

[{"key": "gpu_id", "mem": 10, "proc": 60, "gpu_id": 0, "name": "GeForce GTX 560 Ti"}]

Same interface with and without the 'meangpu' arg:

[screenshot selection_278]

[{"key": "gpu_id", "mem": 48.64645, "proc": 60.73, "gpu_id": 0, "name": "GeForce GTX 560 Ti"},
{"key": "gpu_id", "mem": 70.743, "proc": 80.28, "gpu_id": 1, "name": "GeForce GTX 560 Ti"},
{"key": "gpu_id", "mem": 0, "proc": 0, "gpu_id": 2, "name": "GeForce GTX 560 Ti"}]

Without meangpu (default behavior):

[screenshot selection_279]

With meangpu:

[screenshot selection_280]

[{"key": "gpu_id", "mem": 48.64645, "proc": 60.73, "gpu_id": 0, "name": "GeForce GTX 560 Ti"},
{"key": "gpu_id", "mem": None, "proc": 80.28, "gpu_id": 1, "name": "GeForce GTX 560 Ti"},
{"key": "gpu_id", "mem": 0, "proc": 0, "gpu_id": 2, "name": "ANOTHER GPU"}]

Without meangpu (default behavior):

[screenshot selection_281]

With meangpu:

[screenshot selection_282]

Thanks @notFloran !

@soichih

soichih commented Aug 8, 2019

How can I show the GPU temperature in Glances? Right now I have to run nvidia-smi separately to see the GPU temp.
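
For example, what I currently run on the side (flags from my local nvidia-smi, so other versions may differ):

nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader

If it helps, pynvml also seems to expose the same value via nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU), in case the sensors plugin could pick it up.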

@mkanet

mkanet commented Jun 9, 2020

I had to install this:
pip install glances[gpu]

@d-damien

So this Nvidia thing is proprietary, and that is why end-user packages like gpustat are in Debian Multiverse, correct? Is there no way to talk to Nouveau, the OSS driver?

@nicolargo
Owner

@d-damien I do not think it is related, because the nvidia-ml-py lib used by Glances (and also GPUstat) to grab GPU stats is open source (BSD license).
