-
Notifications
You must be signed in to change notification settings - Fork 681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add plugin for monitoring GPU usage per user #947
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for such a nice awk plugin :)
Your plugin differs from the nvidia_gpu_
plugin at least by structure (using multigraph - yeah!) and probably also by content. Could you please describe the differences in the documentation header (e.g. USAGE
)?
For bonus karma: maybe you want to upload example graphs below example-graphs/PLUGIN_NAME-day.png
?
plugins/gpu/nvidia_gpu_by_user
Outdated
fi | ||
fi | ||
|
||
gpuUSERS=$(clean_fieldname "$(ls /home)" | tr "\n" " ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels slightly dirty to me, since it uses the non-documented behaviour of clean_fieldname
with regard to multi-line input. Please use something like the following instead:
find /home -maxdepth 1 -mindepth 1 -type d | while read -r dname; do clean_fieldname "$(basename "$dname")"; done
Btw.: is the underlying assumption (relevant users having a home directory below /home/
that matches their name) a good choice?
How about the following - in order to get at least around the directory/username match requirement?
getent passwd | cut -d : -f 1,6 | grep ":/home/" | cut -d : -f 1
(I cannot judge its portability - maybe you can comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some environments "getent passwd" will not return users in LDAP (or even YP/NIS):
$ getent passwd | grep niclangf
$ getent passwd niclangf
niclangf:*:7181:10001:niclangf:/home/niclangf:/bin/bash
and in some environments "getent passwd" will return many many thousands of users.
What you find in /home is also very dodgy. Home directories can be mounted many other places in large environments.
So getting a "full catalog" of users is not very reliable or scalable. But if you have either a username or a UID this works
$ getent passwd niclangf
niclangf:*:7181:10001:niclangf:/home/niclangf:/bin/bash
$ getent passwd 7181
niclangf:*:7181:10001:niclangf:/home/niclangf:/bin/bash
If you want to provide a stable list of users to keep the graphs stable then persist the usernames... but I see I never got around to writing persistence functions for plugin.sh... There are good hints in Munin/Plugin.pm about what environment variables and such to use if you feel wizardly and want to write a file to persist something at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what could be a good and portable approach for requesting "all non-system users" (I think, this is the goal)?
I have the feeling, that we cannot provide a really good and platform neutral implementation with the little effort that is justified for this task in this context. Thus I would be happy with anything that does not feel too dirty (maybe even the original proposal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't. Just graph for those that have been using the GPU and somehow persist it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for good comments.
If there isn't suitable implementation for listing all non-system users,
then I think it is still good for users to be configurable.
The cpubyuser plugin has similar implementation.
plugins/gpu/nvidia_gpu_by_user
Outdated
} | ||
return j; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove trailing whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise excellent plugin!
plugins/gpu/nvidia_gpu_by_user
Outdated
END { | ||
if (n < 0) { | ||
|
||
print "No NVIDIA GPUs detected. Exiting." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please print this to stderr and exit with an error.
plugins/gpu/nvidia_gpu_by_user
Outdated
|
||
=head1 NAME | ||
|
||
gpubyuser - Plugin to monitor GPU memory usage by user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention, this is the full name of the plugin file.
I've modified some parts based on your feedback. |
It looks good to me, now! Just one question:
Currently
|
Thank you for nice suggestions!
It was not correct. How about this comment below? If env.gpuusers is set, graph always shows listed users
For now, no suitable/portable method is found to list all non-system users automatically, |
Are you sure, that the plugin currently really does this?
That sounds reasonable. |
Maybe you want to comment on my last question? Thank you! |
@htakano: ping? |
Stale pull request message |
Ping? |
It's similar to nvidia_gpu_, but shows each users' GPU usage graph.