
Support for realtime statistics #347

Closed
conradtchan opened this issue Sep 24, 2024 · 8 comments · Fixed by #355

@conradtchan

We're interested in using PySlurm to gather real-time statistics on jobs, using the TRESUsageInTot and TRESUsageInAve fields. These are available through the current database API (pyslurm.db.JobStatistics), but only after the job has finished.

Is it currently possible to get these in real time, or are there plans to support it? That is, the equivalent of calling sstat rather than sacct.

Thanks!

@tazend
Member

tazend commented Oct 1, 2024

Hi @conradtchan,

That is something I've been wanting to implement for quite some time. Fetching the statistics is fairly easy, but the data type you get back is opaque. That in itself is not dramatic: Slurm has functions that parse this data type into a known type (the same one used under the hood for pyslurm.db.JobStatistics). However, these functions are not exported in Slurm's public API (libslurm.so), which pyslurm currently links against. They are only available in libslurmfull.so, which is intended more as a library for Slurm's internal use.

Of course, we could just link against libslurmfull and everything would work, but I'm not sure it's the best move. In practice, even though libslurmfull is intended as an internal library, using it directly doesn't really do any harm.

Anyway, I had another look at the Slurm code, and it might also work without their parser functions; it's just a bit more work to get to the data we care about. It would also be possible to wrap the sstat command, but I don't really like that approach.
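For comparison, wrapping sstat (the approach the maintainer would rather avoid) could look roughly like the minimal sketch below. It assumes `sstat` is installed and on `PATH`, and uses its `-j`, `--allsteps`, `-P` (parsable2, `|`-delimited), `--noheader`, and `--format` options. The helper names `parse_tres`, `parse_sstat_output`, and `sstat_tres_usage` are hypothetical and are not part of PySlurm.

```python
from __future__ import annotations

import subprocess


def parse_tres(tres: str) -> dict[str, str]:
    """Split a TRES string such as 'cpu=00:00:10,mem=1340K' into a dict.

    Values are kept as raw strings; units (K, M, ...) and time formats
    would still need converting in a real wrapper.
    """
    result: dict[str, str] = {}
    if not tres:
        return result
    for item in tres.split(","):
        # Partition on the first '=' so keys like 'fs/disk' stay intact.
        key, _, value = item.partition("=")
        result[key] = value
    return result


def parse_sstat_output(text: str) -> dict[str, dict[str, str]]:
    """Map step IDs to parsed TRES usage from 'sstat -P --noheader' output."""
    steps: dict[str, dict[str, str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        step_id, _, tres = line.partition("|")
        steps[step_id] = parse_tres(tres)
    return steps


def sstat_tres_usage(job_id: int) -> dict[str, dict[str, str]]:
    """Query live usage for all steps of a running job (assumes sstat on PATH)."""
    out = subprocess.run(
        ["sstat", "-j", str(job_id), "--allsteps", "-P", "--noheader",
         "--format=JobID,TRESUsageInTot"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sstat_output(out)
```

Parsing is kept separate from the subprocess call so the text handling can be exercised without a Slurm installation; the cost of this design is that it depends on sstat's text output staying stable, which is one reason a native API is preferable.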

I cannot promise anything, but when I have time again I will give it a shot and try to implement something.

@conradtchan
Author

Thanks for the explanation! I understand the hesitation around linking with libslurmfull.

Appreciate you taking a look and thinking through the options.

@tazend
Member

tazend commented Oct 14, 2024

Hi @conradtchan

I have now implemented this; a working version is in this branch for Slurm 24.05.
It was only possible to do this with libslurmfull, because Slurm really does not export any of the relevant functions to libslurm. But that is fine: everything works as before, and it can even make some other things easier to implement.

I just need to make a few more adjustments and add tests and documentation before it gets merged.

@conradtchan
Author

Thank you for implementing this!

I tested it by cherry-picking that branch onto 23.11.x, which is what we're currently running. It took me a little while to figure out that I had to call .load_steps() and .load_stats(), but it works 😀

@tazend
Member

tazend commented Oct 17, 2024

Glad it works :)
Ah yes, good hint. I actually forgot to call .load_steps() implicitly inside .load_stats(). I'll change that so you only need to call .load_stats() rather than both.
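The change described here follows a common lazy-loading pattern: `load_stats()` makes sure the steps it depends on are loaded first, so callers need only one call. The toy class below is a hypothetical illustration of that pattern only; `ToyJob`, `_fetch_steps`, `_fetch_step_stats`, and the placeholder data are invented stand-ins, not PySlurm's actual implementation or API.

```python
from __future__ import annotations


class ToyJob:
    """Hypothetical stand-in showing load_stats() implicitly calling load_steps()."""

    def __init__(self, job_id: int) -> None:
        self.job_id = job_id
        self.steps: dict[str, dict] | None = None  # None means "not loaded yet"
        self.stats: dict[str, dict] | None = None
        self.step_fetches = 0  # counts how often steps were actually fetched

    def _fetch_steps(self) -> dict[str, dict]:
        # Placeholder for querying the controller for the job's steps.
        self.step_fetches += 1
        return {f"{self.job_id}.batch": {}, f"{self.job_id}.0": {}}

    def _fetch_step_stats(self, step_id: str) -> dict:
        # Placeholder for querying live usage of one step (sstat-like).
        return {"step": step_id, "cpu_seconds": 0}

    def load_steps(self) -> None:
        self.steps = self._fetch_steps()

    def load_stats(self) -> None:
        # Load steps on demand so callers do not have to call load_steps() first.
        if self.steps is None:
            self.load_steps()
        self.stats = {sid: self._fetch_step_stats(sid) for sid in self.steps}
```

Calling `ToyJob(42).load_stats()` fetches the steps once and then fills `stats` for both `42.batch` and `42.0`; a second `load_stats()` reuses the already-loaded steps rather than fetching them again.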

@tazend tazend self-assigned this Dec 12, 2024
@tazend
Member

tazend commented Dec 12, 2024

I have reworked a few things and will hopefully merge the changes into main soon.

@tazend
Member

tazend commented Dec 17, 2024

This has now been merged into main, with a few substantial changes: jobs and steps now have separate statistics classes, pyslurm.db.JobStatistics and pyslurm.db.JobStepStatistics, respectively.

I will update the documentation shortly.

@conradtchan
Author

Thank you for all of your help with this!
