Provide a way to access the GRES_IDX attribute for a job #116

Closed
chrissamuel opened this issue Jul 3, 2018 · 9 comments
Comments

@chrissamuel

Details

  • Slurm Version: 17.11.7
  • Python Version: 3.4 (RHEL7)
  • Cython Version: ??
  • PySlurm Branch: 17.11.0
  • Linux Distribution: RHEL7

Issue

It would be really useful for our job monitoring program to be able to get the GRES_IDX information for running jobs. That would let us highlight which GPU a job is using, so our users can see how much utilisation it is getting.

Our issue for this is: conradtchan/jobmon#1

This looks like GRES_IDX=gpu(IDX:0) or GRES_IDX=gpu(IDX:1) or GRES_IDX=gpu(IDX:0-1) for our systems with dual GPUs. I'm not sure what it would look like for a system with say 4 GPUs if the allocation is not contiguous.
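
For illustration, here is a minimal sketch of how a monitoring tool might turn strings like these into lists of GPU indexes. The function names are hypothetical and not part of PySlurm or Slurm. The comma-separated form `0,2-3` for a non-contiguous allocation is an assumption that this thread does not confirm.

```python
import re

# Matches scontrol-style fragments such as "gpu(IDX:0-1)".
_GRES_IDX_RE = re.compile(r"(?P<name>[\w:.-]+)\(IDX:(?P<idx>[\d,\-]*)\)")


def expand_indexes(spec):
    """Expand '0', '0-1' or '0,2-3' into a sorted list of ints.

    The comma-separated form is an assumption about how a
    non-contiguous allocation would be printed.
    """
    indexes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            indexes.update(range(int(start), int(end) + 1))
        else:
            indexes.add(int(part))
    return sorted(indexes)


def parse_gres_idx(text):
    """Return {gres_name: [indexes]} for a 'GRES_IDX=...' string."""
    value = text.split("=", 1)[1] if text.startswith("GRES_IDX=") else text
    return {
        m.group("name"): expand_indexes(m.group("idx"))
        for m in _GRES_IDX_RE.finditer(value)
    }


print(parse_gres_idx("GRES_IDX=gpu(IDX:0-1)"))  # {'gpu': [0, 1]}
```

Returning a dict keyed by GRES name keeps the sketch usable if a job holds more than one kind of generic resource.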

Thanks for considering this!

All the best,
Chris

@giovtorres
Member

Thanks. I was hoping this one would be easy. Here is the code that needs to get wrapped: https://github.com/SchedMD/slurm/blob/5073024350eb79c8c5a9964e800bc0ce3ab93d59/src/api/job_info.c#L706-L803.

That may take me some time.

How would you like the output of this attribute: as one string that matches the scontrol output, or as a list of strings if there is more than one?

@chrissamuel
Author

Sorry for not spotting this before, I've just passed your query on to the developer!

@chrissamuel
Author

chrissamuel commented Aug 6, 2018

Got a reply from the developer today, he said:

Sorry for the late reply... a list of strings would be great!

Hope that helps!
Chris

@giovtorres
Member

Hi @chrissamuel. I know it's been a while, but I think I got the code wrapped to get GRES_IDX in one of the 18.08 branches I'm working on. I should be able to backport to an older branch. Are you still on Slurm 17.11.7?

@chrissamuel
Author

Hi @giovtorres!

Swinburne is on 18.08.x now, but I've since moved to the US for love and for work and am now at NERSC (still doing HPC). But I'll still see updates here and let them know.

Thanks for this!
Chris

@giovtorres
Member

The gres_idx branch should work with a later version of Cython. I'll merge it into master soon after I figure out why it is failing on older Cython versions.

@chrissamuel
Author

Thanks! Passed that back to them.

@tazend
Member

tazend commented Jan 3, 2025

In the new API, starting from pyslurm 21.8.x, this has been implemented:

https://pyslurm.github.io/24.11/reference/job/#pyslurm.Job.get_resource_layout_per_node

It will look something like this:

{
    'node015': {
        'cpu_ids': '0',
        'gres': {
            'gpu:tesla-k80': {
                'count': 1,
                'indexes': '0'
            }
        },
        'memory': 4096
    }
}

The keys of the dict are the node names, and each value is another dict containing the cpu_ids, gres and memory in use on that node.
As stated in the docs, this return type might change in the future, but it should work for now.
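
As a rough sketch of how a monitoring tool might consume this structure, the snippet below pulls out the GPU index strings per node. It works on a plain dict shaped like the example above (the sample data is copied from it), so it does not call pyslurm itself. The helper name `gpu_indexes_per_node` is hypothetical, and the sketch assumes GPU entries always have a key of `gpu` or one starting with `gpu:`.

```python
def gpu_indexes_per_node(layout):
    """Map node name -> {gres name: index string} for GPU entries only."""
    result = {}
    for node, resources in layout.items():
        # Nodes without any GRES in use may lack the key; treat as empty.
        gres = resources.get("gres", {})
        gpus = {
            name: info["indexes"]
            for name, info in gres.items()
            if name.split(":", 1)[0] == "gpu"
        }
        if gpus:
            result[node] = gpus
    return result


# Sample data in the shape shown above; in real use this would be the
# value returned by Job.get_resource_layout_per_node().
sample = {
    "node015": {
        "cpu_ids": "0",
        "gres": {"gpu:tesla-k80": {"count": 1, "indexes": "0"}},
        "memory": 4096,
    }
}

print(gpu_indexes_per_node(sample))  # {'node015': {'gpu:tesla-k80': '0'}}
```

The index strings are left unparsed here because the return type may change, as noted above.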

tazend closed this as completed Jan 3, 2025

@chrissamuel
Author

Hi @tazend - thanks so much! I've passed this on to the folks who were after that.
