
job mem returns as "None" #323

Open
robgics opened this issue Sep 18, 2023 · 3 comments


@robgics
Contributor

robgics commented Sep 18, 2023

Details

  • Slurm Version: 23.02.4
  • Python Version: 3.6.8
  • Cython Version: 3.0.0
  • PySlurm Branch: main
  • Linux Distribution: RHEL 8.8

Issue

Code that processes job memory was crashing because a None value was included.

I identified that the following job (scontrol output below) causes pyslurm to return a value of None for job.memory.

```
JobId=5909485 JobName=interactive
UserId=xvy5180(5712646) GroupId=xvy5180(5712646) MCS_label=N/A
Priority=300000 Nice=0 Account=mxs2589_e_gpu QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=0 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=05:12:33 TimeLimit=10:00:00 TimeMin=N/A
SubmitTime=2023-09-18T10:57:13 EligibleTime=2023-09-18T10:57:13
AccrueTime=2023-09-18T10:57:13
StartTime=2023-09-18T10:57:31 EndTime=2023-09-18T20:57:32 Deadline=N/A
PreemptEligibleTime=2023-09-18T10:57:31 PreemptTime=None
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-09-18T10:57:31 Scheduler=Main
Partition=sla-prio AllocNode:Sid=submit02:526250
ReqNodeList=(null) ExcNodeList=(null)
NodeList=p-gc-3003
BatchHost=p-gc-3003
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0::
ReqTRES=cpu=1,mem=79488M,node=1,billing=1,gres/gpu=1
AllocTRES=cpu=1,mem=79488M,node=1,billing=1,gres/gpu=1
Socks/Node=* NtasksPerN:B:S:C=0:0:: CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=gc DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/bin/sh
Power=
MemPerTres=gres:gpu:79488
TresPerJob=gres:gpu:1
```
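For context, a minimal self-contained sketch of the failure mode described above — arithmetic on a collection of memory values crashes as soon as one of them is None. The helper name and values are hypothetical, not the actual crashing code:

```python
# Hypothetical sketch: summing per-job memory values (in MiB) crashes
# when pyslurm has returned None for one job's memory.
def total_requested_mem(mem_values):
    total = 0
    for mem in mem_values:
        total += mem  # raises TypeError when mem is None
    return total

try:
    total_requested_mem([4096, None, 8192])
except TypeError:
    print("crash: None in memory values")
```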

@tazend
Member

tazend commented Sep 21, 2023

Hi,

yeah — because the job has no memory specification (MinMemoryNode=0), all of the values checked in the memory property are set to 0 (which means UNLIMITED/INFINITE in Slurm), and that is then translated to None.

I can try to extract the value shown for mem in ReqTRES/AllocTRES and return that instead when the other values are set to 0. However, I'm not sure it is always guaranteed that ReqTRES/AllocTRES hold a value for mem. To be absolutely safe, one has to handle the None case. Or, instead of None, I could return the UNLIMITED constant.
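A minimal sketch of that fallback, assuming Slurm's usual comma-separated `key=value` TRES formatting with an optional K/M/G/T suffix. The function name and suffix handling are assumptions for illustration, not pyslurm's actual implementation:

```python
# Hypothetical helper: extract the mem entry from a TRES string such as
# "cpu=1,mem=79488M,node=1,billing=1,gres/gpu=1" and return it in MiB.
def mem_from_tres(tres):
    units = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 ** 2}
    for entry in tres.split(","):
        key, _, value = entry.partition("=")
        if key == "mem" and value:
            if value[-1] in units:
                return int(float(value[:-1]) * units[value[-1]])
            return int(value)
    return None  # TRES string carries no mem entry

print(mem_from_tres("cpu=1,mem=79488M,node=1,billing=1,gres/gpu=1"))  # 79488
```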

@robgics
Contributor Author

robgics commented Sep 22, 2023

Hmm, that's an interesting one. I would think UNLIMITED would be more accurate than Python None, because it's intentional. I assumed an error was causing it to be None, but I'd know that wasn't the case if it were 'UNLIMITED'.

So in that case, to come up with a measure of how much memory the job has requested, I'd have to take the total memory of every node the job has requested and add them together, right? I mean, it's asking for all memory on all nodes it's using, right?
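The idea above can be sketched as follows, assuming one already has the real memory of each node (e.g. from node information); the node name and size here are made-up illustrative values:

```python
# Hypothetical sketch: when a job effectively gets all memory on its
# nodes, approximate its request by summing the nodes' real memory (MiB).
node_real_memory = {"p-gc-3003": 515072}  # hypothetical per-node sizes

def job_memory_upper_bound(node_list):
    return sum(node_real_memory[n] for n in node_list)

print(job_memory_upper_bound(["p-gc-3003"]))  # 515072
```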

@robgics
Contributor Author

robgics commented Sep 22, 2023

I see what you're saying about the AllocTRES memory values. I guess if I receive an "UNLIMITED" for memory, I could parse those values to come up with a memory amount. I don't think pyslurm should be burdened with that, as the actual mem value was indeed "UNLIMITED".
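Caller-side handling along those lines could look like the following sketch. The `UNLIMITED` sentinel, the dict-shaped `job`, and the field names are stand-ins for illustration, not pyslurm's real objects:

```python
# Hypothetical defensive handling: fall back to the mem entry in the
# AllocTRES string when the memory property is None (or UNLIMITED).
UNLIMITED = "UNLIMITED"  # hypothetical sentinel

def effective_memory(job):
    mem = job.get("memory")
    if mem in (None, UNLIMITED):
        for entry in job.get("alloc_tres", "").split(","):
            key, _, value = entry.partition("=")
            if key == "mem":
                return value  # e.g. "79488M"
        return UNLIMITED  # no mem entry anywhere
    return mem

job = {"memory": None, "alloc_tres": "cpu=1,mem=79488M,node=1"}
print(effective_memory(job))  # 79488M
```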
