Submitting batch job fails randomly with broken paths #260
Comments
Hi, could you try out the slurm-20.11.8 branch? (https://github.com/PySlurm/pyslurm/tree/slurm-20.11.8) So you submitted the jobs from the directory /correct/work/dir, right? Does
The strings are also broken in
I switched to branch
Looking at the byte data, you can see the recurring byte pattern I suspect this issue will be localised to the
Hi, yeah, the culprit is definitely The encoding step itself should be fine; however, it likely has to do with the lifetime of the char* pointer for
This itself is fine; however, this code is in a different function than the one actually submitting the job. By the time the function (fill_job_desc_from_opts) which contains this code is done,
You won't see this behaviour, though, when you explicitly specify the work_dir: the Python object will live long enough since it is in the
Anyway, in this case a quick fix in the code would be to modify the incoming
The long-term fix would be to restructure the job API in a way that things like these can't happen anymore (working on it).
The
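To illustrate the pointer-lifetime problem described in this comment, here is a minimal sketch in plain Python with ctypes. It is not pyslurm's code: the class JobDescSketch and its methods are hypothetical stand-ins for a C job descriptor that only stores a raw char* address. It also assumes CPython, where ctypes.c_char_p built from a bytes object points at that object's internal buffer. The point is that the encoded bytes object has to stay referenced for as long as the raw address is in use.

```python
import ctypes


class JobDescSketch:
    """Hypothetical stand-in for a C struct that only stores a raw char* address."""

    def __init__(self):
        self.work_dir_addr = None  # plays the role of a char* field in the C struct
        self._keepalive = []       # bytes objects that own the memory behind the addresses

    def set_work_dir(self, path: str) -> None:
        encoded = path.encode("utf-8")
        # Hold a reference so the buffer outlives this function call. Without it,
        # 'encoded' could be freed on return and the stored address would dangle.
        self._keepalive.append(encoded)
        # Assumption (CPython): c_char_p points into the bytes object's own buffer.
        self.work_dir_addr = ctypes.cast(
            ctypes.c_char_p(encoded), ctypes.c_void_p
        ).value

    def read_work_dir(self) -> str:
        # Read the NUL-terminated string back from the raw address, the way the
        # C side would at submission time.
        return ctypes.string_at(self.work_dir_addr).decode("utf-8")


desc = JobDescSketch()
desc.set_work_dir("/my/work/dir")
print(desc.read_work_dir())
```

If the _keepalive line were dropped, reading the address after set_work_dir returns would be undefined behaviour: it might still show the right path, or it might show whatever reused that memory, which matches the "fails randomly" symptom.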
The problem still persists if
Oh,
Mh, weird, I can replicate the erroneous symbols if I don't supply a
import pyslurm; psj = pyslurm.job(); jid = psj.submit_batch_job({'wrap': 'sleep 5', 'work_dir': '/my/work/dir', 'get_user_env_time': -1}); job = psj.find_id(jid)[0]; print(jid, job['job_state'], job['work_dir'])
Mh, wondering why it's not working for you with that. (I'm on 22.05, though it is still the same code in pyslurm.)
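When reproducing this, it can help to check the returned job record automatically instead of eyeballing the printed path. The helper below is a rough sketch, not part of pyslurm: suspicious_paths is a hypothetical name, and it assumes the job record is a plain dict with the work_dir, std_out and std_err keys mentioned in this issue. It also assumes valid paths are absolute and printable ASCII, so legitimate non-ASCII paths would be flagged as well.

```python
import string

# Path fields named in this issue; other keys are ignored.
PATH_KEYS = ("work_dir", "std_out", "std_err")

# Printable ASCII without control whitespace (tab, newline, etc.).
_ALLOWED = set(string.printable) - set("\t\n\r\x0b\x0c")


def suspicious_paths(job: dict) -> dict:
    """Return the path fields that do not look like clean absolute paths."""
    bad = {}
    for key in PATH_KEYS:
        value = job.get(key)
        if value is None:
            continue
        if isinstance(value, bytes):
            # Undecodable bytes become U+FFFD, which is outside _ALLOWED.
            value = value.decode("utf-8", errors="replace")
        if not value.startswith("/") or any(ch not in _ALLOWED for ch in value):
            bad[key] = value
    return bad


print(suspicious_paths({"work_dir": "/my/work/dir", "std_out": "\x90\x01junk"}))
```

Running a check like this in a submit loop would show how often the corruption occurs and which fields are affected.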
Starting from Slurm 21.8, the job submission API has been heavily reworked, and such errors are fixed there. The old pyslurm.job class in pyslurm.pyx is no longer supported. The new class for job submission is pyslurm.JobSubmitDescription. Documentation can be found here: https://pyslurm.github.io/23.2/reference/jobsubmitdescription/ Slurm 20.X versions are too old to justify the time investment needed to backport the new API. Since 20.X is an old version that SchedMD no longer supports anyway, a newer version of Slurm should be used.
Details
Issue
Submitting jobs (both via script or wrap) fails randomly. An immediate indicator is that the work_dir (and other paths like std_out and std_err) are broken strings in those cases:
For a failing job:
Any idea what is going wrong?