Submission of a batch job fails when the "work_dir" argument contains a "_" #294

Closed
Baohua-Chen opened this issue May 20, 2023 · 2 comments
Labels
wontfix Problem that will not be fixed due to various reasons.

Comments


Baohua-Chen commented May 20, 2023

Details

  • Slurm Version: 20.02.5
  • Python Version: 3.10.8
  • Cython Version: 0.29.34
  • PySlurm Branch: 20-02-5
  • Linux Distribution: Linux version 3.10.0-1160.el7.x86_64 CentOS Linux release 7.9.2009

UPDATE

This bug does not seem to be caused only by the values in the slurm_job dict. I got the same error after deleting "work_dir" from the dict.
Maybe it is related to submitting the job from Jupyter Lab? I do not know.

Issue

When I submit a batch job with the job().submit_batch_job function and pass a "work_dir" value that contains an underscore (_), the job is submitted but fails immediately. Checking the submitted job with the job().find_id function shows that the "work_dir" attribute has been turned into garbled text such as "wly�U". When I resubmitted the job with the underscores removed from work_dir, the issue did not recur. I suspect that "_" is being replaced by "-" somewhere in the call to the Slurm interface.

An example which reproduces this bug:
from pyslurm import job
Job1 = {'wrap': 'echo a;sleep 15; echo b', 'job_name': 'test', 'partition': 'all', 'ntasks': 1, 'cpus_per_task': 1, 'work_dir': '/home/boo/slurm_jobs'}
job().submit_batch_job(Job1)

And an example which works well:
Job2 = {'wrap': 'echo a;sleep 15; echo b', 'job_name': 'test', 'partition': 'all', 'ntasks': 1, 'cpus_per_task': 1, 'work_dir': '/home/boo/slurmjobs'}
job().submit_batch_job(Job2)
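
To make the comparison between the two cases repeatable, the check described above (reading "work_dir" back after submission) could be sketched roughly as follows. This is only an illustration: work_dir_roundtrips is a hypothetical helper, not part of pyslurm, and it assumes the job record read back is a plain dict with a "work_dir" key, as the report suggests. The second record below is hand-written to mimic the garbled value, not real scheduler output.

```python
def work_dir_roundtrips(submitted, reported):
    """Return True if the work_dir read back equals the one submitted.

    Hypothetical helper: `submitted` is the dict passed at submission time,
    `reported` is assumed to be a dict-like job record read back afterwards.
    """
    expected = submitted.get("work_dir")
    actual = reported.get("work_dir")
    # Some bindings hand back bytes; decode leniently so garbage is visible.
    if isinstance(actual, bytes):
        actual = actual.decode("utf-8", errors="replace")
    return expected is not None and actual == expected


job1 = {"wrap": "echo a;sleep 15; echo b", "work_dir": "/home/boo/slurm_jobs"}

# Hand-written records standing in for what was read back in each case.
intact_record = {"work_dir": "/home/boo/slurm_jobs"}
garbled_record = {"work_dir": "wly\ufffdU"}

print(work_dir_roundtrips(job1, intact_record))
print(work_dir_roundtrips(job1, garbled_record))
```

Running the same check from a plain Python script and from Jupyter Lab may help to tell whether the environment plays a role.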

Member

tazend commented May 20, 2023

Hi,

You are probably seeing an issue similar to the one mentioned in #260.

In newer versions of pyslurm (starting with 21.08), the Job-Submission API was substantially reworked (see the docs here), and the pyslurm.job class has been declared deprecated.

Since that new API is not available for 20.02 yet, I can try to backport it. But it may take some time because of the changes introduced in newer Slurm versions over the years.

Member

tazend commented Dec 12, 2024

Slurm version 20.02 is too old for the new API changes mentioned above to be backported right now, and the time investment is too high. The job-submission code used here (the pyslurm.job class) is deprecated and has been replaced by the new API (pyslurm.JobSubmitDescription).

If you upgrade to a more recent Slurm version, you can use the new API. But at the moment, the old code won't be fixed for this old Slurm version.
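
For readers moving to the new API, here is a rough sketch of how the dict from the report might be translated. The key mapping (job_name to name, partition to partitions, work_dir to working_directory) and the idea of passing the wrapped command as an inline script are assumptions based on my reading of the pyslurm documentation, so please check them against the version you have installed. The helper names to_submit_kwargs and submit are hypothetical.

```python
# Assumed mapping from old pyslurm.job dict keys to keyword names of
# pyslurm.JobSubmitDescription; verify against your installed version.
OLD_TO_NEW = {
    "job_name": "name",
    "partition": "partitions",
    "work_dir": "working_directory",
    "ntasks": "ntasks",
    "cpus_per_task": "cpus_per_task",
}


def to_submit_kwargs(old):
    """Translate an old-style job dict into keyword arguments (hypothetical)."""
    kwargs = {OLD_TO_NEW[key]: value for key, value in old.items() if key in OLD_TO_NEW}
    if "wrap" in old:
        # The old 'wrap' command becomes a small inline batch script
        # (assumes `script` accepts script contents as well as a path).
        kwargs["script"] = "#!/bin/bash\n" + old["wrap"] + "\n"
    return kwargs


def submit(old):
    """Submit through the new API; needs pyslurm >= 21.08 and a running Slurm."""
    import pyslurm  # imported lazily so the mapping can be tried without Slurm

    desc = pyslurm.JobSubmitDescription(**to_submit_kwargs(old))
    return desc.submit()


job1 = {
    "wrap": "echo a;sleep 15; echo b",
    "job_name": "test",
    "partition": "all",
    "ntasks": 1,
    "cpus_per_task": 1,
    "work_dir": "/home/boo/slurm_jobs",
}
kwargs = to_submit_kwargs(job1)
print(kwargs)
```

Calling submit(job1) on a cluster with a recent Slurm and a matching pyslurm should then hand the job to the scheduler; the mapping step itself can be tried anywhere.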

tazend added the wontfix label Dec 12, 2024
tazend closed this as completed Dec 12, 2024