Add an option to exit after the submitit launcher has scheduled the run on slurm #2479

abhiskk · 2022-11-23T22:42:01Z

🚀 Feature Request

Currently, when we execute the launcher command, the job is scheduled, but the train process keeps checking the job status instead of exiting. One limitation is that we cannot execute several scheduling commands in a loop. This behavior comes from a wait command running here. It would be very helpful to make this configurable so that the launcher can exit without continuously waiting for the run results.

Jasha10 · 2022-11-26T02:50:52Z

Hi @abhiskk,

I've looked into disabling waiting for the jobs to return, and I've run into a problem: Hydra's sweeper plugins expect the launcher to return information about the launched job (including the return value).

For example, you'll see here that Hydra's BasicSweeper collects the returned values from each job in the sweep. An exception is raised if the job does not return a value. The same is true of Hydra's other sweeper plugins.

I think making this work in a clean way would require changes to Hydra's sweeper API.

> One limitation is that we cannot execute several scheduling commands in a loop.

Maybe there's another way we can work around this limitation. Could you please share an example of how you're running the for loop? Is it a loop in Python or in Bash? Is there a reason you're using a for loop instead of a Hydra sweep?

Jasha10 · 2022-11-26T02:54:46Z

If there's no other workaround, we can add an option to have the submitit launcher use a dummy return value (e.g. return None).
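A minimal sketch of the dummy-return-value idea, using toy stand-ins rather than Hydra's real classes. `ToyJobReturn`, `toy_launch`, `toy_sweep`, and the `wait` flag are hypothetical names invented for illustration; they only mimic the contract described above, where the sweeper reads a return value from every launched job and fails if one is missing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, List


class Status(Enum):
    UNKNOWN = 0
    COMPLETED = 1


@dataclass
class ToyJobReturn:
    """Hypothetical stand-in for the per-job record a launcher hands back."""

    status: Status = Status.UNKNOWN
    value: Any = None

    @property
    def return_value(self) -> Any:
        # A job that never reported completion has nothing to hand over.
        if self.status is not Status.COMPLETED:
            raise RuntimeError("job has not returned a value")
        return self.value


def toy_launch(jobs: List[Callable[[], Any]], wait: bool = True) -> List[ToyJobReturn]:
    """Run (or pretend to schedule) each job and report one record per job."""
    results = []
    for job in jobs:
        if wait:
            # Blocking mode: run the job and record its real result.
            results.append(ToyJobReturn(Status.COMPLETED, job()))
        else:
            # Non-blocking mode (assumed option): the job is only "scheduled",
            # so report a placeholder None to keep the sweeper from failing.
            results.append(ToyJobReturn(Status.COMPLETED, None))
    return results


def toy_sweep(jobs: List[Callable[[], Any]], wait: bool = True) -> List[Any]:
    """Collect every job's return value, as a sweeper would."""
    return [r.return_value for r in toy_launch(jobs, wait=wait)]
```

The trade-off in this sketch is that a sweeper receiving placeholders cannot use the results, so any sweeper that chooses its next batch from earlier return values would not work in such a mode.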

mmcdermott · 2023-05-23T21:12:20Z

Any update on this? It would be very useful.

pipme · 2023-08-18T12:14:22Z

+1. This would be very useful. I am using an HPC and need to launch experiments on a login node via hydra and hydra_submitit_launcher. It would be very convenient to exit the process for launching once the actual jobs are sent to the computing nodes.

odelalleau · 2023-08-18T12:23:30Z

A simple workaround is to launch your command in the background with & (possibly combined with nohup if you're not using some kind of persistent shell like tmux or screen).
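If the scheduling loop lives in a Python script rather than a shell, a roughly analogous approach is to start each launcher command as a detached child process. This is a sketch under assumptions: `launch_detached` is a hypothetical helper, `start_new_session=True` is POSIX-only, and the command in the usage comment (a `train.py` entry point) is an illustrative placeholder.

```python
import subprocess
from pathlib import Path
from typing import List


def launch_detached(cmd: List[str], log_path: Path) -> subprocess.Popen:
    """Start cmd in its own session and return immediately without waiting."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,                # the child keeps its own copy of this descriptor
            stderr=subprocess.STDOUT,
            start_new_session=True,    # POSIX only: detach from the terminal's session
        )
    return proc


# Hypothetical usage: schedule several sweeps without blocking on any of them.
# for seed in range(3):
#     launch_detached(
#         ["python", "train.py", "--multirun", "hydra/launcher=submitit_slurm", f"seed={seed}"],
#         Path(f"launch_{seed}.log"),
#     )
```

Each detached launcher still stays alive on the login node until its jobs finish, so this only moves the waiting into the background.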

Xiang-Pan · 2023-12-27T06:50:08Z

It is not a good idea, since if you are submitting 400 jobs, it will exceed the thread limit of some servers.

But if you only submit 10 jobs, it is fine.

OWissett · 2024-10-04T17:09:35Z

Any progress on this?


abhiskk added the enhancement (Enhancement request) label Nov 23, 2022

Jasha10 added the internal label Nov 25, 2022

Jasha10 added this to the Hydra 1.3.0 milestone Nov 25, 2022

pipme mentioned this issue Aug 22, 2023

Hydra-submitit launcher does not log error messages to log files. #2100

Closed

OWissett added a commit to OWissett/hydra that referenced this issue Oct 4, 2024

Adds non-blocking mode to hydra_submitit_launcher (facebookresearch#2479)

977871c

OWissett added a commit to OWissett/hydra that referenced this issue Oct 4, 2024

Adds sentinel return value to prevent sweeper from crashing (facebookresearch#2479)

3f2893e

OWissett added a commit to OWissett/hydra that referenced this issue Oct 4, 2024

Adds non-blocking mode to hydra_submitit_launcher (facebookresearch#2479)

a867834

OWissett linked a pull request Oct 4, 2024 that will close this issue

No block slurm #2965

Draft
