Add an option to exit after the submitit launcher has scheduled the run on slurm #2479

Open
abhiskk opened this issue Nov 23, 2022 · 7 comments · May be fixed by #2965
Labels
enhancement (Enhancement request), internal

abhiskk commented Nov 23, 2022

🚀 Feature Request

Currently, when we execute the launcher command, the job is scheduled, but the train process keeps checking the job status instead of exiting. One limitation is that we can not execute several scheduling commands in a loop. This behavior comes from a wait command running here. It would be very helpful to make this configurable so that the launcher can exit without continuously waiting for the run results.

@abhiskk abhiskk added the enhancement (Enhancement request) label Nov 23, 2022
@Jasha10 Jasha10 added this to the Hydra 1.3.0 milestone Nov 25, 2022
Collaborator

Jasha10 commented Nov 26, 2022

Hi @abhiskk,

I've looked into disabling waiting for the jobs to return, and I've run into a problem: Hydra's sweeper plugins expect the launcher to return information about each launched job (including its return value).

For example, you'll see here that Hydra's BasicSweeper collects the returned values from each job in the sweep. An exception is raised if the job does not return a value. The same is true of Hydra's other sweeper plugins.

I think making this work in a clean way would require changes to Hydra's sweeper API.
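To make the constraint concrete, here is a simplified, self-contained model of that contract. The names `JobStatus`, `JobResult`, and `collect_results` are hypothetical stand-ins, not Hydra's actual classes; the sketch only assumes, as described above, that the sweeper reads a return value from every launched job and raises when one is missing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List


class JobStatus(Enum):
    UNKNOWN = 0    # job was submitted, outcome not known yet
    COMPLETED = 1  # job finished and produced a value


@dataclass
class JobResult:
    """Hypothetical stand-in for what a launcher hands back per job."""

    status: JobStatus = JobStatus.UNKNOWN
    value: Any = None

    @property
    def return_value(self) -> Any:
        # Reading the value of a job that has not completed is an error.
        if self.status is not JobStatus.COMPLETED:
            raise RuntimeError("job has no return value yet")
        return self.value


def collect_results(results: List[JobResult]) -> List[Any]:
    """Mimic a sweeper gathering the return value of every job in a batch."""
    return [r.return_value for r in results]
```

Under this model, a launcher that exits right after scheduling can only hand back results in the `UNKNOWN` state, so `collect_results` would raise unless the contract itself changes.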

> One limitation is that we can not execute several scheduling commands in a loop.

Maybe there's another way we can work around this limitation. Could you please share an example of how you're running the for loop? Is it a loop in python or in bash? Is there a reason you're using a for loop instead of using a hydra sweep?

Collaborator

Jasha10 commented Nov 26, 2022

If there's no other workaround, we can add an option to have the submitit launcher use a dummy return value (e.g. return None).
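A rough sketch of what such an option could look like. Everything here is hypothetical: the `launch` function, its `wait` flag, and the use of `concurrent.futures` as a stand-in for the Slurm scheduler are illustration-only assumptions, not the submitit launcher's code.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence


def launch(
    executor: ThreadPoolExecutor,
    fn: Callable[[Any], Any],
    params: Sequence[Any],
    wait: bool = True,
) -> List[Optional[Any]]:
    """Submit one job per parameter; optionally skip waiting for results."""
    # Submitting returns immediately; only .result() blocks.
    futures: List[Future] = [executor.submit(fn, p) for p in params]
    if wait:
        # Current behavior: block until every job has returned.
        return [f.result() for f in futures]
    # Proposed option: hand back a dummy value per job and return at once.
    return [None for _ in futures]
```

With `wait=False` the caller gets one placeholder per job, which keeps the list shape a sweeper expects. A sweeper that actually uses the returned values to choose its next parameters would presumably not be compatible with this mode.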

@mmcdermott

Any update on this? It would be very useful.


pipme commented Aug 18, 2023

+1. This would be very useful. I am using an HPC cluster and need to launch experiments from a login node via hydra and hydra_submitit_launcher. It would be very convenient if the launching process could exit once the actual jobs have been sent to the compute nodes.

@odelalleau
Collaborator

A simple workaround is to launch your command in the background with & (possibly combined with nohup if you're not using some kind of persistent shell like tmux or screen).
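For launching several commands from a Python loop, the same idea can be sketched with the standard library. The helper name `launch_detached` and the log-file handling are assumptions for illustration; `start_new_session=True` plays a role similar to `nohup` on POSIX systems by detaching the child from the caller's session.

```python
import subprocess
from typing import List


def launch_detached(cmd: List[str], log_path: str) -> subprocess.Popen:
    """Start cmd in the background and return without waiting for it."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,  # send errors to the same log file
            start_new_session=True,    # POSIX only: detach from the caller's session
        )
    finally:
        # The child keeps its own copy of the file descriptor.
        log.close()
```

A call such as `launch_detached(["python", "my_app.py", "--multirun"], "launch.log")` (where `my_app.py` is a placeholder for your Hydra application) returns immediately. Note that each launcher process still stays alive in the background until its jobs finish.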

@Xiang-Pan

This is not a good idea for large sweeps: if you are submitting 400 jobs, it will exceed the thread limit of some servers.

But if you only submit 10 jobs, it is fine.


OWissett commented Oct 4, 2024

Any progress on this?

OWissett added a commit to OWissett/hydra that referenced this issue Oct 4, 2024
OWissett added a commit to OWissett/hydra that referenced this issue Oct 4, 2024
@OWissett OWissett linked a pull request Oct 4, 2024 that will close this issue
7 participants