Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The job runner does not understand cancelled jobs #531

Open
sbidoul opened this issue May 1, 2023 · 7 comments
Open

The job runner does not understand cancelled jobs #531

sbidoul opened this issue May 1, 2023 · 7 comments
Labels
bug no stale Use this label to prevent the automated stale action from closing this PR/Issue.

Comments

@sbidoul
Copy link
Member

sbidoul commented May 1, 2023

Module

queue_job

Describe the bug

Since #347, jobs have a cancelled state.
However the job runner does not understand this state.

# state transitions
if not state or state == DONE:
job.channel.set_done(job)
elif state == PENDING:
job.channel.set_pending(job)
elif state in (ENQUEUED, STARTED):
job.channel.set_running(job)
elif state == FAILED:
job.channel.set_failed(job)
elif state == WAIT_DEPENDENCIES:
# wait until all parent jobs are done
pass
else:
_logger.error("unexpected state %s for job %s", state, job)

This may lead to cancelled job filling up channels forever.

@sbidoul sbidoul added the bug label May 1, 2023
@sbidoul
Copy link
Member Author

sbidoul commented May 1, 2023

@hbrunn @hparfr as you seem to use this feature, have you not noticed problems with it?

I still need to investigate in more depth but I'm under the impression that cancelled jobs fill up channels until the job runner is restarted.

@hparfr
Copy link
Contributor

hparfr commented May 2, 2023

I didn't pay attention to this one.
I have canceled jobs with an error message in one of my db 🙄

@sbidoul
Copy link
Member Author

sbidoul commented May 2, 2023

Looking at places where DONE is used I notice the job dependencies mechanism. If I read correctly cancelled jobs cause their dependencies to stay WAIT_FOR_DEPENDENCIES until they are themselves cancelled or marked done. I guess that makes sense?

@sbidoul
Copy link
Member Author

sbidoul commented May 2, 2023

I propose a fix for the jobrunner in #533

@sbidoul
Copy link
Member Author

sbidoul commented May 3, 2023

@guewen if you have a moment, I'll welcome your advice on the effect of the cancelled states on the job dependencies mechanism. Maybe it's fine but I'm not entirely sure it was taken into account when porting the cancelled state feature from 12 to 14.

@guewen
Copy link
Member

guewen commented May 3, 2023

Hmm yeah I'm not sure the current situation is alright. Cancelling a parent job leaves a situation where dependent jobs will never be executed I think. We have to cancel them or set them to done, which is ultimately the same. I wonder if cancelling a job in a graph shouldn't cancel all the dependent jobs or maybe even all the graph's jobs (this can be explained in the cancel wizard)?

Copy link

github-actions bot commented Nov 5, 2023

There hasn't been any activity on this issue in the past 6 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days.
If you want this issue to never become stale, please ask a PSC member to apply the "no stale" label.

@github-actions github-actions bot added the stale PR/Issue without recent activity, it'll be soon closed automatically. label Nov 5, 2023
@sbidoul sbidoul added no stale Use this label to prevent the automated stale action from closing this PR/Issue. and removed stale PR/Issue without recent activity, it'll be soon closed automatically. labels Dec 10, 2023
@sbidoul sbidoul reopened this Dec 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug no stale Use this label to prevent the automated stale action from closing this PR/Issue.
Projects
None yet
Development

No branches or pull requests

3 participants