
BUG: When restarting a failed task, the task completes but state of job is stuck in running. #115

Closed
bwilliams42 opened this issue Sep 3, 2014 · 9 comments · Fixed by #116

Comments

@bwilliams42

I also think the job should continue from where it left off: after the failed task is re-run, it should move on to the next task(s).

@bwilliams42 bwilliams42 changed the title When restarting a failed task, the task completes but state of job is stuck in running. BUG: When restarting a failed task, the task completes but state of job is stuck in running. Sep 3, 2014
@thieman
Owner

thieman commented Sep 3, 2014

Can you provide a minimal test case here? The job export functionality can be useful for that. Does this involve any remote tasks?

@bwilliams42
Author

@thieman. Sure. Create a job with, say, 5 tasks: 1 > 2 > 3 > 4 > 5, and purposely make task 3 fail. Execution stops there; 4 and 5 are not executed. If you then fix task 3 and hit the re-run failed tasks button, task 3 completes successfully, but the state of the overall job remains "running". So I can't restart the job using the "start job from beginning" button. This is definitely a bug; I need to restart the server to force it back to a normal state.

There are no remote tasks. How would the export help?

-B

@thieman
Owner

thieman commented Sep 3, 2014

Sorry, was just suggesting export as a way of passing along a test job. Will take a look.

@bwilliams42
Author

Ah, I see. I can whip one up and send it over a little later, tonight or tomorrow. Is that OK?

@thieman
Owner

thieman commented Sep 3, 2014

Sure. Sounds simple enough that it may not be necessary, but will be helpful if I encounter issues.

@nnfuzzy

nnfuzzy commented Sep 4, 2014

I experienced the same issue. Here are some simple test templates:
https://github.com/nnfuzzy/dagobah/tree/master/tests

@rclough
Collaborator

rclough commented Sep 26, 2014

I believe the issue arrived with one of my more recent PRs, where we started operating on a copy of the graph. When a task fails, the DAG snapshot gets deleted. When you retry, the snapshot is not reconstructed, so after the task finishes, downstream() doesn't operate as expected because the snapshot is None.

My initial inclination is just to rebuild the snapshot within the retry() method. I'm trying to think whether there are any edge cases in which we would want to preserve the snapshot on failure (especially since fixing a failure may mean changing the DAG).
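A minimal sketch of the failure mode described above and the proposed fix (rebuilding the snapshot inside retry()). Class and method names here are illustrative, not dagobah's actual implementation:

```python
class Job:
    """Toy model of a job whose runs operate on a copy of its DAG."""

    def __init__(self, edges):
        self.graph = edges       # live graph: {task: [downstream tasks]}
        self.snapshot = None     # frozen copy used during a run
        self.state = 'waiting'

    def start(self):
        # Runs operate on a snapshot so edits to the live graph
        # don't affect an in-flight run.
        self.snapshot = {t: list(c) for t, c in self.graph.items()}
        self.state = 'running'

    def fail(self):
        # The bug: the snapshot is discarded when a task fails.
        self.snapshot = None
        self.state = 'failed'

    def retry(self):
        # The fix: reconstruct the snapshot before resuming, so
        # downstream() works and the job can reach a terminal state.
        if self.snapshot is None:
            self.snapshot = {t: list(c) for t, c in self.graph.items()}
        self.state = 'running'

    def downstream(self, task):
        # With snapshot == None this returns nothing, so no further
        # tasks are scheduled and the job is stuck in 'running'.
        return self.snapshot.get(task, []) if self.snapshot else []


job = Job({'1': ['2'], '2': ['3'], '3': ['4'], '4': ['5'], '5': []})
job.start()
job.fail()                           # snapshot is now None
print(job.downstream('3'))           # → [] (nothing downstream: stuck)
job.retry()                          # rebuilds the snapshot
print(job.downstream('3'))           # → ['4'] (run can continue)
```

Rebuilding from the live graph on retry also covers the case where the user edited the DAG while fixing the failure, since the retried run picks up the current structure.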

@thieman
Owner

thieman commented Sep 26, 2014

Should be corrected now, going to push out a release shortly.

@thieman
Owner

thieman commented Sep 26, 2014

v0.3.1 is live on PyPI with this fix.
