BUG: When restarting a failed task, the task completes but state of job is stuck in running. #115
Can you provide a minimal test case here? The job export functionality can be useful for that. Does this involve any remote tasks?
@thieman. Sure. Say you create a job that has 5 tasks: 1 > 2 > 3 > 4 > 5. You purposely make number 3 fail. The execution stops there; 4 and 5 are not executed. You then fix 3 and hit the "re-run failed tasks" button, and 3 will complete successfully. The state of the overall job remains "running", so I can't restart the job using the "start job from beginning" button. This is definitely a bug. I need to restart the server to force it back to a normal state. There are no remote tasks. How would the export help? -B
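For reference, those steps could also be scripted against the core API rather than clicked through the web UI. A minimal sketch, assuming the `Dagobah`/`Job` objects expose `add_job`, `get_job`, `add_task`, `add_dependency`, `start`, and `retry` roughly as in the core module (the method names and backend wiring here are assumptions, not verified against the release):

```python
# Hedged repro sketch; the Dagobah/Job method names (add_job, get_job,
# add_task, add_dependency, start, retry) and the BaseBackend wiring
# are assumed from the core module, not verified.
from dagobah.backend.base import BaseBackend
from dagobah.core import Dagobah

dagobah = Dagobah(BaseBackend())
dagobah.add_job('repro')
job = dagobah.get_job('repro')

# Five tasks chained 1 > 2 > 3 > 4 > 5; task 3 exits non-zero on purpose.
for i in range(1, 6):
    command = 'exit 1' if i == 3 else 'echo task %d' % i
    job.add_task(command, 'task_%d' % i)
for i in range(1, 5):
    job.add_dependency('task_%d' % i, 'task_%d' % (i + 1))

job.start()   # 1 and 2 succeed, 3 fails, 4 and 5 never run
# ...fix task 3's command, then retry; the bug is that the job's
# state stays 'running' even after task 3 completes:
# job.retry()
```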
Sorry, was just suggesting export as a way of passing along a test job. Will take a look.
Ah, I see. I can whip one up and send it over a little later. It will be later on tonight/tomorrow. Is that OK?
Sure. Sounds simple enough that it may not be necessary, but it will be helpful if I encounter issues.
I experienced the same. Here are simple templates.
I believe the issue arrived with one of my more recent PRs, where we now operate on a copy of the graph. When a task fails, the DAG snapshot gets deleted. When you retry, the snapshot is not reconstructed, and thus after the task finishes, downstream() doesn't operate as expected because the snapshot is gone.

My initial inclination is just to rebuild the snapshot within the retry() method. I'm trying to think if there are any edge cases in which we would want to preserve the snapshot on failure (especially since fixing a failure may mean changing the DAG).
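A sketch of that inclination is below; the attribute and helper names (`snapshot`, `graph`, `tasks`, `_run_from`) are hypothetical stand-ins for whatever the core `Job` class actually uses, since the point is only where the rebuild would happen:

```python
from copy import deepcopy

def retry(self):
    """Restart failed tasks after rebuilding the DAG snapshot.

    Hypothetical sketch: 'snapshot', 'graph', 'tasks', and '_run_from'
    are assumed names. The snapshot is deleted on failure, so without
    rebuilding it here downstream() has nothing to walk once the
    retried task finishes, and the job never leaves 'running'.
    """
    if self.snapshot is None:
        # Re-copy the live graph. Note this deliberately picks up any
        # DAG edits made while fixing the failure, which is the edge
        # case mentioned above.
        self.snapshot = deepcopy(self.graph)
    for task in self.tasks.values():
        if task.status == 'failed':
            self._run_from(task.name)
```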
Should be corrected now, going to push out a release shortly. |
v0.3.1 is live on PyPI with this fix.
I also think the job should continue from where it left off. After the failed task is re-run, it should move on to the next task(s).