Flow fails if nested task fails, even if it succeeds on retry #14390

Open
4 tasks done
thundercat1 opened this issue Jun 27, 2024 · 2 comments · May be fixed by #14439
Labels
bug Something isn't working

Comments

@thundercat1

thundercat1 commented Jun 27, 2024

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the Prefect documentation for this issue.
  • I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

In the following flow, I'd expect the tasks to fail once, after which the caller, top_task, should retry and everything should succeed on the second attempt. However, the overall flow is marked Failed even though the tasks eventually succeed.

Reproduction

from prefect import flow, task

failed = False

@task
def nested_flaky_task():
    # This task will fail the first time it is run, but will succeed if called a second time
    global failed
    if not failed:
        failed = True
        raise ValueError("Forced task failure")

@task(
    retries=1,
)
def top_task():
    nested_flaky_task()


@flow
def nested_task_flow():
    top_task()


if __name__ == "__main__":
    nested_task_flow()

Error

15:47:30.527 | INFO    | Task run 'top_task-0' - Received non-final state 'AwaitingRetry' when proposing final state 'Failed' and will attempt to run again...
15:47:30.572 | INFO    | Task run 'top_task-0' - Created task run 'nested_flaky_task-1' for task 'nested_flaky_task'
15:47:30.573 | INFO    | Task run 'top_task-0' - Executing 'nested_flaky_task-1' immediately...
15:47:30.613 | INFO    | Task run 'nested_flaky_task-1' - Finished in state Completed()
15:47:30.627 | INFO    | Task run 'top_task-0' - Finished in state Completed()
15:47:30.645 | ERROR   | Flow run 'berserk-loon' - Finished in state Failed('1/3 states failed.')
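The `Failed('1/3 states failed.')` message suggests the flow's final state is computed from all three task runs it created, not only from `top_task`. A minimal sketch of that counting rule follows. It assumes, and this issue does not confirm it, that a flow returning nothing fails if any of its task runs ended in `Failed`. `aggregate_flow_state` is a hypothetical helper, not Prefect's API, and the state strings only mimic the log format.

```python
def aggregate_flow_state(task_run_states: dict[str, str]) -> str:
    """Hypothetical model: a flow that returns nothing fails if ANY task run it created failed."""
    failed = sum(1 for state in task_run_states.values() if state == "Failed")
    if failed:
        return f"Failed('{failed}/{len(task_run_states)} states failed.')"
    return "Completed()"


# Final state of every task run in the reproduction. The name nested_flaky_task-0 is
# inferred: that run belongs to top_task's first attempt and precedes the log excerpt.
states = {
    "top_task-0": "Completed",           # retried in place, so its final state is Completed
    "nested_flaky_task-0": "Failed",     # terminal: nothing ever retries this run
    "nested_flaky_task-1": "Completed",  # fresh run created by top_task's retry
}

if __name__ == "__main__":
    print(aggregate_flow_state(states))  # Failed('1/3 states failed.')
```

Under this rule, the eventual success of `top_task-0` cannot outweigh the terminal failure of the first nested run, which matches the flow state in the log.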

Versions (prefect version output)

Version:             2.19.6
API version:         0.8.4
Python version:      3.11.6
Git commit:          9d938fe7
Built:               Mon, Jun 24, 2024 10:23 AM
OS/Arch:             darwin/arm64
Profile:             default
Server type:         ephemeral
Server:
  Database:          sqlite
  SQLite version:    3.43.2

Additional context

The workaround is straightforward: either add retries to the flaky nested task, or remove its task decorator. So this isn't a major blocker to building effective flows. However, it's confusing, and when it happens inside a big flow it's hard to figure out after the fact what went wrong. Behavior more in line with expectations could prevent some debugging headaches.
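Both workarounds can be illustrated with a small pure-Python simulation. It assumes, as the `1/3 states failed.` log suggests, that each call of a task creates a new task run, that retries re-run the same task run, and that the flow fails if any recorded run ends `Failed`. `TaskRun`, `simulate`, and `failed_count` are hypothetical names used only for this model, not Prefect code. In Prefect itself, the first workaround corresponds to decorating `nested_flaky_task` with `@task(retries=1)`, just as the reproduction already does for `top_task`.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    # Hypothetical record of one task run and its final state
    name: str
    state: str = "Running"


def simulate(nested_is_task: bool, nested_retries: int = 0, top_retries: int = 1) -> list[TaskRun]:
    """Model the reproduction flow and return every task run it records."""
    runs: list[TaskRun] = []
    calls = 0        # plays the role of the global `failed` flag: only call 1 raises
    nested_runs = 0  # counter used to name nested task runs (-0, -1, ...)

    def flaky_body() -> None:
        nonlocal calls
        calls += 1
        if calls == 1:
            raise ValueError("Forced task failure")

    def nested_flaky_task() -> None:
        nonlocal nested_runs
        if not nested_is_task:
            # Workaround 2: a plain function records no task run of its own
            flaky_body()
            return
        # Each call to a task creates a NEW run; its retries reuse that same run
        run = TaskRun(f"nested_flaky_task-{nested_runs}")
        nested_runs += 1
        runs.append(run)
        for attempt in range(nested_retries + 1):
            try:
                flaky_body()
                run.state = "Completed"
                return
            except ValueError:
                run.state = "Failed"
                if attempt == nested_retries:
                    raise  # retries exhausted: the failure propagates to the caller

    top = TaskRun("top_task-0")
    runs.append(top)
    for _ in range(top_retries + 1):
        try:
            nested_flaky_task()
            top.state = "Completed"
            break
        except ValueError:
            top.state = "Failed"  # only final if no attempts remain
    return runs


def failed_count(runs: list[TaskRun]) -> int:
    # Assumed rule: every Failed run counts against the flow
    return sum(run.state == "Failed" for run in runs)


if __name__ == "__main__":
    variants = {
        "reproduction": simulate(nested_is_task=True),
        "retries on nested task": simulate(nested_is_task=True, nested_retries=1),
        "no task decorator on nested": simulate(nested_is_task=False),
    }
    for label, runs in variants.items():
        print(f"{label}: {failed_count(runs)}/{len(runs)} runs failed")
```

In this model only the reproduction leaves a terminal `Failed` run behind. With retries on the nested task, the failure is absorbed inside `nested_flaky_task-0`; without the decorator, there is no nested run to fail.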

thundercat1 added the bug and needs:triage labels on Jun 27, 2024
@WillRaphaelson
Contributor

Thanks @thundercat1. It does seem that we are not resolving the nested task's success on retry back up the chain; we can fix this.

zhen0 removed the needs:triage label on Jun 29, 2024
serinamarie self-assigned this on Jul 1, 2024
@serinamarie
Contributor

Hi @thundercat1, I put up a PR that I hope will resolve this issue. Can you let me know if it does?

4 participants