Skip to content

[Native] Don't propagate errors in getResults#21481

Merged
xiaoxmeng merged 1 commit intoprestodb:masterfrom
kevinwilfong:get_results_error
Dec 5, 2023
Merged

[Native] Don't propagate errors in getResults#21481
xiaoxmeng merged 1 commit intoprestodb:masterfrom
kevinwilfong:get_results_error

Conversation

@kevinwilfong
Copy link
Copy Markdown
Contributor

Today, the native TaskManager will throw an exception if getResults is called on a failed or aborted task. In Presto Java, it looks like the Worker will return empty results rather than an error in these cases.

Looking at ClientBuffer in the Java code, it creates a future for getResults that will be fulfilled when the Task enqueues results. If the Task does not enqueue results, e.g. because it's failed or aborted, the future is fulfilled with empty results, either when getResults is called again or when the ClientBuffer is destroyed.

This behavior makes sense to me. By propagating the exception, there's a race condition in the Cooridinator resulting in the actual exception that caused the Task to fail, depending on whether the producer or consumer Task is marked as failed first in the Coordinator. If the Task is failed/aborted then the Coordinator should eventually realize this through the getTaskStatus call. Put another way, the purpose of getResults is to propagate data, and the purpose of getTaskStatus is to propagate errors.

@kevinwilfong kevinwilfong requested a review from a team as a code owner December 4, 2023 23:00
Copy link
Copy Markdown
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinwilfong Thank you for tracking down this issue. It would be nice to update the documentation at https://prestodb.io/docs/current/develop/worker-protocol.html

Copy link
Copy Markdown
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps, document this behavior in the comments for TaskManager::getResults API.

Today, the native TaskManager will throw an exception if
getResults is called on a failed or aborted task. In Presto Java,
it looks like the Worker will return empty results rather than an
error in these cases.

Looking at ClientBuffer in the Java code, it creates a future for
getResults that will be fulfilled when the Task enqueues results.
If the Task does not enqueue results, e.g. because it's failed or
aborted, the future is fulfilled with empty results, either when
getResults is called again or when the ClientBuffer is destroyed.

This behavior makes sense to me.  By propagating the exception,
there's a race condition in the Cooridinator resulting in the
actual exception that caused the Task to fail, depending on
whether the producer or consumer Task is marked as failed first
in the Coordinator. If the Task is failed/aborted then the
Coordinator should eventually realize this through the
getTaskStatus call. Put another way, the purpose of getResults is
to propagate data, and the purpose of getTaskStatus is to
propagate errors.
@kevinwilfong kevinwilfong requested a review from a team as a code owner December 5, 2023 00:27
@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 5, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff 76ae3ed...2bb472d.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/develop/worker-protocol.rst

Copy link
Copy Markdown
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@mbasmanova mbasmanova requested a review from arhimondr December 5, 2023 00:33
Copy link
Copy Markdown
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Local build of docs, looks good. Thanks!

Copy link
Copy Markdown
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinwilfong thanks!

@xiaoxmeng xiaoxmeng merged commit d2e4e2f into prestodb:master Dec 5, 2023
@wanglinsong wanglinsong mentioned this pull request Feb 12, 2024
64 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants