[Native] Don't propagate errors in getResults #21481
Merged
xiaoxmeng merged 1 commit into prestodb:master on Dec 5, 2023
e5fb90a to
5d55107
Compare
mbasmanova
reviewed
Dec 4, 2023
Contributor
mbasmanova
left a comment
@kevinwilfong Thank you for tracking down this issue. It would be nice to update the documentation at https://prestodb.io/docs/current/develop/worker-protocol.html
mbasmanova
reviewed
Dec 4, 2023
Contributor
mbasmanova
left a comment
Perhaps document this behavior in the comments for the TaskManager::getResults API.
Today, the native TaskManager throws an exception if getResults is called on a failed or aborted task. In Presto Java, it looks like the Worker returns empty results rather than an error in these cases.
Looking at ClientBuffer in the Java code, it creates a future for getResults that is fulfilled when the Task enqueues results. If the Task does not enqueue results, e.g. because it has failed or been aborted, the future is fulfilled with empty results, either when getResults is called again or when the ClientBuffer is destroyed.
This behavior makes sense to me. Propagating the exception creates a race condition in the Coordinator: which exception gets reported as the cause of the failure depends on whether the producer or the consumer Task is marked as failed first in the Coordinator. If the Task is failed/aborted, the Coordinator should eventually realize this through the getTaskStatus call. Put another way, the purpose of getResults is to propagate data, and the purpose of getTaskStatus is to propagate errors.
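To make the intended split concrete, here is a minimal C++ sketch of the behavior described above. It is an illustration only: `SketchTaskManager`, `TaskState`, `TaskStatus`, `createTask`, `enqueue` and `failTask` are hypothetical names invented for this example and are not the actual Presto native classes or signatures. The only point it tries to show is that `getResults` hands back empty results for a failed or aborted task, while the failure itself is visible through `getTaskStatus`.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT the real Presto native classes.
enum class TaskState { kRunning, kFailed, kAborted };

struct TaskStatus {
  TaskState state;
  std::string failureMessage;  // empty unless the task failed
};

class SketchTaskManager {
 public:
  void createTask(const std::string& taskId) { tasks_[taskId] = Task{}; }

  // Buffers a page of output; ignored once the task is no longer running.
  void enqueue(const std::string& taskId, const std::string& page) {
    Task& task = tasks_.at(taskId);
    if (task.state == TaskState::kRunning) {
      task.buffered.push_back(page);
    }
  }

  // Records the failure so that it can be reported via getTaskStatus.
  void failTask(const std::string& taskId, const std::string& message) {
    Task& task = tasks_.at(taskId);
    task.state = TaskState::kFailed;
    task.failureMessage = message;
  }

  // Propagates data only: a failed or aborted task yields empty results
  // instead of an exception. (Unknown task ids still throw std::out_of_range
  // in this sketch; that case is outside what the PR description covers.)
  std::vector<std::string> getResults(const std::string& taskId) {
    Task& task = tasks_.at(taskId);
    if (task.state != TaskState::kRunning) {
      return {};
    }
    std::vector<std::string> pages;
    pages.swap(task.buffered);  // hand over everything buffered so far
    return pages;
  }

  // Propagates errors: this is where a caller learns why the task failed.
  TaskStatus getTaskStatus(const std::string& taskId) const {
    const Task& task = tasks_.at(taskId);
    return TaskStatus{task.state, task.failureMessage};
  }

 private:
  struct Task {
    TaskState state = TaskState::kRunning;
    std::string failureMessage;
    std::vector<std::string> buffered;
  };
  std::map<std::string, Task> tasks_;
};
```

In this sketch a consumer polling `getResults` on a failed producer just sees no data, so the only failure the Coordinator learns about is the one reported by `getTaskStatus` for the task that actually failed.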
5d55107 to
2bb472d
Compare
Codenotify: Notifying subscribers in CODENOTIFY files for diff 76ae3ed...2bb472d.
steveburnett
approved these changes
Dec 5, 2023
Contributor
steveburnett
left a comment
LGTM! (docs)
Local build of docs, looks good. Thanks!