Skip to content

[Native][Issue-3593] Handle Task::updateBroadcastOutputBuffer failure gracefully with error log message#18939

Merged
aditi-pandit merged 1 commit intoprestodb:masterfrom
karteekmurthys:task-manager-race-condition
Feb 17, 2023
Merged

[Native][Issue-3593] Handle Task::updateBroadcastOutputBuffer failure gracefully with error log message#18939
aditi-pandit merged 1 commit intoprestodb:masterfrom
karteekmurthys:task-manager-race-condition

Conversation

@karteekmurthys
Copy link
Contributor

@karteekmurthys karteekmurthys commented Jan 18, 2023

The TaskManager may enter a race condition on Task::updateBrodacastBuffers() leading to an exception:"Output buffers not found" if the PartitionedOutputBuffers for that task are previously deleted by a downstream driver.

This PR handles this path gracefully by checking the error condition in a new PartitionedOutputBuffersManager::getBufferIfExists() API instead of getBuffer() call (which throws that exception). An error message is also logged in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this commented code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the condition BufferType::BROADCAST removed ? That doesn't seem right to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to return and test a boolean for this method instead ? Returning nullptr for a successful update vs TaskInfo for a failure seems weird.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you always calling updateBroadcastOutputBuffers ? It seems more readable to check the BROADCAST condition here than in the function called ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like you are using prestoTask in this function. Why are you passing it ?

@karteekmurthys karteekmurthys force-pushed the task-manager-race-condition branch 3 times, most recently from bd1871e to 43e3d59 Compare January 21, 2023 09:24
@v-jizhang
Copy link
Contributor

@prestobot kick off tests

@karteekmurthys
Copy link
Contributor Author

@mbasmanova @aditi-pandit would you please review this.

1 similar comment
@karteekmurthys
Copy link
Contributor Author

@mbasmanova @aditi-pandit would you please review this.

@karteekmurthys karteekmurthys force-pushed the task-manager-race-condition branch from 797ca02 to d4c0e18 Compare February 7, 2023 22:07
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karteekmurthys The change looks good to me. Thanks. Please, update commit message to provide more details about this change.

No testing required as we are not taking any action for error case.

Not sure if this is a good reason for not adding a test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

@karteekmurthys karteekmurthys force-pushed the task-manager-race-condition branch 2 times, most recently from 8003838 to b1a295b Compare February 7, 2023 23:02
@karteekmurthys karteekmurthys changed the title [Native][Issue-3593] Fix possible race condition in TaskManager [Native][Issue-3593] Log message in TaskManager on Task::updateBroadcastOutputBuffer failures Feb 7, 2023
@aditi-pandit aditi-pandit changed the title [Native][Issue-3593] Log message in TaskManager on Task::updateBroadcastOutputBuffer failures [Native][Issue-3593] Handle Task::updateBroadcastOutputBuffer failure gracefully with error log message Feb 17, 2023
@aditi-pandit aditi-pandit merged commit 3c7e006 into prestodb:master Feb 17, 2023
@aditi-pandit
Copy link
Contributor

@karteekmurthys The change looks good to me. Thanks. Please, update commit message to provide more details about this change.

No testing required as we are not taking any action for error case.

Not sure if this is a good reason for not adding a test.

+1. Your change has an effect on the behavior. It would be good to mock and check that you caught the error. Please try to address that in another PR.

@wanglinsong wanglinsong mentioned this pull request Feb 25, 2023
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants