Schedule coordinator stage after creating distributed stages scheduler #10640
Merged
losipiuk merged 1 commit into trinodb:master on Jan 18, 2022
If the coordinator stage is running and fails while the distributed stages scheduler is being created, we can get unexpected failures from the latter routine. The race between handling the coordinator stage failure and the code run from `createDistributedStagesScheduler` may cause SPI calls made by the latter to fail, and those failures are then propagated as the top-level query failure. E.g. for queries which read Hive we observed exceptions from `HiveSplitManager.getSplits()` being set as the top-level query failure after a coordinator stage failure, because `currentQueryId` was unset in `SemiTransactionalHiveMetastore` as part of coordinator failure handling. Deferring the start of the coordinator stage does not solve the problem completely, but it reduces the probability of occurrence.
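The ordering issue can be illustrated with a small, single-threaded sketch. It is not the Trino code: `QueryState`, `startFailingCoordinatorStage`, and `run` are hypothetical names, the "first recorded failure wins" rule is an assumption made for the illustration, and the real race is between concurrent threads rather than a fixed interleaving. The sketch only shows why creating the scheduler before starting the coordinator stage keeps the coordinator failure as the reported cause.

```java
// Hypothetical model of the scheduling order; not Trino's actual classes.
class SchedulingOrderSketch {
    static final class QueryState {
        String currentQueryId = "query_1"; // cleared by coordinator failure handling
        String topLevelFailure;            // assumption: the first recorded failure wins

        void fail(String cause) {
            if (topLevelFailure == null) {
                topLevelFailure = cause;
            }
        }
    }

    // Stand-in for an SPI call such as getSplits(); it needs the query id to be set.
    static void getSplits(QueryState query) {
        if (query.currentQueryId == null) {
            throw new IllegalStateException("currentQueryId is not set");
        }
    }

    // Stand-in for scheduler creation, which issues SPI calls.
    static void createDistributedStagesScheduler(QueryState query) {
        getSplits(query);
    }

    // Simulates a coordinator stage that fails right after starting: cleanup
    // happens immediately, reporting the failure happens later (returned Runnable).
    static Runnable startFailingCoordinatorStage(QueryState query) {
        query.currentQueryId = null;
        return () -> query.fail("coordinator stage failed");
    }

    static String run(boolean deferCoordinatorStage) {
        QueryState query = new QueryState();
        Runnable reportCoordinatorFailure = null;
        if (!deferCoordinatorStage) {
            // Old order: the coordinator stage is already running (and failing).
            reportCoordinatorFailure = startFailingCoordinatorStage(query);
        }
        try {
            createDistributedStagesScheduler(query);
        } catch (RuntimeException e) {
            query.fail(e.getMessage()); // the SPI failure becomes the top-level failure
        }
        if (deferCoordinatorStage) {
            // New order: start the coordinator stage only after the scheduler exists.
            reportCoordinatorFailure = startFailingCoordinatorStage(query);
        }
        reportCoordinatorFailure.run();
        return query.topLevelFailure;
    }

    public static void main(String[] args) {
        System.out.println("eager:    " + run(false));
        System.out.println("deferred: " + run(true));
    }
}
```

In this model the eager order reports the misleading SPI error, while the deferred order reports the coordinator stage failure. SPI calls made after the scheduler has been created can still hit the same race, which is why deferring only makes the problem less likely.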
arhimondr approved these changes on Jan 17, 2022