Schedule coordinator stage after creating distributed stages scheduler by losipiuk · Pull Request #10640 · trinodb/trino

losipiuk · 2022-01-17T10:26:52Z

If the coordinator stage is running and fails while the distributed stages
scheduler is being created, the latter routine can fail unexpectedly.
The race between coordinator stage failure handling and the code run from
createDistributedStagesScheduler may cause SPI calls made by the latter
to fail, and those failures are then propagated as the top-level
query failure.

E.g. for queries that read from Hive we observed exceptions from
HiveSplitManager.getSplits() being set as top-level query failures after a
coordinator stage failure, because currentQueryId was unset in
SemiTransactionalHiveMetastore as part of coordinator failure handling.

Deferring the start of the coordinator stage does not solve the problem
completely, but it makes the race less likely to occur.
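
To illustrate the ordering change, here is a minimal, deterministic sketch of the two orderings. It is not Trino's scheduler code: `ConnectorState`, `cleanupQuery`, `startCoordinatorFirst`, and `createSchedulerFirst` are hypothetical names introduced for illustration, and the race is simulated sequentially rather than with real threads. The only detail taken from the description above is that failure handling clears per-query state that a later SPI call (such as getSplits()) depends on.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class SchedulingOrderSketch {
    // Hypothetical stand-in for per-query connector state, such as the query id
    // that the description says is unset during coordinator failure handling.
    static final class ConnectorState {
        private final AtomicReference<String> currentQueryId = new AtomicReference<>("query_1");

        // Simulates the cleanup done as part of coordinator stage failure handling.
        void cleanupQuery() {
            currentQueryId.set(null);
        }

        // Simulates an SPI call that requires the per-query state to still be set.
        List<String> getSplits() {
            if (currentQueryId.get() == null) {
                throw new IllegalStateException("query state already cleaned up");
            }
            return List.of("split-0", "split-1");
        }
    }

    // Old ordering: the coordinator stage runs (and may fail) before the
    // distributed stages scheduler has finished its SPI calls.
    static String startCoordinatorFirst(ConnectorState state, boolean coordinatorFails) {
        try {
            if (coordinatorFails) {
                state.cleanupQuery(); // failure handling wins the race
            }
            state.getSplits(); // scheduler creation still needs the state
            return "ok";
        } catch (IllegalStateException e) {
            // The secondary failure is reported instead of the real cause.
            return "misleading failure: " + e.getMessage();
        }
    }

    // New ordering: finish creating the distributed stages scheduler first,
    // then start the coordinator stage.
    static String createSchedulerFirst(ConnectorState state, boolean coordinatorFails) {
        state.getSplits(); // SPI calls complete while the state is intact
        if (coordinatorFails) {
            state.cleanupQuery();
            return "coordinator stage failure";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(startCoordinatorFirst(new ConnectorState(), true));
        System.out.println(createSchedulerFirst(new ConnectorState(), true));
    }
}
```

In the sketch the second ordering reports the coordinator stage failure itself rather than a secondary exception. As noted above, in the real system the deferral only narrows the window for the race; SPI calls made after the coordinator stage has started can still hit it.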

cla-bot bot added the cla-signed label Jan 17, 2022

losipiuk requested a review from arhimondr January 17, 2022 10:26

losipiuk mentioned this pull request Jan 17, 2022

Flaky TestHiveQueryFailureRecoveryTest(testInsertIntoExistingPartitionBucketed, testInsertIntoNewPartition, testInsertIntoNewPartitionBucketed, testReplaceExistingPartition) #10631

Closed

arhimondr approved these changes Jan 17, 2022

losipiuk merged commit 89ab4db into trinodb:master Jan 18, 2022

github-actions bot added this to the 369 milestone Jan 18, 2022

mosabua mentioned this pull request Jan 18, 2022

Add Trino 369 release notes #10553

Merged

losipiuk merged 1 commit into trinodb:master from losipiuk:lo/start-schedulers-concurrently
