Fix possible race condition in Worker SqlTaskManager #19683

Merged
dain merged 1 commit into trinodb:master from jklamer:jklamer/FixRaceInTaskCatalogInit on Nov 13, 2023
Conversation

@jklamer (Member) commented Nov 9, 2023

There is a possible race condition between SqlTaskManager and CatalogPruneTask on workers. Before the catalogs for a task are set, the list of active catalogs from the prune task can arrive and fail to be augmented with the task's catalogs; the prune can then remove those catalogs after they have been set and loaded. This fix uses a ReadWriteLock to ensure mutual exclusion between setting a task's catalogs and scanning tasks for pruning.

Without the read lock, the included test fails with the following in the stack trace:

Caused by: java.lang.IllegalArgumentException: No catalog 'catalog_grvijxxnvr'
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:218)
	at io.trino.connector.WorkerDynamicCatalogManager.getConnectorServices(WorkerDynamicCatalogManager.java:160)
	at io.trino.execution.TestSqlTaskManagerRaceWithCatalogPrune.lambda$testMultipleTaskUpdatesWithMultipleCatalogPrunes$0(TestSqlTaskManagerRaceWithCatalogPrune.java:179)
	at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:71)
	... 10 more
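
The locking scheme described above can be illustrated with a minimal sketch. This is not the code from this PR: the class `CatalogLockSketch`, its methods, and the use of plain strings for task IDs and catalog names are all hypothetical. It also assumes that task updates take the read lock (so they can still run concurrently with each other) while pruning takes the write lock; the description only states that a ReadWriteLock provides the mutual exclusion.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the worker-side bookkeeping; not Trino's actual classes.
class CatalogLockSketch
{
    private final ReadWriteLock catalogsLock = new ReentrantReadWriteLock();
    private final Map<String, Set<String>> taskCatalogs = new ConcurrentHashMap<>();
    private final Set<String> loadedCatalogs = ConcurrentHashMap.newKeySet();

    // Task update path (assumed to hold the read lock): load the catalogs and
    // record them on the task as one step that a prune cannot interleave with.
    void setTaskCatalogs(String taskId, Set<String> catalogs)
    {
        catalogsLock.readLock().lock();
        try {
            loadedCatalogs.addAll(catalogs);
            taskCatalogs.put(taskId, Set.copyOf(catalogs));
        }
        finally {
            catalogsLock.readLock().unlock();
        }
    }

    void removeTask(String taskId)
    {
        taskCatalogs.remove(taskId);
    }

    // Prune path (assumed to hold the write lock): augment the incoming active
    // set with every task's catalogs, then drop everything else.
    void pruneCatalogs(Set<String> activeCatalogs)
    {
        catalogsLock.writeLock().lock();
        try {
            Set<String> inUse = new HashSet<>(activeCatalogs);
            taskCatalogs.values().forEach(inUse::addAll);
            loadedCatalogs.retainAll(inUse);
        }
        finally {
            catalogsLock.writeLock().unlock();
        }
    }

    boolean isLoaded(String catalog)
    {
        return loadedCatalogs.contains(catalog);
    }

    public static void main(String[] args)
    {
        CatalogLockSketch sketch = new CatalogLockSketch();
        sketch.setTaskCatalogs("task-1", Set.of("catalog_a"));
        // The prune request does not mention catalog_a, but the task still uses it.
        sketch.pruneCatalogs(Set.of("catalog_b"));
        System.out.println(sketch.isLoaded("catalog_a"));
    }
}
```

Because the write lock cannot be acquired while any read lock is held, a prune in this sketch sees either none of a task's catalogs (and runs before they are loaded) or all of them (and keeps them), which is the window the description says was previously open.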

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Nov 9, 2023
@jklamer jklamer requested a review from dain November 9, 2023 00:19
@jklamer jklamer force-pushed the jklamer/FixRaceInTaskCatalogInit branch 5 times, most recently from e6b325a to ec236fd Compare November 9, 2023 23:48
@jklamer jklamer force-pushed the jklamer/FixRaceInTaskCatalogInit branch 2 times, most recently from 5871d28 to 8232754 Compare November 10, 2023 20:42
@dain dain merged commit 944aa96 into trinodb:master Nov 13, 2023
@github-actions github-actions bot added this to the 434 milestone Nov 13, 2023
@jklamer jklamer deleted the jklamer/FixRaceInTaskCatalogInit branch November 13, 2023 20:13