Fix race conditions around Dynamic Catalog pruning #28019
Fix race conditions around Dynamic Catalog pruning #28019findepi merged 3 commits intotrinodb:masterfrom
Conversation
|
Preparatory commits extracted to: So that this PR becomes smaller and easier to review. |
f577460 to
fa1834b
Compare
4d8d0b3 to
a2037ab
Compare
a2037ab to
0a1afd0
Compare
There was a problem hiding this comment.
Pull request overview
This pull request fixes three critical race conditions in dynamic catalog pruning that could cause queries to fail when catalogs are dropped concurrently with metadata listing or transaction operations.
Changes:
- Introduces a "prunable state" concept to capture snapshots of catalogs that can be pruned, preventing race conditions where newly created catalogs could be incorrectly removed
- Changes transaction manager to report registered catalogs instead of only active catalogs, preventing catalogs from being prematurely pruned while transactions are still using them
- Removes the ReadWriteLock from SqlTaskManager as the prunable state pattern eliminates the need for mutual exclusion between catalog operations
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| core/trino-main/src/main/java/io/trino/connector/ConnectorServicesProvider.java | Adds PrunableState record and updates pruneCatalogs method signature to accept prunable state |
| core/trino-main/src/main/java/io/trino/connector/CatalogPruneTask.java | Updates pruning logic to obtain prunable state before collecting active catalogs and pass it to pruneCatalogs |
| core/trino-main/src/main/java/io/trino/connector/CoordinatorDynamicCatalogManager.java | Implements getPrunableState and updates pruneCatalogs to check prunable state before removing catalogs |
| core/trino-main/src/main/java/io/trino/connector/WorkerDynamicCatalogManager.java | Implements getPrunableState and updates pruneCatalogs to check prunable state before removing catalogs |
| core/trino-main/src/main/java/io/trino/connector/StaticCatalogManager.java | Updates to implement new interface methods (returns empty set for prunable state) |
| core/trino-main/src/main/java/io/trino/metadata/CatalogManager.java | Renames getActiveCatalogs to getReachableDynamicCatalogs for clarity |
| core/trino-main/src/main/java/io/trino/execution/SqlTaskManager.java | Removes ReadWriteLock and updates to use prunable state pattern |
| core/trino-main/src/main/java/io/trino/transaction/InMemoryTransactionManager.java | Changes to report registered catalogs instead of only active catalogs |
| core/trino-main/src/main/java/io/trino/transaction/TransactionInfo.java | Renames activeCatalogs field to registeredCatalogs to reflect semantic change |
| Test files | Updates all mock implementations to implement new interface methods |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
core/trino-main/src/main/java/io/trino/connector/CoordinatorDynamicCatalogManager.java
Show resolved
Hide resolved
lukasz-stec
left a comment
There was a problem hiding this comment.
lgtm, I have just a couple of nit comments. Another nit is that it would make reviewing, including once it is merged, easier if the fixes for specific issues were split into separate commits.
core/trino-main/src/main/java/io/trino/connector/WorkerDynamicCatalogManager.java
Outdated
Show resolved
Hide resolved
| /** | ||
| * Returns prunable state to be passed to {@link #pruneCatalogs}. | ||
| */ | ||
| PrunableState getPrunableState(); |
There was a problem hiding this comment.
nit: it feels too soon to add an abstraction like PrunableState if we only need a snapshot of all catalogs
There was a problem hiding this comment.
i added it not to confuse with Set<CatalogHandle> inUse parameter. Also, i will change the abstraction now a little bit
I was afraid of the reverse, but sure, i can split. |
|
One problem, at least theoretically, is that prunable state contains I will change PrunableState to hold "registration id" to make sure the prunable catalog was registered for the whole time between prunable state was acquired and prune was called. This will make it "a read lock emulator" it was supposed to be, with correctness depending on fewer assumptions, and thus more likely to be true. I will also split commits per @lukasz-stec suggestion. |
I decided not to. The fix to (1) isn't whole without the rest. Let's not split hair on this. |
0a1afd0 to
cbffcc4
Compare
Before changes catalog pruning had couple of problems.
I. Race between metadata listing (`system.jdbc` or `system.metadata`
views) and `DROP CATALOG`.
Following interleaving was possible:
1. transaction is inspected, doesn't use a catalog yet, so it's not in
`transactionManager.getAllTransactionInfos()`
2. transaction registers the catalog for use
3. `DROP CATALOG` deregister catalog from its name
4. so the catalog is not among `catalogManager.getActiveCatalogs()`
5. `CoordinatorDynamicCatalogManager.pruneCatalogs` removes it, breaking
the ongoing transaction
This is fixed by changing `CatalogPruneTask.getActiveCatalogs` to
consult catalogs reachable in catalog manager first, before checking
transaction.
II. Race between `DROP CATALOG` and catalog showing up as in use in
`transactionManager.getAllTransactionInfos()`.
`InMemoryTransactionManager.TransactionMetadata` maintains two
collections: `registeredCatalogs` and `activeCatalogs`. Only a catalog
reachable in catalog manager can be become registered. Once it's
registered, it can be come active. `getAllTransactionInfos()` reported
on active catalogs, not those registered, thus `TransactionMetadata`
allowed to "resurrect" a catalog considered dropped and not in use.
This is fixed by reporting registered catalogs from
`transactionManager.getAllTransactionInfos()`, not only those active.
III. Race between { catalog pruning } and { `CREATE CATALOG` followed by
concurrent catalog use and `DROP CATALOG` }.
Following interleaving was possible:
1. `CatalogPruneTask.getActiveCatalogs` obtains some set of active
catalogs.
2. `CREATE CATALOG` is invoked, new catalog is created
3. A transaction starts to use the new catalog.
4. `DROP CATALOG` is invoked, the new catalog is no longer reachable.
5. `CatalogPruneTask` calls
`CoordinatorDynamicCatalogManager.pruneCatalogs` removing the catalog
(it's neither reachable, nor among active catalogs).
6. The ongoing transaction fails, as the catalog its using is gone.
This is fixed by a concept of a "prunable state". Catalog pruning
(`CatalogPruneTask` and `SqlTaskManager.pruneCatalogs`) is now
sequenced:
1. obtain prunable state (all currently known catalogs)
2. obtain catalogs in use
3. prune catalogs non-referencible, not in use, but only those in
prunable state
Introduction of prunable state allows to rework solution from commit
944aa96. The logic in
`CatalogPruneTask` and `SqlTaskManager.pruneCatalogs` is now more
similar and `SqlTaskManager` does not need a supporting RW lock.
cbffcc4 to
f342d1c
Compare
This test did not deterministically reproduce the issue. For that, repeat count would need to be dialed up. However, it should be efficient to act as a regression test on CI, given CI runs tests over and over again.
f342d1c to
36107c5
Compare
Before changes catalog pruning had couple of problems.
Race between metadata listing (
system.jdbcorsystem.metadataviews) andDROP CATALOG.Following interleaving was possible:
transactionManager.getAllTransactionInfos()DROP CATALOGderegister catalog from its namecatalogManager.getActiveCatalogs()CoordinatorDynamicCatalogManager.pruneCatalogsremoves it, breakingthe ongoing transaction
This is fixed by changing
CatalogPruneTask.getActiveCatalogstoconsult catalogs reachable in catalog manager first, before checking
transaction.
Race between
DROP CATALOGand catalog showing up as in use intransactionManager.getAllTransactionInfos().InMemoryTransactionManager.TransactionMetadatamaintains twocollections:
registeredCatalogsandactiveCatalogs. Only a catalogreachable in catalog manager can be become registered. Once it's
registered, it can be come active.
getAllTransactionInfos()reportedon active catalogs, not those registered, thus
TransactionMetadataallowed to "resurrect" a catalog considered dropped and not in use.
This is fixed by reporting registered catalogs from
transactionManager.getAllTransactionInfos(), not only those active.Race between { catalog pruning } and {
CREATE CATALOGfollowed by concurrent catalog use andDROP CATALOG}.Following interleaving was possible:
CatalogPruneTask.getActiveCatalogsobtains some set of activecatalogs.
CREATE CATALOGis invoked, new catalog is createdDROP CATALOGis invoked, the new catalog is no longer reachable.CatalogPruneTaskcallsCoordinatorDynamicCatalogManager.pruneCatalogsremoving the catalog(it's neither reachable, nor among active catalogs).
This is fixed by a concept of a "prunable state". Catalog pruning
(
CatalogPruneTaskandSqlTaskManager.pruneCatalogs) is nowsequenced:
prunable state
Introduction of prunable state allows to rework solution from commit
944aa96 (#19683). The logic in
CatalogPruneTaskandSqlTaskManager.pruneCatalogsis now moresimilar and
SqlTaskManagerdoes not need a supporting RW lock.Release notes