Fix deadlock in ThreadPoolMergeScheduler when a failing merge closes the IndexWriter #134128

Closed
tlrx wants to merge 7 commits into elastic:main from tlrx:2025/09/04/ES-12664

Conversation

@tlrx
Member

@tlrx tlrx commented Sep 4, 2025

A merge that throws an exception causes the IndexWriter to close, which in turn aborts running merges and closes the ThreadPoolMergeScheduler, all in the same merge thread.

Before this change, ThreadPoolMergeScheduler#close used a CountDownLatch to wait for the signal that all merges have been aborted or completed. But the merge scheduler is closed from a merge thread that has not yet completed at the time it waits on the latch, which causes a deadlock.
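To make the failure mode concrete, here is a minimal standalone sketch of a thread waiting on a latch that only it can release. It is not the Elasticsearch code: the class name `LatchSelfWaitDemo`, the method `closeFromMergeThread`, and the single-merge latch are assumptions for illustration, and a timeout is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LatchSelfWaitDemo {

    // Hypothetical stand-in for a close() called from a merge thread.
    // Returns true only if the latch reached zero before the timeout.
    static boolean closeFromMergeThread(long timeoutMillis) {
        // One running "merge": the latch opens only once that merge has finished.
        CountDownLatch allMergesDone = new CountDownLatch(1);
        AtomicBoolean closeCompleted = new AtomicBoolean(false);

        Thread mergeThread = new Thread(() -> {
            try {
                // The merge fails and its failure handling waits for all merges
                // to finish, from the very thread that is still running a merge.
                closeCompleted.set(allMergesDone.await(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                // The merge is only marked as done after the wait has returned.
                allMergesDone.countDown();
            }
        }, "merge-thread");

        mergeThread.start();
        try {
            mergeThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return closeCompleted.get();
    }

    public static void main(String[] args) {
        // prints "close completed in time: false"
        System.out.println("close completed in time: " + closeFromMergeThread(200));
    }
}
```

Without the timeout, the `await` call would block forever, since the only `countDown` sits behind it on the same thread.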

The proposed fix in this change uses a mechanism similar to what ConcurrentMergeScheduler#sync does, i.e. it waits for all merge threads except the current one to be aborted or completed.
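A rough sketch of the "wait on every merge thread except the current one" idea follows. It is not the code from this PR or from Lucene: `SyncExceptCurrentDemo`, `startMerge`, `sync`, and the latch that simulates aborting the other merge are all hypothetical names chosen for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class SyncExceptCurrentDemo {

    // Hypothetical registry of merge threads that may still be running.
    private final Set<Thread> mergeThreads = ConcurrentHashMap.newKeySet();

    Thread startMerge(Runnable merge) {
        Thread t = new Thread(() -> {
            try {
                merge.run();
            } finally {
                mergeThreads.remove(Thread.currentThread());
            }
        });
        mergeThreads.add(t);
        t.start();
        return t;
    }

    // Waits for every registered merge thread except the calling one.
    void sync() {
        boolean interrupted = false;
        for (Thread t : mergeThreads) {
            if (t == Thread.currentThread()) {
                continue; // joining ourselves would never return
            }
            while (t.isAlive()) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    interrupted = true; // keep waiting, restore the flag afterwards
                }
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    // Simulates a failing merge that aborts the other merge and then closes
    // the scheduler from its own thread. Returns true if the close returned.
    static boolean closeFromFailingMerge() {
        SyncExceptCurrentDemo scheduler = new SyncExceptCurrentDemo();
        CountDownLatch abortOtherMerge = new CountDownLatch(1);
        AtomicBoolean closeReturned = new AtomicBoolean(false);

        // A healthy merge that runs until it is told to abort.
        scheduler.startMerge(() -> {
            try {
                abortOtherMerge.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread failing = scheduler.startMerge(() -> {
            abortOtherMerge.countDown(); // stands in for aborting running merges
            scheduler.sync();            // waits for the other merge, not for itself
            closeReturned.set(true);
        });

        try {
            failing.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return closeReturned.get();
    }

    public static void main(String[] args) {
        // prints "close returned: true"
        System.out.println("close returned: " + closeFromFailingMerge());
    }
}
```

Skipping the calling thread is what lets a close triggered from inside a merge return, while a close triggered from any other thread still waits for all merges.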

The proposed test works whether or not ThreadPoolMergeScheduler is enabled. I'd like to add a similar test in serverless too, just to be sure it works everywhere.

Relates ES-12664

@tlrx tlrx added >bug :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v9.2.0 v9.1.4 labels Sep 4, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing (obsolete) Meta label for Distributed Indexing team. Obsolete. Please do not use. label Sep 4, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine
Collaborator

Hi @tlrx, I've created a changelog YAML for you.

@tlrx tlrx marked this pull request as draft September 4, 2025 13:06
@tlrx
Member Author

tlrx commented Sep 22, 2025

Closed in favor of #134656

@tlrx tlrx closed this Sep 22, 2025

Labels

>bug :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed Indexing (obsolete) Meta label for Distributed Indexing team. Obsolete. Please do not use. v9.1.5 v9.2.0


2 participants