
Fix flaky ThreadPoolMergeSchedulerStressTestIT #139819

Merged
burqen merged 5 commits into elastic:main from burqen:ap/2025.12.19.issue-137161-ThreadPoolMergeSchedulerStressTestIT on Jan 8, 2026



@burqen (Contributor) commented Dec 19, 2025

Increase wait time for flaky test
Increase maxWaitTime for merges to start and complete to 10 minutes. The previous timeout of 1 minute left too little headroom: ordinary scheduling variability on a busy CI machine could push the test past the wait threshold and fail it. If the test still fails with a 10-minute timeout, we can be fairly confident that it actually hangs.
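To illustrate why a larger ceiling is cheap, here is a minimal sketch of a deadline-bounded polling wait. It assumes a plain polling loop; the `pollUntil` helper and `WaitSketch` class are hypothetical and are not the test's actual code (Elasticsearch tests use their own wait helpers). A healthy run returns as soon as the condition holds, so raising the timeout only costs time when something is genuinely stuck.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitSketch {

    // Hypothetical helper: polls the condition until it holds or the deadline passes.
    // Returns true if the condition became true within maxWait, false on timeout.
    static boolean pollUntil(BooleanSupplier condition, long maxWait, TimeUnit unit) throws InterruptedException {
        final long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Exponential backoff, capped at 1 second and (roughly) at the remaining time.
            long cap = Math.max(1, Math.min(1000, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            Thread.sleep(Math.min(sleepMillis, cap));
            sleepMillis = Math.min(sleepMillis * 2, 1000);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean mergesDone = new AtomicBoolean(false);
        // Simulate merges that finish after about 50 ms on a background thread.
        Thread merger = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            mergesDone.set(true);
        });
        merger.start();
        // Generous 10-minute ceiling; the call returns shortly after the condition holds.
        boolean finished = pollUntil(mergesDone::get, 10, TimeUnit.MINUTES);
        merger.join();
        System.out.println("finished=" + finished);
    }
}
```

The backoff keeps the polling cheap over a long wait without delaying detection by more than about a second.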

Bound the maximum randomly generated value for the node.processors setting by the number of available processors.

Take care of issue #137161

Some CI environments only have 4 cores and will fail this test if the random generator picks a node.processors value of 5 or higher, because NODE_PROCESSORS_SETTING is limited to the number of available processors.
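A minimal sketch of the bounding idea, assuming the random value is drawn from a range starting at 1. The `randomNodeProcessors` helper, the `NodeProcessorsSketch` class, and the upper bound of 8 are hypothetical; the source does not state the test's actual range, and the real change presumably uses the test framework's own randomization helpers rather than `java.util.Random`. The point is only that the upper bound is clamped to what the machine reports, so a 4-core CI host can never be handed 5 or more.

```java
import java.util.Random;

public class NodeProcessorsSketch {

    // Hypothetical helper: picks a random processor count in [1, min(maxWanted, availableProcessors)],
    // so the generated node.processors value can never exceed what the machine reports.
    static int randomNodeProcessors(Random random, int maxWanted, int availableProcessors) {
        if (maxWanted < 1 || availableProcessors < 1) {
            throw new IllegalArgumentException("bounds must be >= 1");
        }
        int upper = Math.min(maxWanted, availableProcessors);
        return 1 + random.nextInt(upper); // nextInt(upper) yields 0..upper-1
    }

    public static void main(String[] args) {
        Random random = new Random();
        int available = Runtime.getRuntime().availableProcessors();
        // Hypothetical upper bound of 8, clamped to the available processors.
        int processors = randomNodeProcessors(random, 8, available);
        System.out.println("node.processors=" + processors + " (available=" + available + ")");
    }
}
```

Clamping at generation time keeps the randomized coverage on larger machines while removing the configurations that small CI hosts would reject.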
@burqen burqen added the :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. label Dec 19, 2025
@elasticsearchmachine elasticsearchmachine added Team:Distributed Indexing (obsolete) Meta label for Distributed Indexing team. Obsolete. Please do not use. v9.4.0 labels Dec 19, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@burqen (Contributor, Author) commented Dec 19, 2025

The original test failures come from 9.1, so the fix needs to be backported. I will read up on how to do that.

@burqen burqen added >test Issues or PRs that are addressing/adding tests auto-backport Automatically create backport pull requests when merged branch:9.2 branch:9.1 branch:8.19 labels Dec 19, 2025
@albertzaharovits (Contributor) left a comment

Thanks for looking into this!

…-137161-ThreadPoolMergeSchedulerStressTestIT
@burqen (Contributor, Author) commented Jan 7, 2026

Failing due to another flaky test, which is fixed in #140271. Will hold off and wait for that one to be merged.

…-137161-ThreadPoolMergeSchedulerStressTestIT
@elasticsearchmachine (Collaborator)

💚 Backport successful

Backported branches: 9.1, 8.19, 9.2

burqen added a commit to burqen/elasticsearch that referenced this pull request Jan 8, 2026
burqen added a commit to burqen/elasticsearch that referenced this pull request Jan 8, 2026
burqen added a commit to burqen/elasticsearch that referenced this pull request Jan 8, 2026
elasticsearchmachine pushed a commit that referenced this pull request Jan 8, 2026
elasticsearchmachine pushed a commit that referenced this pull request Jan 8, 2026
elasticsearchmachine pushed a commit that referenced this pull request Jan 9, 2026
elasticsearchmachine pushed a commit that referenced this pull request Jan 9, 2026

Labels

auto-backport (Automatically create backport pull requests when merged)
:Distributed/Engine (Anything around managing Lucene and the Translog in an open shard.)
Team:Distributed Indexing (obsolete) (Meta label for Distributed Indexing team. Obsolete. Please do not use.)
>test (Issues or PRs that are addressing/adding tests)
v8.19.11
v9.1.11
v9.2.5
v9.3.1
v9.4.0


3 participants
