Attempt to fix ThreadLeakControl issues in S3BlobContainerRetriesTests #18201
❌ Gradle check result for 9f2e077: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Thanks for looking into this @nomoa! I shared some findings here: #17551 (comment). It's true that ExecutorService.shutdown() won't ensure all threads are stopped when it returns, but the leak control logic seems to wait 5 seconds, then forcibly interrupt the running threads, wait 3 more seconds, and only then fail the test. I think if the client had actually been closed, then waiting/interrupting would have succeeded in stopping the threads. The only way I can explain this behavior is if we're somehow leaking a client and not closing it. What do you think? Am I missing something here? Also, I'm going to close #17540 in favor of #17551 so that we don't have duplicate issues.
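To make the sequence described above concrete (wait, interrupt, wait again, then fail), here is a minimal JDK-only sketch. It is not the actual ThreadLeakControl code: the class name `LeakCheckSketch`, its methods, and the timings passed in are all hypothetical and only illustrate the order of the steps.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a "linger, interrupt, linger again" check; not the real ThreadLeakControl. */
final class LeakCheckSketch {

    /** Returns true if every thread has terminated by the end of the sequence. */
    static boolean allStopped(List<Thread> threads, long lingerMillis, long afterInterruptMillis) {
        try {
            // Phase 1: give the threads a grace period to finish on their own.
            if (joinAll(threads, lingerMillis)) {
                return true;
            }
            // Phase 2: interrupt whatever is still alive.
            for (Thread t : threads) {
                if (t.isAlive()) {
                    t.interrupt();
                }
            }
            // Phase 3: wait a little longer; survivors would be reported as leaked.
            return joinAll(threads, afterInterruptMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Joins all threads within a shared time budget and reports whether none is left alive. */
    private static boolean joinAll(List<Thread> threads, long budgetMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
        for (Thread t : threads) {
            long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remaining > 0) { // join(0) would wait forever, so skip it
                t.join(remaining);
            }
        }
        for (Thread t : threads) {
            if (t.isAlive()) {
                return false;
            }
        }
        return true;
    }

    /** Starts a daemon thread that sleeps but exits as soon as it is interrupted. */
    static Thread cooperativeSleeper(long millis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                // exit on interrupt
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

A thread that honours interrupts, like `cooperativeSleeper`, passes this check; the open question in this thread is whether the executor's worker threads behave that way.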
|
@andrross thanks for taking a look at this PR!

    // Record every thread the pool creates so that we can interrupt it later.
    Collection<Thread> threads = Collections.synchronizedCollection(new ArrayList<>());
    ThreadFactory factory = new ThreadFactory() {
        @Override
        public Thread newThread(Runnable runnable) {
            Thread thread = new Thread(runnable);
            threads.add(thread);
            return thread;
        }
    };
    ScheduledExecutorService service = Executors.newScheduledThreadPool(1, factory);
    // Schedule a delayed task, then shut down while it is still pending.
    service.schedule(() -> {}, 10L, TimeUnit.SECONDS);
    service.shutdown();
    while (threads.isEmpty()) {
        Thread.sleep(10);
    }
    System.out.println("Thread started");
    // Keep interrupting the worker until it is no longer alive.
    while (true) {
        boolean oneAlive = false;
        for (Thread t : threads) {
            System.out.println("Interrupting thread " + t.getName() + " is alive: " + t.isAlive());
            t.interrupt();
            t.join(100);
            oneAlive |= t.isAlive();
        }
        if (!oneAlive) {
            break;
        }
    }
    System.out.println("10 seconds later");

So relying on […]

My initial assumption (the one implemented in this PR) is that the default SDK client simply runs […]

On further look, while still theoretically possible, I'm not sure that's what's happening; if that's the case, this PR would have to be adapted to do the same on […]

The other possibility, as you said, is that a client simply gets leaked and its executor service is never shut down. Possible reasons I can think of could be:
|
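The snippet in the comment above keeps its worker alive because, by default, a `ScheduledThreadPoolExecutor` still runs already-scheduled delayed tasks after `shutdown()`. The sketch below contrasts that default with `setExecuteExistingDelayedTasksAfterShutdownPolicy(false)`, a standard JDK method. The class and method names are hypothetical, and this is only an illustration of the JDK behaviour, not something this PR is known to use.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of how a pending delayed task affects termination after shutdown(). */
final class DelayedTaskShutdownSketch {

    /**
     * Schedules a task 10 seconds in the future, calls shutdown(), and reports
     * whether the pool terminated within waitMillis.
     */
    static boolean terminatesAfterShutdown(boolean keepDelayedTasks, long waitMillis) {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // keep the demo from blocking JVM exit
            return t;
        });
        // The JDK default for this policy is true: pending delayed tasks survive shutdown().
        pool.setExecuteExistingDelayedTasksAfterShutdownPolicy(keepDelayedTasks);
        pool.schedule(() -> {}, 10, TimeUnit.SECONDS);
        pool.shutdown();
        try {
            return pool.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // discard the pending task so the demo leaves nothing behind
        }
    }

    public static void main(String[] args) {
        System.out.println("default policy terminated: " + terminatesAfterShutdown(true, 500));
        System.out.println("policy=false terminated: " + terminatesAfterShutdown(false, 500));
    }
}
```

With the default policy the pool should still be running when the wait expires, whereas with the policy set to false the pending task is cancelled on `shutdown()` and the worker can exit promptly.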
Ah, I think you might be right. I thought that |
9f2e077 to 77df22a
|
The latest version of this PR applies the same strategy to both S3Service and S3AsyncService, based on the assumption that the leaked threads come from the client's own ScheduledExecutorService, which cannot shut down in time because some scheduled tasks are still waiting to be run or cancelled.
|
❌ Gradle check result for 77df22a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
77df22a to 7171696
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #18201 +/- ##
============================================
+ Coverage 72.56% 72.58% +0.01%
- Complexity 67261 67353 +92
============================================
Files 5476 5482 +6
Lines 310478 310684 +206
Branches 45133 45157 +24
============================================
+ Hits 225313 225506 +193
- Misses 66840 66850 +10
- Partials 18325 18328 +3
|
The default AWS client might create its own ScheduledExecutorService, which is closed via ExecutorService.shutdown() when the client is closed. Calling shutdown() might not ensure that all threads are gone when ThreadLeakControl is run. Provide our own ScheduledExecutorService during this test so that it goes through our tearDown method, which calls ThreadPool#terminate, waiting up to 5 seconds. Signed-off-by: David Causse <[email protected]>
nomoa
left a comment
thanks for the review!
7171696 to d7dd4b0
Similar to opensearch-project#18201 but applied via the TestPlugin used in S3BlobStoreRepositoryTests. Closes opensearch-project#17551 Signed-off-by: David Causse <[email protected]>
Similar to opensearch-project#18201 but applied via the TestPlugin used in S3BlobStoreRepositoryTests. Closes opensearch-project#14299 Signed-off-by: David Causse <[email protected]>
…#18317) Similar to #18201 but applied via the TestPlugin used in S3BlobStoreRepositoryTests. Closes #14299 Signed-off-by: David Causse <[email protected]>
The default AWS client might create its own ScheduledExecutorService, which is closed via ExecutorService.shutdown() when the client is closed. Calling shutdown() might not ensure that all threads are gone when ThreadLeakControl is run.
Provide our own ScheduledExecutorService during this test so that it goes through our tearDown method, which calls ThreadPool#terminate, waiting up to 5 seconds.
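The idea of owning the scheduler in the test and terminating it in tearDown can be sketched with plain JDK classes. Everything here is an assumption made for illustration: `FakeClient` stands in for an SDK client that accepts a caller-supplied scheduler, and `terminate` is a simplified helper written for this sketch, not the actual ThreadPool#terminate implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: the test owns the scheduler and terminates it with a bounded wait. */
final class OwnedSchedulerSketch {

    /** Stand-in for a client that schedules work on an executor it does not own. */
    static final class FakeClient implements AutoCloseable {
        private final ScheduledExecutorService scheduler;

        FakeClient(ScheduledExecutorService scheduler) {
            this.scheduler = scheduler;
            // Simulate a long-delayed internal task, such as a timeout check.
            this.scheduler.schedule(() -> {}, 10, TimeUnit.SECONDS);
        }

        @Override
        public void close() {
            // The caller owns the scheduler, so closing the client leaves it alone.
        }
    }

    /**
     * Graceful shutdown with a bounded wait, then a forced shutdown as a fallback.
     * Returns true if the executor is terminated when this method returns.
     */
    static boolean terminate(ExecutorService service, long timeout, TimeUnit unit) {
        service.shutdown();
        try {
            if (!service.awaitTermination(timeout, unit)) {
                service.shutdownNow(); // drops pending tasks and interrupts workers
                service.awaitTermination(timeout, unit);
            }
        } catch (InterruptedException e) {
            service.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return service.isTerminated();
    }
}
```

In a test, the scheduler would be created in setUp, passed to the client, and handed to `terminate` in tearDown after the client is closed, so that no worker thread is still alive when the leak check runs.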
Description
Seen in build https://build.ci.opensearch.org/job/gradle-check/57676/:
I believe the only threads remaining from a ScheduledThreadPoolExecutor should be the ones created by the default S3 client.
Related Issues
Resolves #17551
Check List
- [ ] Functionality includes testing.
- [ ] API changes companion pull request created, if applicable.
- [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.