
Fix flaky tests in GoogleCloudStorageBlobStoreRepositoryTests, S3BlobStoreRepositoryTests, AzureBlobStoreRepositoryTests#18290

Merged
andrross merged 1 commit into opensearch-project:main from kkewwei:fix_14299
May 14, 2025

Conversation

@kkewwei
Contributor

@kkewwei kkewwei commented May 14, 2025

Description

I ran S3BlobStoreRepositoryTests.testSnapshotAndRestore with tests.seed=9D496123288AF73F and found that every request is retried 3 times, sleeping between attempts according to the backoff policy. This costs too much time and leads to flaky tests.

[2025-05-14T16:15:21,549][DEBUG][s.a.a.request            ] [node_t0] Retryable error detected. Will retry in 52ms. Request attempt number 1
......
[2025-05-14T16:15:21,647][DEBUG][s.a.a.request            ] [node_t0] Retryable error detected. Will retry in 145ms. Request attempt number 2

The retried requests look like this:
GET /bucket?list-type=2&delimiter=%2F&prefix=index-
PUT /bucket/r10011100011010/indices/Hfd8LcIaQmu_nAtfzC7YJg/5/__Qf9SKHinSJWqsxDYO2Qrpw
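
To get a feel for how the sleeps add up, here is a rough sketch that sums the delays of a plain exponential backoff schedule. The class name, the 50 ms base delay, the doubling factor, and the cap are all assumptions for illustration; the delays in the log above (52ms, 145ms) come from the client's own policy, which this sketch does not try to reproduce.

```java
// Hypothetical sketch: estimates time spent sleeping when every request is retried.
class BackoffCost {

    /**
     * Total sleep in milliseconds across `retries` retries, where the delay
     * starts at baseMs, doubles after each retry, and never exceeds capMs.
     */
    static long totalBackoffMillis(int retries, long baseMs, long capMs) {
        long total = 0;
        long delay = baseMs;
        for (int i = 0; i < retries; i++) {
            total += Math.min(delay, capMs);
            delay = Math.min(delay * 2, capMs);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed values: 3 retries per request, 50 ms base delay, 20 s cap.
        long perRequestMs = totalBackoffMillis(3, 50, 20_000);
        System.out.println("sleep per request (ms): " + perRequestMs);
        // A snapshot/restore test can issue thousands of requests; 5,000 is an assumed count.
        System.out.println("sleep for 5000 requests (s): " + (perRequestMs * 5_000) / 1_000);
    }
}
```

Even with small per-attempt delays, the sleep is paid on every single request, so the total grows linearly with the number of blob store calls the test makes.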

Related Issues

Resolves #14291 #14299 #11493

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kkewwei kkewwei requested a review from a team as a code owner May 14, 2025 13:22
@github-actions github-actions bot added >test-failure Test failure from CI, local build, etc. autocut flaky-test Random test failure that succeeds on second run Storage:Snapshots labels May 14, 2025
…StoreRepositoryTests, AzureBlobStoreRepositoryTests

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>
@kkewwei
Contributor Author

kkewwei commented May 14, 2025

@andrross @reta You may be interested. Please have a look in your spare time.

@github-actions
Contributor

❌ Gradle check result for 8c42ae9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@andrross
Member

This is great @kkewwei, thank you!

@github-actions
Contributor

✅ Gradle check result for 8c42ae9: SUCCESS

@codecov

codecov bot commented May 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.60%. Comparing base (998ae73) to head (8c42ae9).
Report is 3 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18290      +/-   ##
============================================
+ Coverage     72.48%   72.60%   +0.11%     
- Complexity    67357    67432      +75     
============================================
  Files          5488     5488              
  Lines        311023   311023              
  Branches      45217    45217              
============================================
+ Hits         225444   225809     +365     
+ Misses        67282    66829     -453     
- Partials      18297    18385      +88     


@andrross
Member

A couple thoughts here for posterity:

  • I hardcoded a 25% failure rate and S3BlobStoreRepositoryTests passed in about 30 seconds. Previously, with the "always retry 3 times" setting, the test would run for 20+ minutes
  • 25% is still a very high failure rate and should adequately test that the repositories can handle transient failures
  • I suspect the default retry policies of the object store clients have changed over time as those clients are upgraded and that this test wasn't always problematic
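
For readers who want to see the idea in miniature, the sketch below injects failures with a fixed probability rather than failing a fixed number of attempts per request. It is not the code from this PR: `RandomFailureSketch`, `flaky`, and `attemptsUntilSuccess` are hypothetical names, and the retry loop stands in for whatever the real object store client does.

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical sketch of probabilistic failure injection; not the PR's implementation.
class RandomFailureSketch {

    /** Wraps a call so that each attempt fails with probability failureRate. */
    static <T> Supplier<T> flaky(Supplier<T> delegate, double failureRate, Random random) {
        return () -> {
            if (random.nextDouble() < failureRate) {
                throw new IllegalStateException("injected transient failure");
            }
            return delegate.get();
        };
    }

    /** Calls until success or until maxAttempts is exhausted; returns the attempts used. */
    static int attemptsUntilSuccess(Supplier<?> call, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                call.get();
                return attempt;
            } catch (IllegalStateException e) {
                // Retry. A real client would sleep with backoff here.
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        Supplier<String> call = flaky(() -> "ok", 0.25, random);
        int requests = 10_000;
        long totalAttempts = 0;
        for (int i = 0; i < requests; i++) {
            totalAttempts += attemptsUntilSuccess(call, 20);
        }
        System.out.printf("average attempts per request: %.2f%n", (double) totalAttempts / requests);
    }
}
```

With an independent failure probability p per attempt, the expected number of attempts per request is 1 / (1 - p), which is about 1.33 for p = 0.25. Most requests succeed on the first try and pay no backoff sleep at all, while the retry path still gets exercised on roughly a quarter of them.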

@andrross andrross merged commit bde7db5 into opensearch-project:main May 14, 2025
43 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Storage Project Board May 14, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 14, 2025
…StoreRepositoryTests, AzureBlobStoreRepositoryTests (#18290)

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>
(cherry picked from commit bde7db5)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross pushed a commit that referenced this pull request May 14, 2025
…StoreRepositoryTests, AzureBlobStoreRepositoryTests (#18290) (#18298)

(cherry picked from commit bde7db5)

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@kkewwei kkewwei deleted the fix_14299 branch May 14, 2025 23:07
tanik98 pushed a commit to tanik98/OpenSearch that referenced this pull request May 27, 2025
…StoreRepositoryTests, AzureBlobStoreRepositoryTests (opensearch-project#18290)

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Jun 1, 2025
…StoreRepositoryTests, AzureBlobStoreRepositoryTests (opensearch-project#18290)

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
…StoreRepositoryTests, AzureBlobStoreRepositoryTests (opensearch-project#18290)

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>
Signed-off-by: TJ Neuenfeldt <[email protected]>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
…StoreRepositoryTests, AzureBlobStoreRepositoryTests (opensearch-project#18290)

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>

Labels

autocut backport 2.19 flaky-test Random test failure that succeeds on second run skip-changelog Storage:Snapshots >test-failure Test failure from CI, local build, etc.

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for AzureBlobStoreRepositoryTests

2 participants