-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Fix flaky :plugins:repository-azure:internalClusterTest by waiting for Azure fixture readiness and cleaning stale state #20171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…r Azure fixture readiness and cleaning stale state
WalkthroughAdds test lifecycle helpers to Azure storage cleanup tests: a new Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
|
❌ Gradle check result for 49ca72f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (2)
72-86: Fixture readiness check is solid; consider aligning withcredentials()or documenting behaviorThe
@Beforehook nicely encapsulates the flakiness fix by:
- Ensuring the node is green before tests run.
- Using
assertBusywith a bounded timeout sotest.azure.container/test.azure.basemust become non‑blank within 60s or fail fast.Two optional refinements to consider:
- To keep the readiness condition exactly in sync with what the test later requires, you could either:
- Also include
test.azure.accountand the key/SAS token checks here, or- Call
credentials()inside theassertBusyblock (if it has no side effects beyond validations), so the same invariants gate fixture readiness and repository creation.- A brief comment explaining that these system properties can transition from blank to non‑blank after JVM startup (due to the Gradle Azure fixture) would make this less “mysterious” to future readers.
These are nice‑to‑have; the current implementation already addresses the reported flakiness.
90-98: Double‑check interaction with superclasstearDownand robustness of wildcard deletesThe additional cleanup is directionally right (preventing state leakage across runs), but there are a couple of subtle points worth verifying:
Deleting all repositories before
super.tearDown()
AbstractThirdPartyRepositoryTestCasedefines its owntearDown()that historically:
- Cleans up the blob store for the default
"test-repo".- Deletes
"test-repo"via the cluster admin client. (jar-download.com)By calling
client().admin().cluster().prepareDeleteRepository("_all").get();first, you may be removing"test-repo"beforesuper.tearDown()runs, depending on the current framework implementation. If the superclass still assumes that repository exists (e.g., viagetRepository()),super.tearDown()could start failing withRepositoryMissingException.To avoid that risk, consider one of:
- Limiting deletion here to indices only and letting
AbstractThirdPartyRepositoryTestCase.tearDown()remain the sole authority for repo cleanup.- Or, if you really need to clear all repositories, ensuring that the superclass no longer depends on
"test-repo"being present (and documenting that assumption in a comment).Wildcard index delete edge cases
prepareDelete("*")can behave differently depending on index options and whether only system/hidden indices are present. In some configurations this may throw if there are no matching indices or if deletes of system indices are forbidden, which would turn into teardown failures rather than just leftover state.If you want teardown to be “best‑effort” rather than fail the test suite when nothing is left to delete, you could catch and ignore the specific exceptions that represent “nothing to delete / forbidden system index” while still surfacing unexpected errors.
Given this is test‑only, these adjustments are about making the cleanup more future‑proof and less coupled to internal assumptions of the base class.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (1)
34-36: Imports correctly reflect new setup/teardown behaviorThe added imports for
@Before,assertBusy, andTimeUnitmatch the new logic below and keep dependencies localized to test-only utilities. No issues here.
|
❌ Gradle check result for 72ed735: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for da47b31: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
The failure appears unrelated to this change. The failing task is All Azure tests passed, spotless and compilation passed. |
|
Thanks @Gautam-aman for fixing this test. Can you ensure your commits are signed off correctly. This should give you context on how to: https://github.com/opensearch-project/OpenSearch/pull/20171/checks?check_run_id=57257172591 |
Description
Stabilizes
:plugins:repository-azure:internalClusterTest, which occasionally failedin CI before executing any JUnit tests due to the Azure docker fixture not being
fully ready when the test suite began.
Fix
assertBusy()+ensureGreen()to wait for Azure fixture readinessTesting
for i in {1..20}; do ./gradlew :plugins:repository-azure:internalClusterTest; doneIssue
Fixes #20124
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.