Conversation

@Gautam-aman Gautam-aman commented Dec 5, 2025

Description

Stabilizes :plugins:repository-azure:internalClusterTest, which occasionally failed
in CI before executing any JUnit tests due to the Azure docker fixture not being
fully ready when the test suite began.

Fix

  • Added assertBusy() + ensureGreen() to wait for Azure fixture readiness
  • Added test teardown to remove all indices and repositories and prevent stale CI state
  • No production behavior has been modified — test-only reliability fix

Testing

  • 20 consecutive successful local runs:
    for i in {1..20}; do ./gradlew :plugins:repository-azure:internalClusterTest; done

Issue

Fixes #20124

Summary by CodeRabbit

  • Tests
    • Improved Azure integration test reliability by adding a bounded readiness check before tests, a stronger teardown that fully cleans indices and repositories after each run, and a reusable retry helper to wait for transient conditions.

…r Azure fixture readiness and cleaning stale state
@Gautam-aman Gautam-aman requested a review from a team as a code owner December 5, 2025 13:34
@github-actions github-actions bot added the >test-failure (Test failure from CI, local build, etc.), autocut, and flaky-test (Random test failure that succeeds on second run) labels Dec 5, 2025
coderabbitai bot commented Dec 5, 2025

Walkthrough

Adds test lifecycle helpers to Azure storage cleanup tests: a new @Before method waits for the Azure fixture and required system properties, an overridden tearDown deletes all indices and repositories before delegating to super, and a private eventually(Supplier) helper using assertBusy was introduced.

Changes

Cohort / File(s) | Summary
Azure Test Lifecycle & Helpers: plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java | Added public void waitForAzureFixtureReady() annotated with @Before to wait for green cluster health and assert test.azure.container and test.azure.base (60s). Added @Override public void tearDown() throws Exception to delete all indices and all repositories before calling super.tearDown(). Added private <T> T eventually(Supplier<T> supplier) throws Exception helper that uses assertBusy (30s). Added imports for @Before, TimeUnit, assertBusy, and a Supplier usage.
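
The eventually(Supplier) helper summarized above can be pictured roughly as follows. This is only a sketch, not the code from this PR: the PR's helper is described as delegating to assertBusy with a 30s bound, whereas this version substitutes a plain polling loop so that it runs without the OpenSearch test framework. The class name Eventually, the explicit timeout parameters, and the backoff values are all assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative stand-in for a bounded retry helper; not the code from this PR. */
public final class Eventually {
    private Eventually() {}

    /**
     * Calls the supplier repeatedly until it returns without throwing an
     * AssertionError, or until the timeout elapses. On timeout, the most
     * recent AssertionError is rethrown so the test fails with a real cause.
     */
    public static <T> T eventually(Supplier<T> supplier, long timeout, TimeUnit unit) {
        final long deadline = System.nanoTime() + unit.toNanos(timeout);
        long sleepMillis = 1;
        while (true) {
            try {
                return supplier.get();
            } catch (AssertionError e) {
                // Give up once the deadline has passed; otherwise retry.
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
            // Back off exponentially, capped at 100 ms between attempts (arbitrary cap).
            sleepMillis = Math.min(sleepMillis * 2, 100);
        }
    }
}
```

A caller would wrap the flaky lookup in the supplier and throw an AssertionError while the condition is not yet met, for example Eventually.eventually(() -> readContainerName(), 30, TimeUnit.SECONDS), where readContainerName is a hypothetical method.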

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Single test file modified with small, focused additions.
  • Review focus:
    • Correctness and timeout values in waitForAzureFixtureReady() and eventually().
    • Safety and completeness of cleanup in tearDown() (deleting indices and repositories).
    • Proper use of assertBusy and ensuring super.tearDown() is invoked.

Poem

🐇 I waited by the pond until the fixtures shone,
Counted containers and bases, hopped until green was shown.
Then I swept the burrow, cleared each tiny thread—
Tucked the tests to bed, with carrots in my head. 🌙✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description check | ✅ Passed | The PR description is comprehensive and covers the main sections: description of the issue, the fix implemented, testing performed, and the linked issue.
Linked Issues check | ✅ Passed | The changes directly address the flaky test failures reported in #20124 by adding fixture readiness checks and test cleanup logic to stabilize the Azure internal cluster tests.
Out of Scope Changes check | ✅ Passed | All changes are scoped to test reliability improvements; only the Azure storage cleanup test class was modified with fixture setup, teardown, and utility helpers—no production code changes.
Title check | ✅ Passed | The title directly and accurately summarizes the main changes: fixing flaky tests by waiting for Azure fixture readiness and cleaning stale state, which aligns with all modifications in the changeset.

Contributor

github-actions bot commented Dec 5, 2025

❌ Gradle check result for 49ca72f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (2)

72-86: Fixture readiness check is solid; consider aligning with credentials() or documenting behavior

The @Before hook nicely encapsulates the flakiness fix by:

  • Ensuring the cluster is green before tests run.
  • Using assertBusy with a bounded timeout so test.azure.container / test.azure.base must become non‑blank within 60s or fail fast.

Two optional refinements to consider:

  • To keep the readiness condition exactly in sync with what the test later requires, you could either:
    • Also include test.azure.account and the key/SAS token checks here, or
    • Call credentials() inside the assertBusy block (if it has no side effects beyond validations), so the same invariants gate fixture readiness and repository creation.
  • A brief comment explaining that these system properties can transition from blank to non‑blank after JVM startup (due to the Gradle Azure fixture) would make this less “mysterious” to future readers.

These are nice‑to‑have; the current implementation already addresses the reported flakiness.
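
One way to picture the first refinement is a single list of required properties that both the readiness check and the later credential lookup consult. The sketch below is a standalone illustration under stated assumptions: the class FixtureProperties and its methods are hypothetical, test.azure.account is taken from the suggestion above, and the key/SAS token properties are left out because their names do not appear in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: one source of truth for the properties the fixture must provide. */
public final class FixtureProperties {
    private FixtureProperties() {}

    // test.azure.container and test.azure.base are the properties checked in this PR;
    // test.azure.account is the extra one proposed in the review comment.
    public static final List<String> REQUIRED =
        List.of("test.azure.account", "test.azure.container", "test.azure.base");

    /** Returns the required property names that are absent or blank, in declaration order. */
    public static List<String> missing(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = props.getProperty(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Throws AssertionError while any required property is still blank, so a retry loop can poll it. */
    public static void assertReady(Properties props) {
        List<String> missing = missing(props);
        if (!missing.isEmpty()) {
            throw new AssertionError("Azure fixture not ready, blank properties: " + missing);
        }
    }
}
```

Inside the @Before hook, a call such as FixtureProperties.assertReady(System.getProperties()) could then sit inside the existing assertBusy block, so the failure message names exactly which property never became available.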


90-98: Double‑check interaction with superclass tearDown and robustness of wildcard deletes

The additional cleanup is directionally right (preventing state leakage across runs), but there are a couple of subtle points worth verifying:

  1. Deleting all repositories before super.tearDown()
    AbstractThirdPartyRepositoryTestCase defines its own tearDown() that historically:

    • Cleans up the blob store for the default "test-repo".
    • Deletes "test-repo" via the cluster admin client. (jar-download.com)

    By calling client().admin().cluster().prepareDeleteRepository("_all").get(); first, you may be removing "test-repo" before super.tearDown() runs, depending on the current framework implementation. If the superclass still assumes that repository exists (e.g., via getRepository()), super.tearDown() could start failing with RepositoryMissingException.

    To avoid that risk, consider one of:

    • Limiting deletion here to indices only and letting AbstractThirdPartyRepositoryTestCase.tearDown() remain the sole authority for repo cleanup.
    • Or, if you really need to clear all repositories, ensuring that the superclass no longer depends on "test-repo" being present (and documenting that assumption in a comment).
  2. Wildcard index delete edge cases
    prepareDelete("*") can behave differently depending on index options and whether only system/hidden indices are present. In some configurations this may throw if there are no matching indices or if deletes of system indices are forbidden, which would turn into teardown failures rather than just leftover state.

    If you want teardown to be “best‑effort” rather than fail the test suite when nothing is left to delete, you could catch and ignore the specific exceptions that represent “nothing to delete / forbidden system index” while still surfacing unexpected errors.

Given this is test‑only, these adjustments are about making the cleanup more future‑proof and less coupled to internal assumptions of the base class.
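
The "best-effort" teardown idea can be sketched independently of the OpenSearch client. In the sketch below, BestEffortCleanup and Step are hypothetical names, and the predicate that decides which failures are ignorable is supplied by the caller; which concrete exception types mean "nothing to delete" in this test is an assumption that would need checking against the framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical teardown runner: every step runs, expected failures are ignored, the rest surface. */
public final class BestEffortCleanup {

    /** A cleanup action that may throw, e.g. a wrapper around an index delete or super.tearDown(). */
    @FunctionalInterface
    public interface Step {
        void run() throws Exception;
    }

    private final List<Step> steps = new ArrayList<>();
    private final Predicate<Exception> ignorable;

    public BestEffortCleanup(Predicate<Exception> ignorable) {
        this.ignorable = ignorable;
    }

    /** Registers a step; steps run in the order they were added. */
    public BestEffortCleanup add(Step step) {
        steps.add(step);
        return this;
    }

    /**
     * Runs all steps even if earlier ones fail. Failures matching the predicate
     * are dropped; the first unexpected failure is rethrown at the end, with any
     * later unexpected failures attached as suppressed exceptions.
     */
    public void runAll() throws Exception {
        Exception first = null;
        for (Step step : steps) {
            try {
                step.run();
            } catch (Exception e) {
                if (ignorable.test(e)) {
                    continue;
                }
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Because steps run in registration order, adding the index delete first and the superclass teardown last would keep "test-repo" in place until the base class has finished with it, which is the ordering concern raised in point 1.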

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0449ce8 and 49ca72f.

📒 Files selected for processing (1)
  • plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: gradle-check
🔇 Additional comments (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (1)

34-36: Imports correctly reflect new setup/teardown behavior

The added imports for @Before, assertBusy, and TimeUnit match the new logic below and keep dependencies localized to test-only utilities. No issues here.

Contributor

github-actions bot commented Dec 5, 2025

❌ Gradle check result for 72ed735: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Dec 5, 2025

❌ Gradle check result for da47b31: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@Gautam-aman
Author

The failure appears unrelated to this change. The failing task is
:distribution:bwc:maintenance:buildBwcLinuxTar, not related to repository-azure.

All Azure tests passed, as did spotless and compilation.
Requesting a CI re-run when convenient. 👍

@sandeshkr419 sandeshkr419 changed the title Stabilize :plugins:repository-azure:internalClusterTest by waiting fo… Fix flay :plugins:repository-azure:internalClusterTest by waiting for Azure fixture readiness and cleaning stale state Dec 10, 2025
@sandeshkr419 sandeshkr419 changed the title Fix flay :plugins:repository-azure:internalClusterTest by waiting for Azure fixture readiness and cleaning stale state Fix flaky :plugins:repository-azure:internalClusterTest by waiting for Azure fixture readiness and cleaning stale state Dec 10, 2025
@sandeshkr419
Member

Thanks @Gautam-aman for fixing this test. Can you ensure your commits are signed off correctly? This check should give you context on how to do that: https://github.com/opensearch-project/OpenSearch/pull/20171/checks?check_run_id=57257172591

Labels

autocut; flaky-test (Random test failure that succeeds on second run); skip-changelog; >test-failure (Test failure from CI, local build, etc.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for Gradle Test Run :plugins:repository-azure:internalClusterTest

2 participants