fix bug of warm index: FullFileCachedIndexInput was repeatedly closed #20055

Merged: gbbafna merged 3 commits into opensearch-project:main from liuyonghengheng:main_warm_index_dev on Dec 12, 2025
Conversation

@liuyonghengheng liuyonghengheng commented Nov 19, 2025

Description

Check the closed flag in FullFileCachedIndexInput.IndexInputHolder.run() so that a FileCachedIndexInput is not closed repeatedly and the cache entry's reference count is not decremented twice.

Related Issues

Resolves #20054

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Bug Fixes

    • Improved warm index caching lifecycle so cached index inputs and their clones are cleaned up reliably and closing is idempotent, preventing premature or repeated closure.
  • Tests

    • Added end-to-end lifecycle and GC-related tests to validate clone/slice closing and cleanup behavior.
  • Documentation

    • Added a changelog entry describing the warm index caching fix.

@github-actions

❌ Gradle check result for dacd3a5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

❕ Gradle check result for c3fc0f2: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Nov 20, 2025

Codecov Report

❌ Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.26%. Comparing base (1aed472) to head (033d8f2).
⚠️ Report is 6 commits behind head on main.

Files with missing lines | Patch % | Lines
...x/store/remote/filecache/FileCachedIndexInput.java | 66.66% | 0 Missing and 1 partial ⚠️
...ore/remote/filecache/FullFileCachedIndexInput.java | 90.90% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20055      +/-   ##
============================================
- Coverage     73.29%   73.26%   -0.03%     
+ Complexity    71780    71743      -37     
============================================
  Files          5795     5795              
  Lines        328297   328302       +5     
  Branches      47282    47283       +1     
============================================
- Hits         240612   240523      -89     
- Misses        68368    68431      +63     
- Partials      19317    19348      +31     

coderabbitai bot commented Nov 27, 2025

Walkthrough

Converted the closed flag from a volatile boolean to an AtomicBoolean across FileCachedIndexInput and FullFileCachedIndexInput, updated close/cleaner logic and IndexInputHolder to use the AtomicBoolean, added indexInputHolderRun() for explicit cleaner invocation, and added a test exercising clone/close/cleaner interactions.

Changes

Cohort / File(s) | Summary
Changelog (CHANGELOG.md) | Added a "Fixed" entry: "Fix bug of warm index: FullFileCachedIndexInput was closed error".
Closed flag atomicity (server/src/main/java/org/opensearch/index/store/remote/filecache/FileCachedIndexInput.java) | Replaced protected volatile boolean closed with protected final AtomicBoolean closed = new AtomicBoolean(false) and updated reads/writes to atomic operations; added import for AtomicBoolean.
Lifecycle & cleaner coordination (server/src/main/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInput.java) | Propagated AtomicBoolean into IndexInputHolder constructor, switched close() and cleaner Runnable to use closed.get()/closed.set(...), added guards to avoid duplicate closures, and added public void indexInputHolderRun() to trigger the cleaner-held runnable.
Tests (server/src/test/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInputTests.java) | Added public void testClose() throws IOException to validate clone lifecycle, reference counting progression, explicit close behavior, and GC-like cleaner invocations via indexInputHolderRun().

Sequence Diagram

sequenceDiagram
    participant Test
    participant FullInput as FullFileCachedIndexInput
    participant Holder as IndexInputHolder
    participant Cleaner as CleanerRunnable
    participant Atomic as closed (AtomicBoolean)

    Note over Test,FullInput: Clone/close lifecycle with cleaner coordination
    Test->>FullInput: clone() x3
    FullInput->>Atomic: closed.get() -> false
    FullInput->>FullInput: increment refCount for clones

    Test->>FullInput: clone1.close()
    FullInput->>Atomic: closed.get() -> false
    FullInput->>Atomic: closed.set(true)
    FullInput->>Holder: schedule/hold cleanup
    Holder->>Cleaner: run()
    Cleaner->>Atomic: closed.get()
    alt not already cleaned
        Cleaner->>FullInput: deRef() (decrement refCount)
        Cleaner->>Atomic: closed.set(true)
    end

    Test->>FullInput: indexInputHolderRun() on clone1
    FullInput->>Holder: run() (explicit)
    Holder->>Atomic: closed.get() -> true
    Holder->>Cleaner: no-op if already cleaned

    Test->>FullInput: close remaining clones -> refCount -> 0
    Cleaner->>FullInput: final cleanup when refs reach 0

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify AtomicBoolean is used consistently for visibility and idempotence across all close paths.
  • Inspect Cleaner Runnable guard logic and interactions with indexInputHolderRun().
  • Check reference counting correctness when clones are closed vs cleaned by the cleaner.
  • Confirm the new indexInputHolderRun() method is intended and safe to expose (test usage).

Poem

🐇 A flag once fickle, now atomic and true,
Clones hop in, cleaners tidy the queue.
No double-close tumble, no racing fright,
Threads settle down in the soft moonlight,
Rabbit hums — everything’s snug tonight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately summarizes the main fix: preventing repeated closure of FullFileCachedIndexInput by using AtomicBoolean for thread-safe state management.
Description check | ✅ Passed | The description covers the core fix and references the related issue (#20054), but lacks detail about implementation approach, test coverage, and reasoning.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c0f955 and 033d8f2.

📒 Files selected for processing (1)
  • CHANGELOG.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: Analyze (java)
  • GitHub Check: Mend Security Check

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInput.java (1)

101-106: Consider adding @VisibleForTesting annotation or reducing visibility.

This method is intended for test use only, but it's public. Consider adding an annotation to make this intent explicit:

+    import org.opensearch.common.annotation.VisibleForTesting;
...
     /**
-     * Run resource cleaning,To be used only in test
+     * Run resource cleaning. To be used only in tests.
      */
+    @VisibleForTesting
     public void indexInputHolderRun() {
         indexInputHolder.run();
     }

Alternatively, since the test class is in the same package, this could be made package-private.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 0d17990 and 7014a56.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/index/store/remote/filecache/FileCachedIndexInput.java (4 hunks)
  • server/src/main/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInput.java (5 hunks)
  • server/src/test/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInputTests.java (1 hunks)
🔇 Additional comments (4)
CHANGELOG.md (1)

93-93: LGTM!

The changelog entry is correctly placed under the Fixed section with proper formatting and issue reference.

server/src/main/java/org/opensearch/index/store/remote/filecache/FileCachedIndexInput.java (1)

45-45: LGTM!

The change from volatile boolean to AtomicBoolean enables sharing the closed state with IndexInputHolder for coordinated closure, which is the core fix for preventing duplicate decRef calls. The final modifier appropriately ensures the reference cannot be reassigned.

Also applies to: 141-154

server/src/test/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInputTests.java (1)

81-113: Well-structured test covering the key bug scenarios.

This test effectively validates the fix by verifying:

  1. close() followed by indexInputHolderRun() doesn't double-decrement (lines 96-100)
  2. indexInputHolderRun() followed by close() doesn't double-decrement (lines 103-107)
  3. Normal close behavior decrements correctly (lines 109-112)

This directly validates the prevention of duplicate reference count decrements.

server/src/main/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInput.java (1)

41-42: Core fix looks correct.

The shared AtomicBoolean between FullFileCachedIndexInput and IndexInputHolder ensures coordinated closure - whichever executes first (close() or the cleaner's run()) will set closed=true, preventing the other from executing duplicate cleanup and decRef calls. This correctly addresses the repeated close bug.

Also applies to: 115-116, 126-130
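
The coordination described above can be illustrated with a minimal sketch. It is not the PR's code: HolderSketch and InputSketch are hypothetical stand-ins for IndexInputHolder and FullFileCachedIndexInput, and a plain AtomicInteger stands in for the cache entry's reference count. It only assumes what the review states, namely that both paths check the shared flag with get() and mark it with set(true).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the cleaner-held runnable (IndexInputHolder in the PR). */
final class HolderSketch implements Runnable {
    private final AtomicBoolean closed;   // same instance as the owning input's flag
    private final AtomicInteger refCount; // stand-in for the cache entry's reference count

    HolderSketch(AtomicBoolean closed, AtomicInteger refCount) {
        this.closed = closed;
        this.refCount = refCount;
    }

    @Override
    public void run() {
        if (!closed.get()) {              // skip if close() already released the reference
            refCount.decrementAndGet();
            closed.set(true);
        }
    }
}

/** Hypothetical stand-in for an input that owns one reference on a cache entry. */
final class InputSketch implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final AtomicInteger refCount;
    final HolderSketch holder;

    InputSketch(AtomicInteger refCount) {
        this.refCount = refCount;
        refCount.incrementAndGet();       // each open input holds one reference
        this.holder = new HolderSketch(closed, refCount);
    }

    @Override
    public void close() {
        if (!closed.get()) {              // skip if the holder's run() already ran
            refCount.decrementAndGet();
            closed.set(true);
        }
    }

    /** Reference count left after close() followed by the holder's run(). */
    static int closeThenHolder() {
        AtomicInteger refs = new AtomicInteger(0);
        InputSketch in = new InputSketch(refs);
        in.close();
        in.holder.run();
        return refs.get();
    }

    /** Reference count left after the holder's run() followed by close(). */
    static int holderThenClose() {
        AtomicInteger refs = new AtomicInteger(0);
        InputSketch in = new InputSketch(refs);
        in.holder.run();
        in.close();
        return refs.get();
    }
}
```

In either order the count returns to 0. If the two classes kept separate flags, the second call would decrement again and leave the count at -1, which is the double decRef this PR describes.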

@github-actions

✅ Gradle check result for 7014a56: SUCCESS

rayshrey commented Dec 1, 2025

@gbbafna, can you please review and merge this if it looks good?

liuyonghengheng and others added 2 commits December 8, 2025 11:17
Signed-off-by: Yongheng Liu <liuyonghengheng@gmail.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross andrross force-pushed the main_warm_index_dev branch from 9b97870 to 0c0f955 Compare December 8, 2025 19:18

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7014a56 and 0c0f955.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/index/store/remote/filecache/FileCachedIndexInput.java (4 hunks)
  • server/src/main/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInput.java (5 hunks)
  • server/src/test/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInputTests.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/src/test/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInputTests.java
  • CHANGELOG.md
🔇 Additional comments (6)
server/src/main/java/org/opensearch/index/store/remote/filecache/FileCachedIndexInput.java (2)

16-16: LGTM: AtomicBoolean field declaration.

The conversion from volatile boolean to AtomicBoolean provides the necessary coordination between explicit close calls and the cleaner-invoked cleanup path, preventing duplicate closure and double reference decrements.

Also applies to: 45-45


143-143: LGTM: Atomic closure guard.

The close method correctly uses closed.get() to check state and closed.set(true) to mark completion, ensuring proper visibility across threads.

Also applies to: 152-152
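
One caveat worth sketching: a get() check followed by a separate set(true) gives visibility, but the two calls are not a single atomic step, so two threads could both pass the check before either sets the flag. The following is a variation, not what the PR does, showing how compareAndSet collapses check and update into one operation. OnceGuard and its methods are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard, not part of OpenSearch: runs a cleanup action at most once. */
final class OnceGuard {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Runnable cleanup;

    OnceGuard(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    /** Returns true only for the single caller that wins the false -> true transition. */
    boolean closeOnce() {
        if (closed.compareAndSet(false, true)) {
            cleanup.run();
            return true;
        }
        return false;
    }

    /** Races the given number of threads on closeOnce() and returns how often cleanup ran. */
    static int raceAndCountCleanups(int threads) {
        AtomicInteger cleanups = new AtomicInteger();
        OnceGuard guard = new OnceGuard(cleanups::incrementAndGet);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();        // line all threads up before racing
                    guard.closeOnce();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();                // release all threads at once
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cleanups.get();
    }
}
```

Whether this stricter form is needed depends on whether close() and the cleaner can actually run concurrently for the same instance, which this thread does not establish.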

server/src/main/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInput.java (4)

21-21: LGTM: Shared closure state.

Passing the closed AtomicBoolean to IndexInputHolder enables proper coordination between explicit close and cleaner-triggered cleanup, preventing the double-close bug.

Also applies to: 41-41


87-87: LGTM: Coordinated closure with cleaner.

The close method now uses the shared closed AtomicBoolean, ensuring that either this explicit close or the cleaner's cleanup executes the decRef logic, but not both.

Also applies to: 97-97


109-109: LGTM: IndexInputHolder state coordination.

Storing the shared closed AtomicBoolean enables the cleaner runnable to coordinate with explicit close calls, correctly implementing the fix for duplicate closure.

Also applies to: 115-121


126-130: LGTM: Cleaner cleanup guard.

The guard correctly prevents duplicate cleanup when the object becomes phantom reachable, ensuring indexInput.close() and cache.decRef() execute at most once across both the cleaner and explicit close paths.
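
For readers unfamiliar with the cleaner path, here is a minimal sketch assuming the JDK's java.lang.ref.Cleaner is the mechanism involved (the mention of phantom reachability suggests so, but the thread does not show the registration code). CleanedResourceSketch and its nested State class are hypothetical, and the cleaning action is invoked directly through Cleanable.clean() to stand in for garbage collection.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical resource, not the OpenSearch class: explicit close() plus a Cleaner fallback. */
final class CleanedResourceSketch implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup state; it must not reference the resource, or the resource could never become unreachable. */
    private static final class State implements Runnable {
        private final AtomicBoolean closed;
        private final AtomicInteger releases;

        State(AtomicBoolean closed, AtomicInteger releases) {
            this.closed = closed;
            this.releases = releases;
        }

        @Override
        public void run() {
            if (!closed.get()) {          // no-op when close() already released
                releases.incrementAndGet();
                closed.set(true);
            }
        }
    }

    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final AtomicInteger releases;
    private final Cleaner.Cleanable cleanable;

    CleanedResourceSketch(AtomicInteger releases) {
        this.releases = releases;
        // The Cleaner runs State.run() once this object becomes phantom reachable.
        this.cleanable = CLEANER.register(this, new State(closed, releases));
    }

    @Override
    public void close() {
        if (!closed.get()) {              // no-op when the cleaning action already ran
            releases.incrementAndGet();
            closed.set(true);
        }
    }

    /** Explicit close, then the cleaning action; returns how many releases happened. */
    static int closeThenClean() {
        AtomicInteger releases = new AtomicInteger();
        CleanedResourceSketch resource = new CleanedResourceSketch(releases);
        resource.close();
        resource.cleanable.clean();       // stands in for the GC-triggered cleaner run
        return releases.get();
    }
}
```

Cleanable.clean() already runs its action at most once, but that alone does not help here because close() releases the resource without going through the Cleanable. The shared flag is what links the two paths.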

github-actions bot commented Dec 8, 2025

❌ Gradle check result for 0c0f955: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-project-automation github-project-automation bot moved this to 👀 In review in Storage Project Board Dec 9, 2025
gbbafna commented Dec 9, 2025

Changes LGTM. Thanks @liuyonghengheng for this and @rayshrey for the review. Will merge once the build passes.

github-actions bot commented Dec 9, 2025

✅ Gradle check result for 0c0f955: SUCCESS

Signed-off-by: Gaurav Bafna <85113518+gbbafna@users.noreply.github.com>
@github-actions

❌ Gradle check result for 033d8f2: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

❌ Gradle check result for 033d8f2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

✅ Gradle check result for 033d8f2: SUCCESS

@gbbafna gbbafna merged commit b09dcc9 into opensearch-project:main Dec 12, 2025
37 of 41 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Storage Project Board Dec 12, 2025
fdesu pushed a commit to fdesu/OpenSearch that referenced this pull request Dec 13, 2025
liuguoqingfz pushed a commit to liuguoqingfz/OpenSearch that referenced this pull request Dec 15, 2025

Labels

  • bug: Something isn't working
  • Storage: Issues and PRs relating to data and metadata storage

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

[BUG] write warm index error because execption of index file Already closed

5 participants