Skip to content

Conversation

@sgup432
Copy link
Contributor

@sgup432 sgup432 commented Dec 1, 2025

Description

We encountered an issue where fieldDataCache dynamic setting was getting registered multiple times (link) causing settingUpdaters list to bloat up (link) and eventually causing memory issues on active cluster manager node.

This is a sample screenshot which shows multiple such settingUpdaters, all belonging to FieldDataCache and specifically indices.fielddata.cache.size setting.

Screenshot 2025-11-27 at 2 09 42 PM

Why this happens?

This is happening as during every index creation/updates, we verify whether metadata is sane or not by creating temporary FieldDataCache objects, and later closing them on. By doing this, the setting getting registered multiple times and causing this issue. See logic here

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Bug Fixes

    • Prevented duplicate registration of fielddata and query cache settings so each is registered only once.
  • New Features

    • Exposed a read-only view for inspecting registered setting updaters.
  • Tests

    • Added an integration test verifying settings are registered exactly once across multiple index creations.
  • Changelog

    • Updated release notes to include the duplicate-registration fix.

✏️ Tip: You can customize this high-level summary in your review settings.

…egistered multiple times

Signed-off-by: Sagar Upadhyaya <[email protected]>
@sgup432 sgup432 changed the title Fix issue where FieldDataCache dynamic setting consumer was getting r… Fix issue where FieldDataCache dynamic setting consumer was getting registered multiple times Dec 1, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 1, 2025

Walkthrough

Adds a public getter exposing setting updaters on AbstractScopedSettings, refactors IndicesService.verifyIndexMetadata to use instance cache fields instead of local cache instances, updates CHANGELOG with a fix entry, and adds an integration test that verifies certain settings are registered only once.

Changes

Cohort / File(s) Summary
Core setting updater exposure
server/src/main/java/org/opensearch/common/settings/AbstractScopedSettings.java
Adds public List<SettingUpdater<?>> getSettingUpdaters() returning an unmodifiable view of the internal settingUpdaters list.
Cache init refactor
server/src/main/java/org/opensearch/indices/IndicesService.java
In verifyIndexMetadata, removes local instantiation and local closure of IndicesFieldDataCache and IndicesQueryCache; the method now uses the instance fields indicesFieldDataCache and indicesQueryCache.
Integration test
server/src/internalClusterTest/java/org/opensearch/cluster/settings/ClusterSettingsIT.java
Adds a protected Settings nodeSettings(int) override to disable indices query cache and new test testWithMultipleIndexCreationAndVerifySettingRegisteredOnce() that creates multiple indices and asserts setting updaters are registered exactly once using the exposed getter.
Changelog
CHANGELOG.md
Adds an Unreleased 3.x Fixed entry addressing duplicate registration of the FieldDataCache dynamic setting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect AbstractScopedSettings.getSettingUpdaters() to confirm it returns an unmodifiable view and does not expose mutable internals.
  • Review IndicesService.verifyIndexMetadata for resource lifecycle and concurrency implications after removing local cache construction/close.
  • Validate ClusterSettingsIT test for flakiness and correct use of the exposed getter rather than internal state assumptions.
  • Verify CHANGELOG.md entry formatting and placement.

Poem

🐇 I nibbled through classes, left no burr,
Opened a tiny gate to see who’s there,
I counted updaters with a careful hop,
Ensured caches stayed steady — not a single stop,
🥕 quiet hops, brighter builds, and code to share.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix duplicate registration of FieldDataCache dynamic setting' directly and specifically describes the main change across all modified files and aligns with the PR's core objective.
Description check ✅ Passed The description comprehensively explains the issue, root cause, impact, and solution with references and screenshots. It includes a filled checklist with testing confirmed. While a specific issue number is missing from the 'Resolves' section, all key information is well-documented.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a65224 and 3d396d2.

📒 Files selected for processing (1)
  • CHANGELOG.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: Analyze (java)

Comment @coderabbitai help to get the list of available commands and usage tips.

@sgup432 sgup432 changed the title Fix issue where FieldDataCache dynamic setting consumer was getting registered multiple times Fix duplicate registration of FieldDataCache dynamic setting Dec 1, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java (1)

250-279: Add assertion to verify the setting was found.

The test should assert that the setting "indices.fielddata.cache.size" was actually found at least once, rather than only checking for duplicates. If the setting is never registered, the test would pass incorrectly.

             }
         }
+        assertTrue("Setting '" + settingToVerify + "' was not found in cluster settings", seen);
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 859f7c3 and bc81d8f.

📒 Files selected for processing (3)
  • server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java (3 hunks)
  • server/src/main/java/org/opensearch/common/settings/AbstractScopedSettings.java (1 hunks)
  • server/src/main/java/org/opensearch/indices/IndicesService.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java (1)
server/src/main/java/org/opensearch/common/settings/AbstractScopedSettings.java (1)
  • AbstractScopedSettings (66-1087)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: detect-breaking-change
🔇 Additional comments (2)
server/src/main/java/org/opensearch/indices/IndicesService.java (1)

1231-1237: LGTM! Prevents memory leak from repeated setting updater registrations.

The refactoring correctly addresses the issue where temporary cache objects created during metadata verification were registering dynamic setting updaters each time, causing memory bloat. By removing the cluster service dependencies, these transient objects no longer register updaters while still functioning correctly for validation purposes.

server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java (1)

38-40: LGTM! Imports support the new test.

The new imports are appropriately used in the test method to inspect cluster settings and verify setting registration.

Also applies to: 49-49, 53-54, 59-59

Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Dec 1, 2025

❌ Gradle check result for e3e342e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sagar Upadhyaya <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Dec 1, 2025

❌ Gradle check result for 932aa69: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Dec 1, 2025

❌ Gradle check result for dfc8d0d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
server/src/internalClusterTest/java/org/opensearch/cluster/settings/ClusterSettingsIT.java (3)

84-90: Consider adding a comment explaining why this setting override is needed.

The purpose of disabling INDICES_QUERIES_CACHE_ALL_SEGMENTS_SETTING for this test class isn't immediately clear. If this is required to reproduce the duplicate registration scenario or to ensure consistent test behavior, a brief comment would help future maintainers understand the intent.


584-593: Fragile approach: parsing toString() output is not a stable API.

The test relies on regex-parsing the toString() representation of SettingUpdater to extract the key. This is brittle—if the toString() format changes, the test will silently fail to find any keys and the final assertion will catch it with a misleading message.

Additionally, the variable settingUpgraders is misleading since it holds SettingUpdater<?> objects, not SettingUpgrader<?>.

Consider either:

  1. Exposing a proper API on SettingUpdater to retrieve the associated setting key, or
  2. Adding a comment acknowledging this fragility and the fallback assertion at the end.
-        List<AbstractScopedSettings.SettingUpdater<?>> settingUpgraders = clusterSettings.getSettingUpdaters();
-        for (AbstractScopedSettings.SettingUpdater<?> settingUpgrader : settingUpgraders) {
-            String input = settingUpgrader.toString();
+        List<AbstractScopedSettings.SettingUpdater<?>> settingUpdaters = clusterSettings.getSettingUpdaters();
+        for (AbstractScopedSettings.SettingUpdater<?> settingUpdater : settingUpdaters) {
+            String input = settingUpdater.toString();
             // Trying to fetch key value as there is no other way
+            // Note: This relies on toString() format which may change; the final assertion guards against silent failures
             Pattern p = Pattern.compile("\"key\"\\s*:\\s*\"([^\"]+)\"");
             Matcher m = p.matcher(input);
             String key = "";

580-582: Prefer using Setting constants over hardcoded strings.

Using the actual Setting constants improves maintainability and enables compile-time verification if the setting keys change.

+        import org.opensearch.indices.fielddata.cache.IndicesFieldDataCache;
+        import org.opensearch.indices.IndicesQueryCache;
         ...
         Map<String, Boolean> settingToVerify = new HashMap<>();
-        settingToVerify.put("indices.fielddata.cache.size", false);
-        settingToVerify.put("indices.queries.cache.skip_cache_factor", false);
+        settingToVerify.put(IndicesFieldDataCache.INDICES_FIELDDATA_CACHE_SIZE_SETTING.getKey(), false);
+        settingToVerify.put(IndicesQueryCache.INDICES_QUERIES_CACHE_SKIP_CACHE_FACTOR_SETTING.getKey(), false);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfc8d0d and 1311d87.

📒 Files selected for processing (1)
  • server/src/internalClusterTest/java/org/opensearch/cluster/settings/ClusterSettingsIT.java (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/internalClusterTest/java/org/opensearch/cluster/settings/ClusterSettingsIT.java (1)
server/src/main/java/org/opensearch/common/settings/AbstractScopedSettings.java (1)
  • AbstractScopedSettings (66-1091)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: gradle-check
🔇 Additional comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/settings/ClusterSettingsIT.java (1)

572-605: Test logic is sound for verifying the fix.

The approach of creating multiple indices, then verifying that specific dynamic settings are registered only once in the settingUpdaters list correctly validates the fix. The final assertion at lines 600-604 is a good defensive check that ensures the expected settings were actually found.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2025

❌ Gradle check result for 089e3a9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sagar Upadhyaya <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 089e3a9 and 6a65224.

📒 Files selected for processing (1)
  • CHANGELOG.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: Analyze (java)

Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2025

❕ Gradle check result for 3d396d2: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link

codecov bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.30%. Comparing base (da18cc6) to head (3d396d2).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...search/common/settings/AbstractScopedSettings.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20140      +/-   ##
============================================
+ Coverage     73.25%   73.30%   +0.04%     
- Complexity    71684    71712      +28     
============================================
  Files          5788     5788              
  Lines        327866   327863       -3     
  Branches      47218    47218              
============================================
+ Hits         240194   240330     +136     
+ Misses        68399    68212     -187     
- Partials      19273    19321      +48     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jainankitk jainankitk merged commit 2b6d266 into opensearch-project:main Dec 2, 2025
34 of 35 checks passed
rgsriram pushed a commit to rgsriram/OpenSearch that referenced this pull request Dec 5, 2025
liuguoqingfz pushed a commit to liuguoqingfz/OpenSearch that referenced this pull request Dec 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants