-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Fix duplicate registration of FieldDataCache dynamic setting #20140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…egistered multiple times Signed-off-by: Sagar Upadhyaya <[email protected]>
WalkthroughAdds a public getter exposing setting updaters on AbstractScopedSettings, refactors IndicesService.verifyIndexMetadata to use instance cache fields instead of local cache instances, updates CHANGELOG with a fix entry, and adds an integration test that verifies certain settings are registered only once. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java (1)
250-279: Add assertion to verify the setting was found.The test should assert that the setting
"indices.fielddata.cache.size"was actually found at least once, rather than only checking for duplicates. If the setting is never registered, the test would pass incorrectly.} } + assertTrue("Setting '" + settingToVerify + "' was not found in cluster settings", seen); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java(3 hunks)server/src/main/java/org/opensearch/common/settings/AbstractScopedSettings.java(1 hunks)server/src/main/java/org/opensearch/indices/IndicesService.java(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java (1)
server/src/main/java/org/opensearch/common/settings/AbstractScopedSettings.java (1)
AbstractScopedSettings(66-1087)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: Analyze (java)
- GitHub Check: detect-breaking-change
🔇 Additional comments (2)
server/src/main/java/org/opensearch/indices/IndicesService.java (1)
1231-1237: LGTM! Prevents memory leak from repeated setting updater registrations.The refactoring correctly addresses the issue where temporary cache objects created during metadata verification were registering dynamic setting updaters each time, causing memory bloat. By removing the cluster service dependencies, these transient objects no longer register updaters while still functioning correctly for validation purposes.
server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java (1)
38-40: LGTM! Imports support the new test.The new imports are appropriately used in the test method to inspect cluster settings and verify setting registration.
Also applies to: 49-49, 53-54, 59-59
server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java
Outdated
Show resolved
Hide resolved
server/src/internalClusterTest/java/org/opensearch/index/fielddata/FieldDataLoadingIT.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/common/settings/AbstractScopedSettings.java
Show resolved
Hide resolved
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
server/src/main/java/org/opensearch/indices/IndicesService.java
Outdated
Show resolved
Hide resolved
|
❌ Gradle check result for e3e342e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
|
❌ Gradle check result for 932aa69: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sagar <[email protected]>
|
❌ Gradle check result for dfc8d0d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sagar Upadhyaya <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
server/src/internalClusterTest/java/org/opensearch/cluster/settings/ClusterSettingsIT.java (3)
84-90: Consider adding a comment explaining why this setting override is needed.The purpose of disabling
INDICES_QUERIES_CACHE_ALL_SEGMENTS_SETTINGfor this test class isn't immediately clear. If this is required to reproduce the duplicate registration scenario or to ensure consistent test behavior, a brief comment would help future maintainers understand the intent.
584-593: Fragile approach: parsingtoString()output is not a stable API.The test relies on regex-parsing the
toString()representation ofSettingUpdaterto extract the key. This is brittle—if thetoString()format changes, the test will silently fail to find any keys and the final assertion will catch it with a misleading message.Additionally, the variable
settingUpgradersis misleading since it holdsSettingUpdater<?>objects, notSettingUpgrader<?>.Consider either:
- Exposing a proper API on
SettingUpdaterto retrieve the associated setting key, or- Adding a comment acknowledging this fragility and the fallback assertion at the end.
- List<AbstractScopedSettings.SettingUpdater<?>> settingUpgraders = clusterSettings.getSettingUpdaters(); - for (AbstractScopedSettings.SettingUpdater<?> settingUpgrader : settingUpgraders) { - String input = settingUpgrader.toString(); + List<AbstractScopedSettings.SettingUpdater<?>> settingUpdaters = clusterSettings.getSettingUpdaters(); + for (AbstractScopedSettings.SettingUpdater<?> settingUpdater : settingUpdaters) { + String input = settingUpdater.toString(); // Trying to fetch key value as there is no other way + // Note: This relies on toString() format which may change; the final assertion guards against silent failures Pattern p = Pattern.compile("\"key\"\\s*:\\s*\"([^\"]+)\""); Matcher m = p.matcher(input); String key = "";
580-582: Prefer using Setting constants over hardcoded strings.Using the actual
Settingconstants improves maintainability and enables compile-time verification if the setting keys change.+ import org.opensearch.indices.fielddata.cache.IndicesFieldDataCache; + import org.opensearch.indices.IndicesQueryCache; ... Map<String, Boolean> settingToVerify = new HashMap<>(); - settingToVerify.put("indices.fielddata.cache.size", false); - settingToVerify.put("indices.queries.cache.skip_cache_factor", false); + settingToVerify.put(IndicesFieldDataCache.INDICES_FIELDDATA_CACHE_SIZE_SETTING.getKey(), false); + settingToVerify.put(IndicesQueryCache.INDICES_QUERIES_CACHE_SKIP_CACHE_FACTOR_SETTING.getKey(), false);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/src/internalClusterTest/java/org/opensearch/cluster/settings/ClusterSettingsIT.java(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/internalClusterTest/java/org/opensearch/cluster/settings/ClusterSettingsIT.java (1)
server/src/main/java/org/opensearch/common/settings/AbstractScopedSettings.java (1)
AbstractScopedSettings(66-1091)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/settings/ClusterSettingsIT.java (1)
572-605: Test logic is sound for verifying the fix.The approach of creating multiple indices, then verifying that specific dynamic settings are registered only once in the
settingUpdaterslist correctly validates the fix. The final assertion at lines 600-604 is a good defensive check that ensures the expected settings were actually found.
Signed-off-by: Sagar <[email protected]>
|
❌ Gradle check result for 089e3a9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sagar Upadhyaya <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CHANGELOG.md(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: Analyze (java)
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
|
❕ Gradle check result for 3d396d2: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20140 +/- ##
============================================
+ Coverage 73.25% 73.30% +0.04%
- Complexity 71684 71712 +28
============================================
Files 5788 5788
Lines 327866 327863 -3
Branches 47218 47218
============================================
+ Hits 240194 240330 +136
+ Misses 68399 68212 -187
- Partials 19273 19321 +48 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…rch-project#20140) Signed-off-by: Sagar Upadhyaya <[email protected]> Signed-off-by: Sagar <[email protected]>
…rch-project#20140) Signed-off-by: Sagar Upadhyaya <[email protected]> Signed-off-by: Sagar <[email protected]>
Description
We encountered an issue where fieldDataCache dynamic setting was getting registered multiple times (link) causing settingUpdaters list to bloat up (link) and eventually causing memory issues on active cluster manager node.
This is a sample screenshot which shows multiple such settingUpdaters, all belonging to FieldDataCache and specifically
indices.fielddata.cache.sizesetting.Why this happens?
This is happening as during every index creation/updates, we verify whether metadata is sane or not by creating temporary FieldDataCache objects, and later closing them on. By doing this, the setting getting registered multiple times and causing this issue. See logic here
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
[ ] API changes companion pull request created, if applicable.[ ] Public documentation issue/PR created, if applicable.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
Bug Fixes
New Features
Tests
Changelog
✏️ Tip: You can customize this high-level summary in your review settings.