
Fixed assertion unsafe call of ClusterService.state() #19775

Merged
cwperks merged 2 commits into opensearch-project:main from nibix:cluster-applier-service-assertion
Oct 27, 2025


@nibix nibix commented Oct 27, 2025

Description

I have recently been reviewing test logs of the security plugin and found a large number of test failures caused by this assertion error:

2025-10-26T20:04:25.1579948Z     java.lang.AssertionError: initial cluster state not set yet
2025-10-26T20:04:25.1580807Z         at org.opensearch.cluster.service.ClusterApplierService.state(ClusterApplierService.java:239)
2025-10-26T20:04:25.1581827Z         at org.opensearch.cluster.service.ClusterService.state(ClusterService.java:187)
2025-10-26T20:04:25.1583236Z         at org.opensearch.node.ResourceUsageCollectorService.collectLocalNodeResourceUsageStats(ResourceUsageCollectorService.java:129)
2025-10-26T20:04:25.1584928Z         at org.opensearch.node.ResourceUsageCollectorService.lambda$doStart$3(ResourceUsageCollectorService.java:147)
2025-10-26T20:04:25.1586094Z         at org.opensearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:246)
2025-10-26T20:04:25.1587410Z         at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:975)
2025-10-26T20:04:25.1588919Z         at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
2025-10-26T20:04:25.1589916Z         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
2025-10-26T20:04:25.1590927Z         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
2025-10-26T20:04:25.1591709Z         at java.base/java.lang.Thread.run(Thread.java:1447)

(source)

The failure is caused by this code in ResourceUsageCollectorService:

private void collectLocalNodeResourceUsageStats() {
    if (nodeResourceUsageTracker.isReady() && clusterService.state() != null) {
        collectNodeResourceUsageStats(
            clusterService.state().nodes().getLocalNodeId(),
            System.currentTimeMillis(),
            nodeResourceUsageTracker.getMemoryUtilizationPercent(),
            nodeResourceUsageTracker.getCpuUtilizationPercent(),
            nodeResourceUsageTracker.getIoUsageStats()
        );
    }
}

If you examine ClusterApplierService.state(), you will see that it throws an AssertionError if clusterState is still null, which happens during the early phases of cluster initialization. Thus, the check clusterService.state() != null is bogus, as it won't guard the code in environments with assertions enabled; the assertion inside state() fails before the null comparison is ever evaluated:

public ClusterState state() {
    assert assertNotCalledFromClusterStateApplier("the applied cluster state is not yet available");
    ClusterState clusterState = this.state.get();
    assert clusterState != null : "initial cluster state not set yet";
    return clusterState;
}

One should instead use clusterService.isStateInitialised() to perform the check in an assertion-safe way.
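
To illustrate the difference between the two guards, here is a minimal, self-contained sketch. It does not use the real OpenSearch classes: FakeApplier, unsafeGuard and safeGuard are hypothetical names introduced only for this example, the state is a plain String, and the assertionsEnabled flag stands in for running the JVM with -ea so that the behaviour is the same regardless of how the sketch is launched.

```java
import java.util.concurrent.atomic.AtomicReference;

public class AssertionSafeGuardSketch {

    /** Hypothetical stand-in for ClusterApplierService; not the real OpenSearch class. */
    static final class FakeApplier {
        private final AtomicReference<String> state = new AtomicReference<>();
        private final boolean assertionsEnabled;

        FakeApplier(boolean assertionsEnabled) {
            this.assertionsEnabled = assertionsEnabled;
        }

        /** Mirrors state(): fails while the state is unset if "assertions" are on, otherwise returns null. */
        String state() {
            String current = state.get();
            // Stands in for: assert clusterState != null : "initial cluster state not set yet";
            if (assertionsEnabled && current == null) {
                throw new AssertionError("initial cluster state not set yet");
            }
            return current;
        }

        /** Plain null check that never throws, analogous to isStateInitialised(). */
        boolean isStateInitialised() {
            return state.get() != null;
        }

        void setInitialState(String initial) {
            state.set(initial);
        }
    }

    /** The original guard: calls state() first, so the assertion fires before the comparison. */
    static boolean unsafeGuard(FakeApplier applier) {
        return applier.state() != null;
    }

    /** The assertion-safe guard: only asks whether a state has been set. */
    static boolean safeGuard(FakeApplier applier) {
        return applier.isStateInitialised();
    }

    public static void main(String[] args) {
        FakeApplier applier = new FakeApplier(true);
        System.out.println("safe guard before init: " + safeGuard(applier));
        try {
            unsafeGuard(applier);
        } catch (AssertionError e) {
            System.out.println("unsafe guard before init threw: " + e.getMessage());
        }
        applier.setInitialState("state-v1");
        System.out.println("safe guard after init: " + safeGuard(applier));
        System.out.println("unsafe guard after init: " + unsafeGuard(applier));
    }
}
```

With the flag set to false (assertions disabled), both guards simply return false before initialization, which would explain why the problem only shows up in test environments.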

Check List

  • Functionality includes testing: manually tested

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@nibix nibix requested a review from a team as a code owner October 27, 2025 01:36
@github-actions

❌ Gradle check result for 0fc2db2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

nibix added 2 commits October 27, 2025 04:10
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
@github-actions

❌ Gradle check result for 004694d: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

❌ Gradle check result for 004694d: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

✅ Gradle check result for 004694d: SUCCESS

codecov bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.13%. Comparing base (753c135) to head (004694d).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| ...opensearch/node/ResourceUsageCollectorService.java | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19775      +/-   ##
============================================
+ Coverage     73.10%   73.13%   +0.02%     
+ Complexity    70959    70958       -1     
============================================
  Files          5737     5737              
  Lines        324766   324766              
  Branches      46981    46981              
============================================
+ Hits         237425   237507      +82     
+ Misses        68226    68110     -116     
- Partials      19115    19149      +34     

@cwperks cwperks merged commit 99393b5 into opensearch-project:main Oct 27, 2025
35 of 40 checks passed
anandpatel9998 pushed a commit to anandpatel9998/OpenSearch that referenced this pull request Nov 3, 2025
…ject#19775)

* Fixed assertion unsafe call of ClusterService.state()

Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>

* Added changelog

Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>

---------

Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
liuguoqingfz pushed a commit to liuguoqingfz/OpenSearch that referenced this pull request Dec 15, 2025
…ject#19775)

* Fixed assertion unsafe call of ClusterService.state()

Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>

* Added changelog

Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>

---------

Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>