[CI] ClusterHealthIT.testHealthOnMasterFailover failing with node closed exception #62690

@droberts195

Build scan:

https://gradle-enterprise.elastic.co/s/xuizcgbin2xd6

Repro line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.cluster.ClusterHealthIT.testHealthOnMasterFailover" \
  -Dtests.seed=B29C60DBAC26CE66 \
  -Dtests.security.manager=true \
  -Dtests.locale=ar-SA \
  -Dtests.timezone=Canada/Saskatchewan \
  -Druntime.java=11 \
  -Dtests.fips.enabled=true

Reproduces locally?:

No

Applicable branches:

master, 7.x

Failure history:

Failed several times over the last 3 days, very rarely before that:

https://build-stats.elastic.co/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:now-30d,mode:quick,to:now))&_a=(columns:!(_source),index:b646ed00-7efc-11e8-bf69-63c8ef516157,interval:auto,query:(language:lucene,query:testHealthOnMasterFailover),sort:!(process.time-start,desc))

Failure excerpt:

org.elasticsearch.cluster.ClusterHealthIT > testHealthOnMasterFailover FAILED
    java.util.concurrent.ExecutionException: org.elasticsearch.discovery.MasterNotDiscoveredException: org.elasticsearch.node.NodeClosedException: node closed {node_s2}{tvSu4oDKT-WY-CRII1wLGA}{_yoHiJeOT4W1DVsjyIEr4w}{127.0.0.1}{127.0.0.1:37819}{imr}
        at __randomizedtesting.SeedInfo.seed([B29C60DBAC26CE66:2C734FF6AFCCD280]:0)
        at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:273)
        at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:260)
        at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:87)
        at org.elasticsearch.cluster.ClusterHealthIT.testHealthOnMasterFailover(ClusterHealthIT.java:337)

        Caused by:
        org.elasticsearch.discovery.MasterNotDiscoveredException: org.elasticsearch.node.NodeClosedException: node closed {node_s2}{tvSu4oDKT-WY-CRII1wLGA}{_yoHiJeOT4W1DVsjyIEr4w}{127.0.0.1}{127.0.0.1:37819}{imr}
            at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:230)
            at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:335)
            at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252)
            at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:591)
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:674)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at java.base/java.lang.Thread.run(Thread.java:834)

            Caused by:
            org.elasticsearch.node.NodeClosedException: node closed {node_s2}{tvSu4oDKT-WY-CRII1wLGA}{_yoHiJeOT4W1DVsjyIEr4w}{127.0.0.1}{127.0.0.1:37819}{imr}
                at org.elasticsearch.action.admin.cluster.health.TransportClusterHealthAction$3.onClusterServiceClose(TransportClusterHealthAction.java:208)
                at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onClusterServiceClose(ClusterStateObserver.java:328)
                at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:237)
                at org.elasticsearch.cluster.service.ClusterApplierService.doStop(ClusterApplierService.java:171)
                at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:79)
                at org.elasticsearch.cluster.service.ClusterService.doStop(ClusterService.java:108)
                at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:79)
                at org.elasticsearch.node.Node.stop(Node.java:887)
                at org.elasticsearch.node.Node.close(Node.java:912)
                at org.elasticsearch.test.InternalTestCluster$NodeAndClient.close(InternalTestCluster.java:976)
                at org.elasticsearch.test.InternalTestCluster$NodeAndClient.closeForRestart(InternalTestCluster.java:920)
                at org.elasticsearch.test.InternalTestCluster.restartNode(InternalTestCluster.java:1692)
                at org.elasticsearch.test.InternalTestCluster.restartNode(InternalTestCluster.java:1659)
                at org.elasticsearch.cluster.ClusterHealthIT.testHealthOnMasterFailover(ClusterHealthIT.java:327)
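
One reading of the trace above, offered as a sketch rather than a diagnosis: the `restartNode` call at `ClusterHealthIT.java:327` closes the node, the pending health request is failed with `NodeClosedException`, that exception is wrapped in `MasterNotDiscoveredException`, and the future's `get()` at line 337 surfaces the result as an `ExecutionException`. The snippet below only illustrates that three-level nesting and how to walk to the root cause. The two `*StandIn` classes and all method names are hypothetical; they are plain-Java substitutes so the example runs without Elasticsearch on the classpath, and they are not the real Elasticsearch types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

public class CauseChainSketch {

    // Hypothetical stand-in for org.elasticsearch.discovery.MasterNotDiscoveredException.
    static class MasterNotDiscoveredStandIn extends RuntimeException {
        MasterNotDiscoveredStandIn(Throwable cause) {
            super(cause);
        }
    }

    // Hypothetical stand-in for org.elasticsearch.node.NodeClosedException.
    static class NodeClosedStandIn extends RuntimeException {
        NodeClosedStandIn(String message) {
            super(message);
        }
    }

    // Builds the same nesting order that the failure excerpt shows.
    static Throwable simulatedFailure() {
        return new ExecutionException(
            new MasterNotDiscoveredStandIn(new NodeClosedStandIn("node closed")));
    }

    // Collects the simple class name of every throwable in the cause chain, outermost first.
    static List<String> causeChain(Throwable t) {
        List<String> names = new ArrayList<>();
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            names.add(cur.getClass().getSimpleName());
        }
        return names;
    }

    // Follows getCause() until the innermost throwable is reached.
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null) {
            cur = cur.getCause();
        }
        return cur;
    }

    public static void main(String[] args) {
        Throwable failure = simulatedFailure();
        System.out.println(causeChain(failure));
        System.out.println(rootCause(failure).getMessage());
    }
}
```

Walking the chain this way makes it easier to see that the innermost exception originates from the node shutdown path rather than from master discovery itself.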

Metadata

Labels

:Distributed Coordination/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection.)
>test-failure (Triaged test failures from CI)
Team:Distributed (Obsolete) (Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.)
