Description
We have a bunch of BWC tests failing in master:
Execution failed for task ':x-pack:qa:rolling-upgrade:v7.7.0#oneThirdUpgradedTest'.
> `cluster{:x-pack:qa:rolling-upgrade:v7.7.0}` failed to wait for cluster health yellow after 40 SECONDS
IO error while waiting cluster
503 Service Unavailable
> IO error while waiting cluster
> 503 Service Unavailable
The problem here is the cluster failing to come up after upgrading one of the cluster nodes from 7.7.0 (i.e. latest from 7.x branch) to 8.0.0 (i.e. master).
The logs are littered with lots of SSL/crypto-type errors, as well as this one:
» Caused by: java.lang.IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.cluster.ClusterState$Custom][]
» at org.elasticsearch.common.io.stream.NamedWriteableRegistry.getReader(NamedWriteableRegistry.java:113) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:45) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:39) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.cluster.ClusterState.readFrom(ClusterState.java:728) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.cluster.coordination.ValidateJoinRequest.<init>(ValidateJoinRequest.java:33) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.transport.RequestHandlerRegistry.newRequest(RequestHandlerRegistry.java:56) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:175) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:118) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:667) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
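For context, the exception at the top of this trace is the kind raised when a reader is looked up by category and name and nothing is registered under that pair. Below is a minimal sketch of such a lookup, assuming a much simplified registry. The class `RegistrySketch`, its methods, the nested `Custom` interface, and the `known_custom` name are all hypothetical and are not the Elasticsearch implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for a name-keyed reader registry.
// This is NOT the Elasticsearch NamedWriteableRegistry.
public class RegistrySketch {
    // category class -> (entry name -> reader)
    private final Map<Class<?>, Map<String, Supplier<?>>> readers = new HashMap<>();

    // Register a reader for a (category, name) pair.
    public <T> void register(Class<T> category, String name, Supplier<? extends T> reader) {
        readers.computeIfAbsent(category, c -> new HashMap<>()).put(name, reader);
    }

    // Look up a reader; fail loudly when the pair is unknown.
    public <T> Supplier<?> getReader(Class<T> category, String name) {
        Map<String, Supplier<?>> byName = readers.get(category);
        Supplier<?> reader = byName == null ? null : byName.get(name);
        if (reader == null) {
            throw new IllegalArgumentException(
                "Unknown NamedWriteable [" + category.getName() + "][" + name + "]");
        }
        return reader;
    }

    // Hypothetical category type, standing in for a cluster-state custom.
    interface Custom {}

    // Registers one known name, then asks for a name that was never registered.
    public static String demo() {
        RegistrySketch registry = new RegistrySketch();
        registry.register(Custom.class, "known_custom", () -> new Custom() {});
        try {
            registry.getReader(Custom.class, ""); // empty name, as in the log line above
            return "no error";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the lookup fails because the requested name (here an empty string) has no registered reader. Whether the real failure comes from a missing registration on the 8.0 node or from a misaligned stream is not something the sketch can tell us.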
It's not clear to me (since these are all info and warn level logs) which of these is actually stopping the cluster from forming. My guess is that the "failed to join" errors are the problem, given that the whole point of these tests is to ensure that an 8.0 node can talk to a 7.7 cluster.
There have been over 20 of these failures today across all CI builds (pull requests, feature branches, etc.). It didn't reproduce locally for me, however, and I'm quite surprised we haven't seen an intake build fail with this yet.