[CI] Rolling upgrade tests failing to start after upgrading node

We have a bunch of BWC tests failing in `master`:

```
Execution failed for task ':x-pack:qa:rolling-upgrade:v7.7.0#oneThirdUpgradedTest'.
> `cluster{:x-pack:qa:rolling-upgrade:v7.7.0}` failed to wait for cluster health yellow after 40 SECONDS
  IO error while waiting cluster
    503 Service Unavailable
  > IO error while waiting cluster
    > 503 Service Unavailable
```

The problem here is the cluster failing to come up after upgrading one of the cluster nodes from `7.7.0` (i.e. latest from `7.x` branch) to `8.0.0` (i.e. `master`).

The logs are littered with lots of SSL/crypto type errors, as well as this one:

```
»  Caused by: java.lang.IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.cluster.ClusterState$Custom][]
»  	at org.elasticsearch.common.io.stream.NamedWriteableRegistry.getReader(NamedWriteableRegistry.java:113) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:45) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:39) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.ClusterState.readFrom(ClusterState.java:728) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.coordination.ValidateJoinRequest.<init>(ValidateJoinRequest.java:33) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.transport.RequestHandlerRegistry.newRequest(RequestHandlerRegistry.java:56) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:175) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:118) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:667) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»  	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
```
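For anyone unfamiliar with this kind of error, here is a minimal sketch of how a name-keyed reader registry produces it. This is not the Elasticsearch implementation: `NamedRegistrySketch`, `register`, `read`, `Custom`, and `Snapshots` are all hypothetical names made up for illustration. It only shows that a lookup by (category, name) throws when no reader was registered under the name read from the stream, including an empty name like the `[]` in the trace above. It does not establish why the name was empty in the failing run.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a registry that maps (category, name) to a reader.
// It is NOT the real NamedWriteableRegistry; it only mimics the lookup failure.
public class NamedRegistrySketch {

    // Hypothetical category type and one implementation of it.
    public interface Custom {}
    public static final class Snapshots implements Custom {}

    // category -> (name -> reader)
    private final Map<Class<?>, Map<String, Supplier<?>>> readers = new HashMap<>();

    // Register a reader for the given category under the given name.
    public <T> void register(Class<T> category, String name, Supplier<? extends T> reader) {
        readers.computeIfAbsent(category, k -> new HashMap<>()).put(name, reader);
    }

    // Look up the reader by the name taken from the wire and invoke it.
    // Throws if nothing is registered for that category and name.
    public <T> T read(Class<T> category, String name) {
        Map<String, Supplier<?>> byName = readers.get(category);
        Supplier<?> reader = byName == null ? null : byName.get(name);
        if (reader == null) {
            throw new IllegalArgumentException(
                "Unknown NamedWriteable [" + category.getName() + "][" + name + "]");
        }
        return category.cast(reader.get());
    }

    public static void main(String[] args) {
        NamedRegistrySketch registry = new NamedRegistrySketch();
        registry.register(Custom.class, "snapshots", Snapshots::new);

        // A known name resolves to its registered reader.
        Custom ok = registry.read(Custom.class, "snapshots");
        System.out.println("read: " + ok.getClass().getSimpleName());

        // An empty (or otherwise unregistered) name fails the lookup.
        try {
            registry.read(Custom.class, "");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the failure is purely a lookup miss: the receiving side has no reader for the name it decoded, so it cannot deserialize the rest of the message.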

It's not clear to me (since these are all `info` and `warn` level logs) which of these is stopping the cluster from actually forming. My guess is the "failed to join" errors are the problem, given the whole point of these tests is to ensure that an 8.0 node can talk to a 7.7 cluster.

https://gradle-enterprise.elastic.co/s/6asy246orjjj6/console-log?task=:x-pack:qa:rolling-upgrade:v7.7.0%23oneThirdUpgradedTest

There have been over 20 of these failures today across all CI builds (pull requests, feature branches, etc). It didn't reproduce locally for me, however, and I'm quite surprised we haven't seen an intake build fail with this yet.

[CI] Rolling upgrade tests failing to start after upgrading node #53042
