Dear community,
We just upgraded Strimzi from 0.37 to 0.39. For three of our four stages, the upgrade went fine.
On our last stage, however, we are facing issues after the upgrade:
Strimzi restarted Kafka brokers 1, 2, 3 and 4.
Brokers 0, 5 and 6 have not been restarted.
In the strimzi-cluster-operator logs, we now see the following INFO messages for some of the topics, explaining why the operator is not able to restart the remaining brokers:
{topic_name_1}/0 will be under-replicated (ISR={0,6}, replicas=[0,1,6], min.insync.replicas=2) if broker 6 is restarted.
{topic_name_2}/11 will be under-replicated (ISR={0,5}, replicas=[0,2,5], min.insync.replicas=2) if broker 5 is restarted.
{topic_name_3}/1 will be under-replicated (ISR={0,5}, replicas=[0,5,2], min.insync.replicas=2) if broker 5 is restarted.
... and so on
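The operator messages above follow a simple pattern: a broker that is currently in a partition's ISR cannot be rolled if taking it out would leave fewer in-sync replicas than `min.insync.replicas`. The Python below is a minimal sketch of that rule, assuming only the log format quoted above. It is a simplification, not Strimzi's actual implementation (which lives in the cluster operator's Java code), and the names `parse_line`, `PartitionState` and `restart_would_break_min_isr` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches the operator INFO lines quoted above (format assumed from those lines only).
LINE_RE = re.compile(
    r"^(?P<topic>\S+)/(?P<partition>\d+) will be under-replicated "
    r"\(ISR=\{(?P<isr>[\d,]*)\}, replicas=\[(?P<replicas>[\d,]*)\], "
    r"min\.insync\.replicas=(?P<min_isr>\d+)\) "
    r"if broker (?P<broker>\d+) is restarted\.$"
)


@dataclass(frozen=True)
class PartitionState:
    topic: str
    partition: int
    isr: frozenset[int]
    replicas: tuple[int, ...]
    min_isr: int


def _ids(text: str) -> list[int]:
    """Turn '0,1,6' into [0, 1, 6]."""
    return [int(x) for x in text.split(",") if x]


def parse_line(line: str) -> tuple[PartitionState, int]:
    """Return the partition state and the broker id mentioned in one log line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    state = PartitionState(
        topic=m["topic"],
        partition=int(m["partition"]),
        isr=frozenset(_ids(m["isr"])),
        replicas=tuple(_ids(m["replicas"])),
        min_isr=int(m["min_isr"]),
    )
    return state, int(m["broker"])


def restart_would_break_min_isr(state: PartitionState, broker: int) -> bool:
    """Simplified rule: restarting `broker` is unsafe for this partition if the
    broker is currently in the ISR and taking it out would leave fewer than
    min.insync.replicas in-sync replicas."""
    return broker in state.isr and len(state.isr) - 1 < state.min_isr


if __name__ == "__main__":
    line = ("{topic_name_1}/0 will be under-replicated (ISR={0,6}, replicas=[0,1,6], "
            "min.insync.replicas=2) if broker 6 is restarted.")
    state, broker = parse_line(line)
    print(state, broker, restart_would_break_min_isr(state, broker))
```

For the first log line the check returns `True`: removing broker 6 from ISR {0,6} leaves a single in-sync replica, below `min.insync.replicas=2`. Under this simplified rule, the same check for broker 6 would only pass once broker 1 is back in the ISR.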
All of the out-of-sync replicas are on brokers that have already been restarted, while the in-sync replicas are on the brokers that have not been restarted yet.
None of the nodes were restarted, so it is surprising that some Kafka brokers are unable to connect to the partition leaders of some topics.
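Taking the three operator lines above at face value, the out-of-sync replica of each partition (listed in `replicas` but missing from `ISR`) can be compared with the set of brokers the operator has already rolled. The sketch below does that with values copied from the log; only those three partitions are included because the rest were elided ("... and so on"), and the helper names are hypothetical.

```python
# Brokers the operator has already rolled, as reported above (0, 5 and 6 have not been).
RESTARTED = {1, 2, 3, 4}

# (topic/partition, replicas, ISR) copied from the operator log lines above.
PARTITIONS = [
    ("{topic_name_1}/0", [0, 1, 6], {0, 6}),
    ("{topic_name_2}/11", [0, 2, 5], {0, 5}),
    ("{topic_name_3}/1", [0, 5, 2], {0, 5}),
]


def out_of_sync(replicas, isr):
    """Replicas assigned to the partition but missing from the ISR."""
    return [b for b in replicas if b not in isr]


def summarise(partitions, restarted):
    """For each partition, report the lagging replicas and whether they (and the
    ISR members) fall on brokers that have already been restarted."""
    summary = {}
    for name, replicas, isr in partitions:
        lagging = out_of_sync(replicas, isr)
        summary[name] = {
            "out_of_sync": lagging,
            "all_lagging_restarted": all(b in restarted for b in lagging),
            "isr_not_restarted": isr.isdisjoint(restarted),
        }
    return summary


if __name__ == "__main__":
    for name, info in summarise(PARTITIONS, RESTARTED).items():
        print(name, info)
```

For the three quoted partitions, the lagging replica (broker 1 or 2) is on a broker that was already restarted, while every ISR member is on broker 0, 5 or 6, none of which has been rolled yet.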
On the restarted brokers, we see the following WARNING, showing that the fetch requests (from the follower replicas to the leader) are failing or timing out:
In the ZooKeeper logs, we see the following (all three ZooKeeper nodes were restarted during the update):
Detected Zookeeper ID 1
Preparing truststore
Adding /opt/kafka/cluster-ca-certs/ca.crt to truststore /tmp/zookeeper/cluster.truststore.p12 with alias ca
Certificate was added to keystore
Preparing truststore is complete
Looking for the right CA
Found the right CA: /opt/kafka/cluster-ca-certs/ca.crt
Preparing keystore for client and quorum listeners
Preparing keystore for client and quorum listeners is complete
Starting Zookeeper with configuration:
# The directory where the snapshot is stored.
dataDir=/var/lib/zookeeper/data
# Other options
4lw.commands.whitelist=*
standaloneEnabled=false
reconfigEnabled=true
# TLS options
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.clientAuth=need
ssl.quorum.clientAuth=need
secureClientPort=2181
sslQuorum=true
ssl.trustStore.location=/tmp/zookeeper/cluster.truststore.p12
ssl.trustStore.password=[hidden]
ssl.trustStore.type=PKCS12
ssl.quorum.trustStore.location=/tmp/zookeeper/cluster.truststore.p12
ssl.quorum.trustStore.password=[hidden]
ssl.quorum.trustStore.type=PKCS12
ssl.keyStore.location=/tmp/zookeeper/cluster.keystore.p12
ssl.keyStore.password=[hidden]
ssl.keyStore.type=PKCS12
ssl.quorum.keyStore.location=/tmp/zookeeper/cluster.keystore.p12
ssl.quorum.keyStore.password=[hidden]
ssl.quorum.keyStore.type=PKCS12
# Provided configuration
tickTime=2000
initLimit=5
syncLimit=2
autopurge.purgeInterval=1
admin.enableServer=false
# Zookeeper nodes configuration
server.1=0.0.0.0:2888:3888:participant;127.0.0.1:12181
server.2=my-project-zookeeper-1.my-project-zookeeper-nodes.kafka.svc:2888:3888:participant;127.0.0.1:12181
server.3=my-project-zookeeper-2.my-project-zookeeper-nodes.kafka.svc:2888:3888:participant;127.0.0.1:12181
+ exec /usr/bin/tini -w -e 143 -- /opt/kafka/bin/zookeeper-server-start.sh /tmp/zookeeper.properties
2024-07-01 09:02:06,487 WARN maxCnxns is not configured, using default value 0. (org.apache.zookeeper.server.ServerCnxnFactory) [main]
2024-07-01 09:02:06,488 WARN maxCnxns is not configured, using default value 0. (org.apache.zookeeper.server.ServerCnxnFactory) [main]
2024-07-01 09:02:07,278 WARN Interrupting SendWorker thread from RecvWorker. sid: 2. myId: 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:2]
2024-07-01 09:02:07,279 WARN Interrupted while waiting for message on queue (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:2]
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679)
at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1447)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:98)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1271)
2024-07-01 09:02:07,280 WARN Send worker leaving thread id 2 my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:2]
2024-07-01 09:02:07,283 WARN Interrupting SendWorker thread from RecvWorker. sid: 3. myId: 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:3]
2024-07-01 09:02:07,283 WARN Send worker leaving thread id 3 my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:3]
2024-07-01 09:02:07,461 WARN Clobbering already-set QuorumCnxManager (restarting leader election?) (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2024-07-01 09:02:29,692 WARN Connection broken for id 3, my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:3]
java.net.SocketException: Socket is closed
at java.base/java.net.Socket.shutdownInput(Socket.java:1601)
at java.base/sun.security.ssl.BaseSSLSocketImpl.shutdownInput(BaseSSLSocketImpl.java:219)
at java.base/sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:853)
at java.base/sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:825)
at java.base/sun.security.ssl.SSLSocketImpl.handleEOF(SSLSocketImpl.java:1733)
at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1467)
at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1069)
at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:693)
at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:381)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1384)
2024-07-01 09:02:29,693 WARN Interrupting SendWorker thread from RecvWorker. sid: 3. myId: 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:3]
2024-07-01 09:02:29,693 WARN Interrupted while waiting for message on queue (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:3]
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679)
at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1447)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:98)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1271)
2024-07-01 09:02:29,693 WARN Send worker leaving thread id 3 my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:3]
2024-07-01 09:02:40,782 WARN Connection broken for id 3, my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:3]
java.io.EOFException
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:386)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1384)
2024-07-01 09:02:40,782 WARN Exception when using channel: for id 3 my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:3]
java.net.SocketException: Connection or outbound has been closed
at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:291)
at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1308)
at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedOutputStream.write(UnifiedServerSocket.java:766)
at java.base/java.io.DataOutputStream.writeInt(DataOutputStream.java:208)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:1228)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1279)
2024-07-01 09:02:40,782 WARN Interrupting SendWorker thread from RecvWorker. sid: 3. myId: 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:3]
2024-07-01 09:02:40,782 WARN Send worker leaving thread id 3 my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:3]
2024-07-01 09:03:03,316 WARN Connection broken for id 2, my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:2]
java.net.SocketException: Socket is closed
at java.base/java.net.Socket.shutdownInput(Socket.java:1601)
at java.base/sun.security.ssl.BaseSSLSocketImpl.shutdownInput(BaseSSLSocketImpl.java:219)
at java.base/sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:853)
at java.base/sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:825)
at java.base/sun.security.ssl.SSLSocketImpl.handleEOF(SSLSocketImpl.java:1733)
at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1467)
at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1069)
at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:693)
at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:381)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1384)
2024-07-01 09:03:03,316 WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
java.io.EOFException
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:386)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:219)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:125)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1467)
2024-07-01 09:03:03,316 WARN Interrupting SendWorker thread from RecvWorker. sid: 2. myId: 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:2]
2024-07-01 09:03:03,316 WARN Interrupted while waiting for message on queue (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:2]
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679)
at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1447)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:98)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1271)
2024-07-01 09:03:03,317 WARN Send worker leaving thread id 2 my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:2]
2024-07-01 09:03:03,336 WARN PeerState set to LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2024-07-01 09:03:03,338 WARN Cannot open channel to 2 at election address my-project-zookeeper-1.my-project-zookeeper-nodes.kafka.svc/10.130.9.184:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=1]-3]
java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method)
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:554)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:602)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
at java.base/java.net.Socket.connect(Socket.java:633)
at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:383)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
2024-07-01 09:03:03,354 WARN Cannot open channel to 2 at election address my-project-zookeeper-1.my-project-zookeeper-nodes.kafka.svc/10.130.9.184:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=1]-3]
java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method)
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:554)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:602)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
at java.base/java.net.Socket.connect(Socket.java:633)
at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:383)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
2024-07-01 09:03:03,556 WARN Unexpected exception, tries=0, remaining init limit=9999, connecting to my-project-zookeeper-2.my-project-zookeeper-nodes.kafka.svc/10.130.18.209:2888 (org.apache.zookeeper.server.quorum.Learner) [LeaderConnector-my-project-zookeeper-2.my-project-zookeeper-nodes.kafka.svc/10.130.18.209:2888]
java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method)
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:554)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:602)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
at java.base/java.net.Socket.connect(Socket.java:633)
at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304)
at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:292)
at org.apache.zookeeper.server.quorum.Learner$LeaderConnector.connectToLeader(Learner.java:408)
at org.apache.zookeeper.server.quorum.Learner$LeaderConnector.run(Learner.java:366)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
2024-07-01 09:03:32,904 WARN Got zxid 0x3000000001 expected 0x1 (org.apache.zookeeper.server.quorum.Learner) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2024-07-01 09:03:35,900 WARN Connection broken for id 2, my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:2]
java.io.EOFException
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:386)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1384)
2024-07-01 09:03:35,901 WARN Interrupting SendWorker thread from RecvWorker. sid: 2. myId: 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [RecvWorker:2]
2024-07-01 09:03:35,901 WARN Interrupted while waiting for message on queue (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:2]
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679)
at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1447)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:98)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1271)
2024-07-01 09:03:35,901 WARN Send worker leaving thread id 2 my id = 1 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [SendWorker:2]
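Two details in the ZooKeeper output above are easier to read with a small helper. The sketch below, which assumes Python 3 and only the formats visible in this output, splits a zxid into its leader epoch (high 32 bits) and per-epoch counter (low 32 bits), and splits a `server.N` line into its quorum port, election port, role and client address. The function names are hypothetical, and IPv6 addresses are not handled.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


def split_zxid(zxid: int) -> tuple[int, int]:
    """Split a 64-bit ZooKeeper zxid into (epoch, counter).

    The high 32 bits hold the leader epoch, the low 32 bits a per-epoch counter.
    """
    return zxid >> 32, zxid & 0xFFFFFFFF


class ServerEntry(NamedTuple):
    sid: int
    host: str
    quorum_port: int               # follower-to-leader port (2888 in the config above)
    election_port: int             # leader-election port (3888 in the config above)
    role: Optional[str]            # e.g. "participant"; None if the line omits it
    client_address: Optional[str]  # None if the line omits it
    client_port: int


# server.<id>=<host>:<quorumPort>:<electionPort>[:<role>];[<clientAddress>:]<clientPort>
# Hypothetical helper: handles host names and IPv4 only, not bracketed IPv6.
SERVER_RE = re.compile(
    r"^server\.(\d+)=([^:;]+):(\d+):(\d+)(?::(\w+))?;(?:([^:;]+):)?(\d+)$"
)


def parse_server_line(line: str) -> ServerEntry:
    m = SERVER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised server line: {line!r}")
    sid, host, qport, eport, role, caddr, cport = m.groups()
    return ServerEntry(int(sid), host, int(qport), int(eport), role, caddr, int(cport))


if __name__ == "__main__":
    epoch, counter = split_zxid(0x3000000001)  # from "Got zxid 0x3000000001 expected 0x1"
    print(f"epoch={epoch:#x} counter={counter:#x}")
    print(parse_server_line(
        "server.2=my-project-zookeeper-1.my-project-zookeeper-nodes.kafka.svc"
        ":2888:3888:participant;127.0.0.1:12181"
    ))
```

Applied to this output, `0x3000000001` decodes to epoch `0x30` with counter `0x1`. The parsed `server.2` entry also shows that the `Cannot open channel to 2 at election address ...:3888` warnings concern the election port, while the `Learner` warning about `connecting to ...:2888` concerns the follower-to-leader quorum port.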