[Bug]: Kafka upgrade issue 2.8.0 -> 3.0.0 (zookeeper failures) #9527

Closed
edgarkz opened this issue Jan 9, 2024 · 2 comments

edgarkz commented Jan 9, 2024

Bug Description

Hello,
We are running into ZooKeeper issues while upgrading Kafka from 2.8.0 to 3.0.0 using the operator.
Our setup includes 3 ZooKeeper nodes and 3 Kafka brokers with the config below (trimmed).

While upgrading the Kafka version to 3.0.0:

  1. The Strimzi operator detects that a ZooKeeper version upgrade is required and starts rolling restarts:
    - the first ZooKeeper instance was upgraded fine;
    - the second ZooKeeper instance starts, eventually logs the lines below, and keeps restarting:
2024-01-07 07:48:40,934 INFO Using org.apache.zookeeper.server.watch.WatchManager as watch manager (org.apache.zookeeper.server.watch.WatchManagerFactory) [main]
2024-01-07 07:48:40,934 INFO Using org.apache.zookeeper.server.watch.WatchManager as watch manager (org.apache.zookeeper.server.watch.WatchManagerFactory) [main]
2024-01-07 07:48:40,936 INFO zookeeper.snapshotSizeFactor = 0.33 (org.apache.zookeeper.server.ZKDatabase) [main]
2024-01-07 07:48:40,936 INFO zookeeper.commitLogCount=500 (org.apache.zookeeper.server.ZKDatabase) [main]
2024-01-07 07:48:41,233 INFO Using TLS encrypted quorum communication (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2024-01-07 07:48:41,233 INFO Port unification disabled (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2024-01-07 07:48:41,233 INFO multiAddress.enabled set to false (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2024-01-07 07:48:41,234 INFO multiAddress.reachabilityCheckEnabled set to true (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2024-01-07 07:48:41,234 INFO multiAddress.reachabilityCheckTimeoutMs set to 1000 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2024-01-07 07:48:41,234 INFO QuorumPeer communication is not secured! (SASL auth disabled) (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2024-01-07 07:48:41,234 INFO quorum.cnxn.threads.size set to 20 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2024-01-07 07:48:41,235 INFO Reading snapshot /var/lib/zookeeper/data/version-2/snapshot.600026f9b (org.apache.zookeeper.server.persistence.FileSnap) [main]
2024-01-07 07:48:42,535 WARN Got EOF exception while reading the digest, likely due to the reading an older snapshot. (org.apache.zookeeper.server.DataTree) [main]

Eventually, after some restarts, all 3 ZooKeeper instances enter an unhealthy state and keep restarting.

Can you please advise how to proceed from here?

ZooKeeper data directory (instances 1, 2, and 3): [images]

Strimzi operator version: 0.27.1
K8s EKS 1.24

Steps to reproduce

1. Install Kafka 2.8.0 using the Kafka custom resource.
2. Change the Kafka version in the custom resource to 3.0.0.
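
For illustration, the second step amounts to a change roughly like the one sketched here. This is a minimal sketch, not the actual resource: the name `my-cluster` is a placeholder, fields not shown in the trimmed config under "Configuration files and logs" are omitted, and it assumes the protocol and message format settings were left at 2.8 while only `version` was changed.

```yaml
# Sketch only: metadata.name is hypothetical, and everything not present
# in the trimmed config from this report is left out.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster            # placeholder name
spec:
  kafka:
    version: 3.0.0            # previously 2.8.0
    replicas: 3
    config:
      # Assumed unchanged during the upgrade attempt
      log.message.format.version: "2.8"
      inter.broker.protocol.version: "2.8"
  zookeeper:
    replicas: 3
```

Changing `version` is what makes the operator roll the ZooKeeper pods first, which is where the failure described in the bug description shows up.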

Expected behavior

No response

Strimzi version

0.27.1

Kubernetes version

EKS 1.24

Installation method

Helm

Infrastructure

Amazon EKS

Configuration files and logs

  kafka:
    version: 2.8.0
    replicas: 3
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      min.insync.replicas: 2
      log.message.format.version: "2.8"
      inter.broker.protocol.version: "2.8"
      auto.create.topics.enable: "false"
      message.max.bytes: 5242880
  zookeeper:
    replicas: 3
    resources:
      requests:
        memory: 1Gi
        cpu: 200m
      limits:
        memory: 1Gi
        cpu: 200m
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: false

Additional context

No response

scholzj (Member) commented Jan 9, 2024

I guess this is a duplicate of #9517?

edgarkz (Author) commented Jan 9, 2024

Hi @scholzj
I closed the previous one because it was raised as a general Q&A instead of a bug.

edgarkz closed this as completed Jan 9, 2024