Replies: 2 comments 8 replies
-
You should probably start by sharing some logs and configurations and explaining why do you think this is an issue (or what exactly is the issue). Because I'm afraid it is not really clear. |
Beta Was this translation helpful? Give feedback.
-
To summary, I think there is issue that kafka broker process started before readinessProbe is completed. benchmark-kafka-0 0/1 Running 0 5s
benchmark-kafka-1 1/1 Running 0 29h
benchmark-kafka-2 1/1 Running 0 6d5h From the benchmark-kafka-0 pod's logs, in my understanding, Kafka broker registers to Zookeeper and starts running in about 4 seconds. This means the broker is advertised and clients can connect to it. 2024-04-18 15:04:56,457 INFO Starting KafkaAgent with brokerReadyFile=/var/opt/kafka/kafka-ready and sessionConnectedFile=/var/opt/kafka/zk-connected (io.strimzi.kafka.agent.KafkaAgent) [main]
...
2024-04-18 15:05:00,119 INFO Registered broker 0 at path /brokers/ids/0 with addresses: CONTROLPLANE-9090://benchmark-kafka-0.benchmark-kafka-brokers.benchmark.svc:9090,REPLICATION-9091://benchmark-kafka-0.benchmark-kafka-brokers.benchmark.svc:9091,INTERNAL-9092://<IP>:30001,EXTERNAL-9095://<domain>:30101, <...> (broker epoch): <...> (kafka.zk.KafkaZkClient) [main]
2024-04-18 15:05:00,177 INFO [ControllerEventThread controllerId=0] Starting (kafka.controller.ControllerEventManager$ControllerEventThread) [controller-event-thread]
...
2024-04-18 15:05:00,558 INFO [KafkaServer id=0] started (kafka.server.KafkaServer) [main] But the readinessProbe is not completed yet (initialDelaySeconds is 15s), the pod status is still in (0/1). benchmark-kafka-0 0/1 Running 0 10s If there are any client try to connect to this broker (because the broker is advertised), the connection will timeout since kube-proxy is not allow routing to it yet. Please see if my understanding is missing and there are any configurations to solve this. Thank you. |
Beta Was this translation helpful? Give feedback.
-
Bug Description
Context:
When I do rolling update for Kafka broker pod, one by one, by deleting each pod at a time. The broker container will run and register itself to Zookeeper before pod readinessProbe complete !
As I'm using rack awareness and the broker (I'm rolling the client's prefered broker) is registered to Zookeeper (registration done in less than 4s), but because pod readinessProbe is not completed yet (default 15s initialDelaySeconds), the client will try to change the connection back to the prefered broker and fail because the pod is not ready yet.
I assume that if the issue could be solved by having option to remove the readinessProbe.
Steps to reproduce
Expected behavior
No timeout connection when client change connection back to its prefered broker
Strimzi version
0.32.0
Kubernetes version
1.27
Installation method
Helm
Infrastructure
Amazon EKS
Configuration files and logs
No response
Additional context
No response
Beta Was this translation helpful? Give feedback.
All reactions