Broker pod fails to start intermittently with KafkaProducer bootstrap-servers error #816

Closed
nmichaud2 opened this issue May 20, 2022 · 6 comments

Comments

@nmichaud2

nmichaud2 commented May 20, 2022

Describe the bug
Broker pod fails to start when deploying KafkaCluster

[2022-05-19 15:22:13,767] WARN Couldn't resolve server kafka-myname-headless.mynamespace.svc.cluster.local:29092 from bootstrap.servers as DNS resolution failed for kafka-myname-headless.mynamespace.svc.cluster.local (org.apache.kafka.clients.ClientUtils)
[2022-05-19 15:22:13,768] INFO [Producer clientId=CruiseControlMetricsReporter] Closing the Kafka producer with timeoutMillis = 0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2022-05-19 15:22:13,768] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics)
[2022-05-19 15:22:13,768] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics)
[2022-05-19 15:22:13,768] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics)
[2022-05-19 15:22:13,866] INFO App info kafka.producer for CruiseControlMetricsReporter unregistered (org.apache.kafka.common.utils.AppInfoParser)
[2022-05-19 15:22:13,870] ERROR [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.KafkaException: Failed to construct kafka producer
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:440)
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:291)
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:318)
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:303)
    at com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter.configure(CruiseControlMetricsReporter.java:170)
    at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstance(AbstractConfig.java:401)
    at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:474)
    at kafka.server.DynamicMetricsReporters.createReporters(DynamicBrokerConfig.scala:799)
    at kafka.server.DynamicMetricsReporters.<init>(DynamicBrokerConfig.scala:748)
    at kafka.server.DynamicBrokerConfig.addReconfigurables(DynamicBrokerConfig.scala:248)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:389)
    at kafka.Kafka$.main(Kafka.scala:109)
    at kafka.Kafka.main(Kafka.scala)
Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
    at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:89)
    at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:48)
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:414)
    ... 12 more
[2022-05-19 15:22:13,963] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)

Steps to reproduce the issue:
N/A intermittent

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem like release number, version, branch, etc.

@stoader
Member

stoader commented May 20, 2022

The "DNS resolution failed for kafka-myname-headless.mynamespace.svc.cluster.local" error message indicates an issue with DNS on your Kubernetes cluster. Can you check whether the kafka-myname-headless service exists in the mynamespace namespace, and whether kafka-myname-headless.mynamespace.svc.cluster.local resolves to the pod IP from within the Kubernetes cluster?

Also what Cruise Control version are you using?
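
The DNS check asked for above could be run from any pod inside the cluster that has Python available. This is only a minimal sketch: the `resolve` helper is a hypothetical name, and the host and port are simply the ones from the log in this issue.

```python
import socket
from typing import List


def resolve(host: str, port: int) -> List[str]:
    """Return the sorted unique IPs that host resolves to, or [] if DNS fails."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Same class of failure as "DNS resolution failed" in the broker log.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Host and port taken from the log in this issue.
    host = "kafka-myname-headless.mynamespace.svc.cluster.local"
    addrs = resolve(host, 29092)
    print(addrs if addrs else "DNS resolution failed for " + host)
```

An empty result from inside the cluster would match the WARN line in the broker log; a list of pod IPs would suggest the name resolves at the time of the check.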

@nmichaud2
Author

The pod is indeed not up, because the error appears during its startup. From what I understand, it creates a KafkaProducer to report metrics to Cruise Control and sets bootstrap.servers to itself. It used to work but started failing intermittently.

@stoader
Member

stoader commented May 24, 2022

The Cruise Control reporter connects to Kafka via kafka-myname-headless.mynamespace.svc.cluster.local, but apparently this service address is not resolvable, which could be due to what I described above.

Can you check what I described above and reply with results?

@nmichaud2
Author

The service is there. Nslookup doesn't work. We're at 0.20.2 but upgrading to 0.21.2 to see if it helps.

We also have two clusters in the same namespace (one of the clusters is always fine). #792

@stoader
Member

stoader commented May 24, 2022

#792 is a different issue which occurs only in case of downscale, so it is unrelated to the issue you see. My best guess is that the DNS service of your K8s cluster is just slow under certain conditions.

Does the CruiseControl version you are using include linkedin/cruise-control#1772 which addresses linkedin/cruise-control#1760 ?
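
If DNS is merely slow at pod startup, as guessed above, a client can tolerate it by retrying resolution with a backoff before giving up. The sketch below only illustrates that general idea. It is not taken from the Cruise Control pull request referenced above, and `wait_for_dns` and its parameters are hypothetical names.

```python
import socket
import time
from typing import Callable, List


def wait_for_dns(
    host: str,
    port: int,
    attempts: int = 5,
    base_delay: float = 0.5,
    resolver: Callable = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Resolve host, retrying with exponential backoff on DNS failure.

    Raises socket.gaierror if every attempt fails.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            infos = resolver(host, port)
            return sorted({info[4][0] for info in infos})
        except socket.gaierror:
            if attempt == attempts:
                raise  # out of retries: surface the DNS error
            sleep(delay)
            delay *= 2  # double the wait before the next attempt
    return []  # only reached when attempts < 1


if __name__ == "__main__":
    # Host and port taken from the log in this issue.
    host = "kafka-myname-headless.mynamespace.svc.cluster.local"
    try:
        print(wait_for_dns(host, 29092))
    except socket.gaierror as exc:
        print("still unresolvable after retries:", exc)
```

The `resolver` and `sleep` parameters exist only so the retry logic can be exercised without a real DNS server.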

@nmichaud2
Author

2.5.79. This does seem like the issue we're having. Thanks a lot
