JMX Exporter not scraping metrics from Kafka 7.7.1 version #755
Comments
@Aravindangit003 there is no "Kafka" version 7.7.1, nor is there a Confluent Platform version 7.7.1, so I'm confused about "Confluent Kafka Version: 7.7.1". |
@Aravindangit003 Any updates/clarification? The JMX Exporter is well-tested and used on supported Confluent Platform versions. Have you resolved this issue? If there are no updates within 1 week, this will be closed as inactive. |
We are seeing the same problems, for Confluent Platform 7.3.x as well as for Apache Kafka 3.3.x. |
@db3f please provide your startup configuration, your process output |
Here is our full Java command line:
java -Xmx6G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=/var/log/kafka/kafkaServer-gc.log:time,tags:filecount=10,filesize=100M -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/etc/kafka/log4j.properties -cp /usr/bin/../share/java/kafka/:/usr/bin/../share/java/confluent-telemetry/ -javaagent:/usr/share/java/cp-base-new/jmx_prometheus_javaagent-0.14.0.jar=9999:/etc/confluent/docker/kafka-rules.yaml kafka.Kafka /etc/kafka/kafka.properties
Had to rename to .txt for upload. We know that the rules contain some redundancies, but they work flawlessly and quite fast with JMX exporter 0.16.x.
Edit: we tried an empty config file, same result. The behaviour seems to depend non-linearly on the number of metrics. With 10 topics it works OK; with 20 topics we don't see any result. 0.16.1 works flawlessly with 600 topics (around 5 seconds scrape time). |
@db3f Good to know that a later version resolved the issue for you. I would recommend moving to the latest version and retest. |
Sorry if there was a misunderstanding. It’s the later version that doesn’t work. Versions up to 0.16.1 work; versions 0.17.1 and 0.17.2 don’t. We are currently using 0.14.0 in production because that’s what we used before. |
@db3f I just tested with Confluent Platform 7.3.3 + version 0.18.0 + 500 topics and don't see any issues. You might need to enable Java trace logging to see if there is a failure. |
Just a question: Did you send messages through your Kafka topics? No MBeans/metrics will be created before you do. Also, giving just the number of topics may be misleading. Most metrics are per partition; we have 18000 partitions, 3 replicas each. |
My test scenario:
|
To clarify... you are getting metrics, but there is a performance degradation with the later versions? (This issue was reported as not getting any metrics.) Using curl or wget, do you get metrics? |
We are seeing metrics up to a certain number of partitions, if we are willing to wait a few minutes. At about 12000 partitions we gave up waiting after 20 minutes. When we did get metrics, jmx_scrape_duration_seconds was within a few hundred milliseconds of what curl showed as download time. |
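For anyone wanting to repeat the comparison between client-side download time and the exporter's own jmx_scrape_duration_seconds, here is a minimal sketch of a timing client. The class name `ScrapeTimer`, the default URL (port 7071 is borrowed from the KAFKA_OPTS line in the issue description, `/metrics` is assumed as the path) and the 30-minute timeout are illustrative assumptions, not something taken from this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: times one HTTP scrape and counts the returned samples.
public class ScrapeTimer {

    // Counts sample lines in a Prometheus text response,
    // skipping blank lines and "# HELP" / "# TYPE" comment lines.
    public static long countSamples(String body) {
        long count = 0;
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                count++;
            }
        }
        return count;
    }

    // Downloads the whole response body as UTF-8 text.
    public static String fetch(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default endpoint; pass the real one as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:7071/metrics";
        long start = System.nanoTime();
        String body = fetch(url, 30 * 60 * 1000); // generous timeout for slow scrapes
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d samples in %.3f s%n", countSamples(body), seconds);
    }
}
```

If the wall-clock time printed here tracks jmx_scrape_duration_seconds closely, the time is being spent inside the scrape itself rather than in the network transfer.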
Sounds like Kafka is slow. I would suggest using the
Due to your cluster size, not sure I can create a large enough deployment to reproduce the behavior. I'll see what I can do. |
Well, if it were Kafka (or solely Kafka), versions below 0.17 should show the same behavior. But as the Grafana boards I posted a few days ago show, 0.16.1 does work (and for that matter so do all earlier versions; we have been using the library since at least 0.12). It seems that the scrape time was linear in the number of metrics before and is now at least quadratic, probably even exponential. |
Totally agree and it sounds like a regression, just trying to narrow the scope. I'll write an integration test with some fake MBeans to try to simulate the scenario. |
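A rough sketch of what such a fake-MBean setup could look like, using only the standard javax.management API. The class name `FakeMBeans`, the `fake.kafka.log` domain and the attribute `Value` are hypothetical names chosen for illustration; this is not the integration test that was actually written for the project.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical load generator: registers one fake gauge MBean per partition.
public class FakeMBeans {

    // Standard MBean interface; it must be public and named <Class>MBean.
    public interface FakeGaugeMBean {
        long getValue();
    }

    public static class FakeGauge implements FakeGaugeMBean {
        private final long value;

        public FakeGauge(long value) {
            this.value = value;
        }

        @Override
        public long getValue() {
            return value;
        }
    }

    // Registers `partitions` MBeans for one topic and returns how many
    // MBeans for that topic the server reports afterwards.
    public static int register(MBeanServer server, String topic, int partitions)
            throws Exception {
        for (int p = 0; p < partitions; p++) {
            ObjectName name = new ObjectName(
                    "fake.kafka.log:type=Log,name=Size,topic=" + topic + ",partition=" + p);
            server.registerMBean(new FakeGauge(p), name);
        }
        // Property-list pattern: matches any name carrying these keys plus others.
        ObjectName pattern = new ObjectName("fake.kafka.log:type=Log,topic=" + topic + ",*");
        return server.queryNames(pattern, null).size();
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        int count = register(server, "demo", 3000);
        System.out.println("registered " + count + " fake MBeans");
    }
}
```

Running a JVM with the exporter agent attached and raising the partition count step by step would show whether scrape time grows linearly or worse with the number of MBeans.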
Looking at the code, there are two major changes between 0.16.0 and 0.17.1.
JmxScraper.java....
In my limited testing, single broker, 3000 partitions, I'm not hitting the scenario. I can create a |
I’m currently on a short leave but can offer to do some testing next week. |
@db3f any update on testing? One suspicion is that there is a failure getting all MBean attributes in a single call, so the attributes are getting processed one at a time. jmx_exporter/collector/src/main/java/io/prometheus/jmx/JmxScraper.java Lines 158 to 162 in c2a90ec
|
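To illustrate the suspicion above, here is a sketch of the general pattern of reading all attributes in one bulk call and falling back to one call per attribute when the bulk result is incomplete. This is an assumption-laden illustration using the standard `MBeanServerConnection` API, not the code from JmxScraper.java; the class name `BulkAttributeRead` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical illustration of bulk attribute reads with a per-attribute fallback.
public class BulkAttributeRead {

    public static Map<String, Object> read(MBeanServerConnection conn, ObjectName name)
            throws Exception {
        // Collect the names of all readable attributes of this MBean.
        List<String> readable = new ArrayList<>();
        for (MBeanAttributeInfo info : conn.getMBeanInfo(name).getAttributes()) {
            if (info.isReadable()) {
                readable.add(info.getName());
            }
        }
        String[] names = readable.toArray(new String[0]);

        Map<String, Object> values = new LinkedHashMap<>();
        try {
            // Fast path: one call for all attributes. Attributes that fail
            // are silently left out of the returned list.
            AttributeList bulk = conn.getAttributes(name, names);
            for (Attribute attribute : bulk.asList()) {
                values.put(attribute.getName(), attribute.getValue());
            }
        } catch (Exception e) {
            // Bulk call failed as a whole; fall through to the slow path.
        }

        // Slow path: one call per attribute that the bulk read did not return.
        for (String attributeName : names) {
            if (values.containsKey(attributeName)) {
                continue;
            }
            try {
                values.put(attributeName, conn.getAttribute(name, attributeName));
            } catch (Exception e) {
                // Skip attributes that cannot be read individually either.
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        Map<String, Object> values = read(ManagementFactory.getPlatformMBeanServer(), memory);
        System.out.println(values.keySet());
    }
}
```

If the fast path fails for every MBean, the number of JMX calls per scrape grows from one per MBean to one per attribute, which would make scrapes noticeably slower on brokers with many partitions.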
Sorry, we had some unexpected workload recently. We think we found the solution today. We increased the Kafka max heap space by 3GB and suddenly scrapes worked again. This is preliminary, but we will do some more testing in this direction and report here. |
Hello @dhoard, bumping up this issue because I'm facing the same problem on some big Kafka clusters (~7400 partitions per broker). We had no issue with metrics when using JMX Exporter I noticed that while I tried to increase max heap space but I didn't notice any changes. If you're still willing to create a |
@earnil Please upgrade to the latest version and test. Version 0.20.0 added some performance improvements that decrease scrape time, but ultimately it takes time for a large cluster/many metrics. If you are capturing a lot of metrics, you will need to increase your heap size due to how the underlying |
Version 0.20.0 did increase performance paired with heap size adjustment for the biggest clusters. It still takes more than 30 seconds on some brokers, but these are very overloaded (> 15 000 partitions) so that's on us. |
Closing this issue. |
Confluent Kafka Version: 7.7.1
Java - 1.8.0_202
Tried it on Broker
JMX exporter - 0.17.2 (not scraping metrics)
JMX Exporter - 0.17.0 (not scraping metrics)
JMX Exporter - 0.16.1 (working)
JMX exporter - 0.13.0 (working)
Note: We have another Kafka broker running Confluent Kafka 5.4, and JMX Exporter 0.17.2 works on it.
Variable used for Kafka 7.7.1 in start.broker.sh
export KAFKA_OPTS="-javaagent:/appl/itka/jmx_exporter/jmx_prometheus_javaagent-0.17.2.jar=7071:/appl/itka/jmx_exporter/kafka-2_0_0.yml"
Can you please let me know why JMX Exporter 0.17.2 is not working for Kafka 7.7.1? Do I need to add any other arguments to the environment variable?