[SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ #26332
Conversation
    <exclusions>
      <exclusion>
        <groupId>com.rabbitmq</groupId>
        <artifactId>amqp-client</artifactId>
      </exclusion>
    </exclusions>
This would otherwise be a new dependency on RabbitMQ, but my hunch is that it is not necessary for Spark.
Wow. Nice!

BTW, what happens to Ganglia support?
Test build #112974 has finished for PR 26332 at commit
I don't know of a reason it would change Ganglia support.
Oh, wait, no, I see the issue. Ganglia support is in a different module that I didn't enable. Indeed it isn't present in 4.x: https://search.maven.org/search?q=g:io.dropwizard.metrics%20AND%20a:metrics-ganglia&core=gav

Hm, OK, that's a problem, as we have historically maintained some Ganglia support. Maybe I need to take this to dev@ ... if the support doesn't necessarily work on JDK 11 (unproven), then do we need to consider removing it?
+1 for removing in JDK 11. Since we dropped
Oh... It's difficult to keep them separate for JDK 8/11.
Adding a link to a similar discussion in #26212
Yep, I am next going to try that, just inlining the single simple file that metrics-ganglia contained. Dropwizard itself is ALv2. Ganglia is not, but Spark doesn't publish this module directly, so that doesn't change.
    <dependency>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-ganglia</artifactId>
      <groupId>info.ganglia.gmetric4j</groupId>
This was previously a transitive dependency anyway, and should have been directly referenced.
The Apache Software Foundation (http://www.apache.org/).

Metrics
I think this should have already been there, but now it would also need to be in NOTICE.
Test build #113036 has finished for PR 26332 at commit

Test build #113037 has finished for PR 26332 at commit
IIUC, we are publishing our

Ah, sorry, you are correct. What I really mean is that it isn't bundled in the binary release, because that would entail distributing the Ganglia dependency. At least this was true the last time I checked!
Retest this please.
Test build #113178 has finished for PR 26332 at commit
Retest this please.
Test build #113182 has finished for PR 26332 at commit
Thank you so much, @srowen. Merged to master.
@LucaCanali @jerryshao maybe I can get your advice on something related here. I was investigating back-porting this to 2.4.x, and I find that the tests run out of memory with this change. A heap dump suggests that a lot of memory is consumed by the metrics registries.

As part of my hacking, I found that manually clearing all the metrics when the metrics system stops helps.

Is this crazy? Like, is clearing out all of the metrics going to break something, or is this a sensible cleanup step? I'm still investigating, but wanted to reach out to anyone who may be familiar with this code.
What you write sounds reasonable to me. As an additional comment, I'd say that while stopping the metrics system makes sense for a test suite, it is not something that happens often in real usage: in most cases I can think of, metrics are started/registered at the component's startup and they stay up for all of its life.
Thanks @LucaCanali. Yeah, that's my worry. This means that after the Spark cluster is stopped, the metrics all disappear. Is that expected, or a problem? I am not sure how metrics are used. Of course, at that point the components' lifecycle has ended anyway. Would external services rely on 'pulling' metrics after it's done but before the JVM stops, or do metrics implementations typically 'push' as they go? Sorry for the dumb question.
Metrics are written to sinks at regular time intervals (configurable; let's say 10 seconds). I mostly use the Graphite sink to write to an InfluxDB instance. Metrics are persisted there and consumed from InfluxDB by a Grafana dashboard.
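A minimal sketch of such a push-based setup in Spark's `conf/metrics.properties` might look like the following. The sink class name is Spark's real Graphite sink; the host, port, and period values are illustrative assumptions, not values taken from this thread:

```properties
# conf/metrics.properties -- illustrative sketch, not from this PR.
# Push all instances' metrics to a Graphite endpoint every 10 seconds;
# InfluxDB can ingest the Graphite protocol, and Grafana reads from InfluxDB.
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=influxdb.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
```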
OK, it's sounding reasonable to explicitly remove the registry then, as it's going away pretty immediately anyway, and the sinks are shut down. I will propose it. That helps the tests, which do start/stop contexts, and I think we otherwise somehow 'leak' all the previous metrics registries in this case. However, it's kind of specific to the tests.

The other question remains: why is this an issue only in 2.4 (as far as I can tell), and only with metrics 4.x? Since you've looked at a lot of metrics-related changes @LucaCanali, do you recall anything we fixed in 3.0 that might help avoid using a lot of memory in the metrics system? I don't, and didn't see anything in skimming JIRAs.

I am still not clear whether the big closures captured by gauges that Spark registers are necessary, or an accident that will otherwise cause long-running apps to 'leak' somehow. I don't think so; what it retains a reference to is all the current Spark machinery like BlockManager. Not a problem as long as there is only one live BlockManager anyway. Still, need to keep an eye on it in 3.0.
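The cleanup being discussed amounts to wiping the registry when the metrics system stops; against the real Dropwizard API this would be a call like `registry.removeMatching(MetricFilter.ALL)`. The stand-in class below (`TinyRegistry` is a hypothetical name, not Spark or Dropwizard code) just models a registry as a concurrent map so the idea is runnable without the Dropwizard jar:

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for Dropwizard's MetricRegistry: a name -> metric map.
class TinyRegistry {
    private final ConcurrentHashMap<String, Object> metrics = new ConcurrentHashMap<>();

    void register(String name, Object gauge) {
        metrics.put(name, gauge);
    }

    int size() {
        return metrics.size();
    }

    // The cleanup step discussed above: drop every metric so that gauges
    // stop retaining references to Spark machinery like BlockManager.
    void removeAll() {
        metrics.clear();
    }
}

public class Main {
    public static void main(String[] args) {
        TinyRegistry registry = new TinyRegistry();
        registry.register("jvm.heap.used", new Object());
        registry.register("executor.threadpool.activeTasks", new Object());
        System.out.println(registry.size()); // 2

        // Simulate the metrics system stopping: clear everything out.
        registry.removeAll();
        System.out.println(registry.size()); // 0
    }
}
```

As noted in the thread, this is safe precisely because the sinks are already shut down at that point, so nothing will try to read the cleared metrics afterwards.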
Most of the PRs for the metrics system targeting 3.0 so far have been about adding new metrics. |
What changes were proposed in this pull request?
Bump Dropwizard version from 3.2.6 to 4.2.25. Meanwhile, introduce `metrics_jvm_thread_peak_count_Value` and `metrics_jvm_thread_total_started_count_Value` in `celeborn-jvm-dashboard.json`.

Why are the changes needed?
Dropwizard metrics has released v4.2.25, including some bugfixes and improvements:
- [JVM] Fix maximum/total memory calculation: dropwizard/metrics#3125
- [Thread] Add peak and total started thread count to `ThreadStatesGaugeSet`: dropwizard/metrics#1601

Meanwhile, the Ratis version has been upgraded to 3.0.1, which has no compatibility problem with Dropwizard 4.2.25.

Backport:
- apache/spark#26332
- apache/spark#29426
- apache/spark#37372

Does this PR introduce any user-facing change?
No.

How was this patch tested?
Manual test.

Closes #2540 from SteNicholas/CELEBORN-1389.
Authored-by: SteNicholas <[email protected]>
Signed-off-by: mingji <[email protected]>
What changes were proposed in this pull request?
Update the version of dropwizard metrics that Spark uses for metrics to 4.1.x, from 3.2.x.
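As a sketch, the core of such an upgrade is a one-line change to the shared version property in the parent `pom.xml`. The property name below matches Spark's build convention as I recall it, and the exact version is an assumption, not quoted from the diff:

```xml
<!-- pom.xml (illustrative sketch): bump the shared Dropwizard/Codahale
     metrics version used by all modules. Property name and version are
     assumptions, not quoted from this PR's diff. -->
<properties>
  <codahale.metrics.version>4.1.1</codahale.metrics.version>
</properties>
```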
Why are the changes needed?
This helps JDK 9+ support; see, for example, dropwizard/metrics#1236.
Does this PR introduce any user-facing change?
No, although downstream users with custom metrics may be affected.
How was this patch tested?
Existing tests.