High CPU usage of method handle invocations in Jetty 10 #6328
Comments
@SerCeMan we do see the tiny green towers in a quick load test that we have written to try to reproduce this issue. We would like to know:
From our point of view, we create one Let us know the result of your findings. Meanwhile we will investigate as well. |
@SerCeMan @sbordet I have done some testing and understand a bit more why this is happening. For each WebSocket endpoint we get MethodHandles for the relevant methods Although I did not see much time spent in the |
Do you need to do this? I am referring to the binding, because if we have 1M connections we would have 1M different |
@sbordet We don't need to bind the MethodHandle, you can invoke the original MethodHandle providing all the arguments every time. We would just need to remember the endpoint and session everywhere we invoke the MethodHandle. In the project I linked there is a benchmark comparing the bind to only using 1 MethodHandle and providing all arguments each time. Interestingly the benchmark result showed the case where we use |
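To make the two options in the comment above concrete, here is a minimal sketch of binding a handle per endpoint versus sharing one unbound handle and passing the receiver on every call. It is not Jetty's code: `MethodHandleBindingSketch`, `EchoEndpoint`, `onText`, and the helper methods are hypothetical names chosen for illustration, and the sketch makes no claim about which variant is faster.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class MethodHandleBindingSketch {

    // Hypothetical endpoint, standing in for a user's WebSocket endpoint class.
    public static class EchoEndpoint {
        private final String name;

        public EchoEndpoint(String name) {
            this.name = name;
        }

        public String onText(String message) {
            return name + ":" + message;
        }
    }

    // One shared, unbound handle of type (EchoEndpoint, String) -> String.
    static final MethodHandle ON_TEXT;

    static {
        try {
            ON_TEXT = MethodHandles.lookup().findVirtual(
                EchoEndpoint.class, "onText",
                MethodType.methodType(String.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Option 1: bind the receiver, producing a separate (String) -> String
    // handle for every endpoint instance.
    static MethodHandle bindPerEndpoint(EchoEndpoint endpoint) {
        return ON_TEXT.bindTo(endpoint);
    }

    static String invokeBound(MethodHandle bound, String message) {
        try {
            return (String) bound.invokeExact(message);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Option 2: keep the single unbound handle and supply the endpoint as
    // the first argument on every invocation.
    static String invokeUnbound(EchoEndpoint endpoint, String message) {
        try {
            return (String) ON_TEXT.invokeExact(endpoint, message);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        EchoEndpoint endpoint = new EchoEndpoint("a");
        System.out.println(invokeBound(bindPerEndpoint(endpoint), "hello"));
        System.out.println(invokeUnbound(endpoint, "hello"));
    }
}
```

With the unbound variant the caller has to carry the endpoint (and, in Jetty's case, the session) to every call site, which is the bookkeeping cost mentioned above.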
Perhaps we should raise an issue with whatever software is producing the flame graph. Binding to a method handle is a normal thing to do and, as @lachlan-roberts's benchmark shows, it is the right thing to do.
@lachlan-roberts can you paste the benchmark report into a comment on this issue?
Hey, folks! Sorry for the delayed response. I'll try to prepare answers later today. I'm still trying to reproduce the issue in the test environment - no luck yet, but I suspect that it might be related to Shenandoah GC that we're using. I'm still working on a test case that can reproduce it. |
Benchmark Results:
Hey, @lachlan-roberts! Regarding the benchmarks, please correct me if I'm wrong but it seems that it's possible to make the methodhandle stored in a
I can confirm that making the |
@lachlan-roberts is this still reproducible on current Jetty 10.0.x and/or 11.0.x HEAD? |
I haven't been able to reproduce it in a synthetic environment, stumbled across #6696, and now that |
The separate spikes on the flamegraph are reproducible, but I think it is really just an interaction between the profiler and MethodHandles and is unlikely to be causing any performance degradation. I am still planning to do a PR to use only final unbound MethodHandles to see if it is any better, but I haven't gotten around to doing it yet. So I would leave this issue open for this reason. There have been a number of PRs to improve performance in Jetty 10 since 10.0.3, so if you update it may be that you no longer experience this performance regression. For example PR #6635 was designed to reduce allocation of buffers for whole message aggregation and also reduce the amount of data copies. |
Hey, @lachlan-roberts and the team! Would you accept a PR that replaces a large number of method handles with a single one considering that the benchmarks above show no negative performance impact? After attempting to upgrade to 10.0.8, we still see a large amount of CPU time spent resolving method handles. |
@SerCeMan I think this could be difficult to implement, even more so if you are not already familiar with the Jetty WebSocket implementation. I will not have time to attempt this for a few weeks. Can you attach the full flamegraph file instead of just the screenshot? Also if you have some reproducer code which can reproduce this checkCustomized branch, it would be good to see that as well. |
Signed-off-by: Lachlan Roberts <[email protected]>
… algorithm Signed-off-by: Lachlan Roberts <[email protected]>
This issue has been automatically marked as stale because it has been a |
@SerCeMan What is the current status of this issue for you? Are you still seeing high CPU? Our PR to address this has significant performance impact, so it was never merged. |
Thanks, @gregw! Apologies for not providing an update sooner. The CPU usage linked to resolving method handles (the yellow part of the flame graph) seemed to be related to a specific combination of GC settings (ShenandoahGC) and the JVM version we were using at the time (version 13). A series of JDK upgrades managed to resolve it; I'm unsure of the exact version, but it was likely the transition to version 17. There is still an issue with not being able to use async-profiler, but that is more of a nice-to-have observability feature, considering that it is theoretically possible to use async-profiler with some extra pre/post stack processing.
@lachlan-roberts So it seems this is not needed so much now....
@gregw we did open an issue with them at some stage but they said it was the expected behaviour with method handles and flagged it as not an issue. |
This issue has been automatically marked as stale because it has been a |
This issue has been closed due to it having no activity. |
Issue #6328 - avoid binding WebSocket MethodHandles
This issue has been closed due to it having no activity. |
Jetty version
10.0.3
Java version/vendor
(use: java -version)
OS type/version
Ubuntu 18.04.5 LTS
Description
Hi, Jetty maintainers!
We've recently attempted a migration from Jetty 9 to Jetty 10 and we've noticed a regression related to WebSockets. According to our metrics, there seems to be a memory leak, which I'm still investigating; I hope to provide more information soon. However, it also seems that Jetty 10 spends a large amount of CPU time resolving lambda forms to perform method handle invocations. On the flame graphs, we saw a large number of lambda forms: every tiny green tower on the flame graph is a separate instance of LambdaForm. My assumption, which I'm still investigating and hope to back with more data soon, is that the large number of LambdaForm instances filled up the Java heap.
CPU flame graph:
Allocation flame graph:
I was wondering if you've observed this behaviour before or might know what it could be caused by. I'm also currently investigating the issue and will provide more info once I have it. Thanks!