-
Notifications
You must be signed in to change notification settings - Fork 38.8k
Avoid double toString call when joining collections #27557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Commit 8d3e8ca optimized how the method `cleanPath` normalizes its input path and introduced the method `joinStrings` which pre-allocates a StringBuilder of the correct size to hold the final delimited string. Commit cc026fc removed this method again and merged it with `collectionToDelimitedString`. The merged implementation however now undoes part of the original optimization: `toString()` is called twice on each item of the collection, which will put quite some pressure on the GC for non-string collections. For string collections this is basically a no-op because a string simply returns itself, i.e. `this`, in its `toString` implementation. This commit introduces a specialization for string collections again. The actual collecting of items into the StringBuilder has now been extracted and the call sites pass a (potentially pre-allocated) StringBuilder instance.
|
Hello @knittl, could you clarify how you identified this as a performance problem? With this change and an adapted JMH benchmark, I've found that:
I'm asking this because we tend to only consider optimizations that fit our own usage pattern, especially when it comes with downsides for the main usage pattern or additional maintenance work on our side. |
|
@bclozel sorry for coming back to you only now. TBH, I haven't, I was just hugely annoyed by double-tostring to precompute the stringbuilder size, which seems to negate the initial improvements. The initial code change (#26316) made sure that the public behavior did not change for generic collections and copied a specialization of the method for string-collections. Unfortunately Java has type erasure so it is not possible to define a method of the same name which accepts a Another idea that I followed before I came up with the current fix is to always copy the collection to a new string collection (equivalent stream expression: The initial attempt (#26316) identified the |
|
No worries @knittl I understand that the change that got merged doesn't entirely solve the performance problem you've found. Now I still don't know if the issue you've encountered comes from a code path within Spring or if the usage pattern comes from an application. Could you tell us more? In the first case, we should ensure that this issue is properly addressed one way or another. In the second, we can't really optimize utility methods for a general usage pattern, especially if this comes at a cost for the common use case. As you've found out, we can't really optimize for both here. |
|
@bclozel the initial performance problem was encountered while profiling a Spring (Boot) application during startup. A considerable amount of CPU time is spent in the Optimizing for both cases (non-string collections with custom, non-cached See also the JavaDoc of the private method
This new PR (#27557) restores the original (i.e. before #26316) behavior/performance of method The commit cc026fc unfortunately added overhead for the non-string case (due to the double-toString calls). This overhead was not there before. In a way, that commit has made performance at least twice as bad for non-string collections while optimizing only for the string collection case. Do you know if this method is called more often with strings or non-strings from external code? Perhaps a new method could be introduced (e.g. To summarize: this PR aims to restore the original (before Aug 30 2021) performance characteristics of this method, while still providing optimizations when called from Thanks for your input so far, I am looking forward to hearing your opinion. |
|
Sorry but I still don't understand the current situation.
With those two points I can work on JMH benchmarks and get a clearer picture. Thanks! |
|
@bclozel I will find time to run a CPU profile again next week or the week after. Regarding your two points:
I will get back to you with a CPU profile in 1 or 2 weeks time. Thanks for your questions |
|
Sorry @knittl but I'm going to close this PR for the following reason:
Spring Framework In this case, I think you should consider using a custom method that better fits your use case. |
|
@bclozel I accept that, but I don't understand your point about "decreasing the performance of the main one". Performance of To repeat: I don't have an active use case which calls this method. It's just that the signature of this method is confusing as it allows collections of any element type to be passed, while intentionally providing an inefficient implementation for non-string inputs (that is: worse performance since Aug 2021). Maybe this method should have only accepted Lots of other applications directly use Spring's StringUtils in a similar manner to Apache Commons. A quick GitHub search turns up 33k calls to Thanks for sharing your insights! :) |
|
Sorry you're right, this decision is not about "decreasing the performance of the main one", but rather choosing not to add utility code or optimize for use cases we're not responsible of.
That wasn't clear to me in the original PR nor this one; knowing that would have been a good reason in itself to cut this discussion short. There are many places that could use performance optimizations in Spring Framework, we'd rather focus on the ones that have a meaningful impact on most Spring applications. Thanks! |
Commit 8d3e8ca optimized how the method
cleanPathnormalizes its input path and introduced the methodjoinStringswhich pre-allocates a StringBuilder of the correct sizeto hold the final delimited string.
Commit cc026fc removed this method
again and merged it with
collectionToDelimitedString. The mergedimplementation however now undoes part of the original optimization:
toString()is called twice on each item of the collection, which willput quite some pressure on the GC for non-string collections. For string
collections this is basically a no-op because a string simply returns
itself, i.e.
this, in itstoStringimplementation.This commit introduces a specialization for string collections again.
The actual collecting of items into the StringBuilder has now been
extracted and the call sites pass a (potentially pre-allocated)
StringBuilder instance.
This is a followup to PR #26316