Optimization for cache.parsePath to reduce allocations
#238
What type of PR is this? (check all applicable)
Description
This adds preallocation of the `paths` and `parts` slices, and eliminates the use of `strings.Split` in `(*cache).parsePath`, in order to reduce allocations. `strings.Split` is replaced with a custom iterator type that uses `strings.Cut`, so that iterating over the path performs zero allocations. Although `go1.24` introduces a similar replacement in `strings.SplitSeq`, it cannot be used here because parsing a `sliceOfStruct` requires advancing the iteration mid-loop. An `iter.Pull` iterator could also serve this purpose, but the performance improvement would be smaller because it allocates several objects to set up its coroutines. Either of those options would also require updating the Go version.

The benchmark in the package shows a speedup and an allocation reduction, and a benchmark of my own workload, which makes heavy use of slice-of-struct, shows a larger improvement. That benchmark has been added to the package.
`BenchmarkAll` results for main:

`BenchmarkAll` results for this branch:
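The allocation difference driving the change can be demonstrated with a self-contained comparison (illustrative only, not code from this PR): walking path segments via `strings.Split` versus a `strings.Cut` loop, measured with `testing.AllocsPerRun`:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// splitBytes sums segment lengths via strings.Split, which allocates
// a []string on every call.
func splitBytes(path string) int {
	n := 0
	for _, seg := range strings.Split(path, ".") {
		n += len(seg)
	}
	return n
}

// cutBytes walks the same segments with strings.Cut and allocates nothing.
func cutBytes(path string) int {
	n, rest := 0, path
	for {
		seg, r, found := strings.Cut(rest, ".")
		n += len(seg)
		if !found {
			return n
		}
		rest = r
	}
}

func main() {
	const path = "items.10.nested.name"
	// Typically reports 1 alloc/op for Split and 0 for the Cut loop.
	fmt.Println("split allocs/op:", testing.AllocsPerRun(100, func() { splitBytes(path) }))
	fmt.Println("cut allocs/op:  ", testing.AllocsPerRun(100, func() { cutBytes(path) }))
}
```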
Looking at a flamegraph of `parsePath` from `BenchmarkAll`, before and after:

In my personal workload the improvement is much greater: I re-use the Decoder, use far fewer structs, and have tens of input elements per Decode call. I have added the benchmark `BenchmarkSliceOfStruct` to represent this workload.

`BenchmarkSliceOfStruct` for main:
`BenchmarkSliceOfStruct` for this branch:
and of course, the before flamegraph:


and after:
As can be seen in this second flamegraph, there is definitely still an opportunity for more wins here, because the largest share of time is now spent making the preallocated slices. A larger refactor could compute the necessary sizes of the `paths` and `parts` slices exactly, instead of my estimate based on the number of `"."` separators. That would require repeating the first operation of the loop to count exactly how many paths and `sliceOfStruct` fields there are. Truth be told, as a first-time contributor, I don't feel comfortable making a larger refactor like that 😆. So I settled for possibly over-allocating, and clipping the slices to reduce the footprint outside of `parsePath`.

Related Tickets & Documents
Added/updated tests?
Tests have not been included.
Run verifications and test
- `make verify` is passing
- `golangci-lint` is passing with a `go1.20`-supporting version, although I could not get a working version of `gosec` or `govulncheck`. However, `make verify` with the latest versions of Go and the linters fails with issues unrelated to this change. The lints are: https://staticcheck.dev/docs/checks/#QF1008 and gosec G115 (integer conversions).
- `make test` is passing