Optimization for cache.parsePath to reduce allocations
#238
What type of PR is this? (check all applicable)
Description
This adds preallocation of the `paths` and `parts` slices, and eliminates the use of `strings.Split` in `(*cache).parsePath`, in order to reduce allocations. `strings.Split` is replaced with a custom iterator type that uses `strings.Cut`, so that iterating over the path performs zero allocations. Although `go1.24` introduces a similar replacement in `strings.SplitSeq`, it cannot be used here because parsing a `sliceOfStruct` requires advancing the iteration mid-loop. An `iter.Pull` iterator could also serve this purpose, but the performance improvement would be smaller because it allocates several objects to set up its coroutines. Either of those options would also require updating the Go version.

The benchmark in the package shows a speedup and an allocation reduction, and a benchmark of my own workload, which makes heavy use of slice-of-struct, shows a larger improvement. That benchmark has been added to the package.
`BenchmarkAll` results for main:

`BenchmarkAll` results for this branch:
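The allocation difference driving the change can be demonstrated with a self-contained comparison (illustrative only, not code from this PR): walking path segments via `strings.Split` versus a `strings.Cut` loop, measured with `testing.AllocsPerRun`:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// splitBytes sums segment lengths via strings.Split, which allocates
// a []string on every call.
func splitBytes(path string) int {
	n := 0
	for _, seg := range strings.Split(path, ".") {
		n += len(seg)
	}
	return n
}

// cutBytes walks the same segments with strings.Cut and allocates nothing.
func cutBytes(path string) int {
	n, rest := 0, path
	for {
		seg, r, found := strings.Cut(rest, ".")
		n += len(seg)
		if !found {
			return n
		}
		rest = r
	}
}

func main() {
	const path = "items.10.nested.name"
	// Typically reports 1 alloc/op for Split and 0 for the Cut loop.
	fmt.Println("split allocs/op:", testing.AllocsPerRun(100, func() { splitBytes(path) }))
	fmt.Println("cut allocs/op:  ", testing.AllocsPerRun(100, func() { cutBytes(path) }))
}
```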
Looking at a flamegraph of `parsePath` from `BenchmarkAll`, before and after:

In my personal workload the improvement is much greater: I re-use the Decoder, use far fewer structs, and have tens of input elements per Decode call. I have added the benchmark `BenchmarkSliceOfStruct` to represent this workload.

`BenchmarkSliceOfStruct` for main:
`BenchmarkSliceOfStruct` for this branch:
and of course, the before flamegraph:


and after:
As can be seen in this second flamegraph, there is definitely still an opportunity for more wins here, because the largest share of time is now spent making the preallocated slices. A larger refactor could compute the necessary sizes of the `paths` and `parts` slices exactly, instead of my estimate based on the number of `"."` separators. That would require repeating the first operation of the loop to count exactly how many paths and `sliceOfStruct` fields there are. Truth be told, as a first-time contributor, I don't feel comfortable making a larger refactor like that 😆. So I settled for possibly over-allocating, and clipping the slices to reduce the footprint outside of `parsePath`.

Related Tickets & Documents
Added/updated tests?
Tests have not been included.
Run verifications and test
- `make verify` is passing
- `golangci-lint` is passing with a `go1.20`-supporting version, although I could not get a working version of `gosec` or `govulncheck`. However, `make verify` with the latest versions of Go and the linters fails with issues unrelated to this change. The lints are: https://staticcheck.dev/docs/checks/#QF1008 and gosec G115 (integer conversions).
- `make test` is passing