@brian-gavin

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Go Version Update
  • Dependency Update

Description

This preallocates the paths and parts slices and removes the use of strings.Split in (*cache).parsePath in order to reduce allocations. strings.Split is replaced with a custom iterator type built on strings.Cut, so that iterating over the path performs zero allocations. Although go1.24 introduced a similar replacement in strings.SplitSeq, it cannot be used here because parsing a sliceOfStruct requires advancing the iteration from inside the loop. An iter.Pull iterator could also serve this purpose, but the improvement would be smaller because it allocates several objects to set up the coroutines. In any case, either of those options would require updating the go version.
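As a rough illustration of the idea (not the code in this PR), a strings.Cut based iterator could look like the sketch below. The names pathIter, next and describe, and the hard-coded slice field "items", are hypothetical and exist only for this example. The point it tries to show is that the loop body can call next itself to consume the index segment that follows a slice field, which a plain range over strings.SplitSeq does not allow.

```go
package main

import (
	"fmt"
	"strings"
)

// pathIter is a hypothetical iterator over the "."-separated segments of a
// path. It only re-slices the input string, so it does not allocate.
type pathIter struct {
	rest string // unconsumed remainder of the path
	done bool   // true once the final segment has been returned
}

// next returns the next segment and whether one was available. It yields the
// same segments as strings.Split(path, "."), including empty ones.
func (it *pathIter) next() (string, bool) {
	if it.done {
		return "", false
	}
	seg, rest, found := strings.Cut(it.rest, ".")
	it.rest = rest
	it.done = !found
	return seg, true
}

// describe walks a path and, for the hypothetical slice field "items",
// pulls the following segment as its index from inside the loop body.
func describe(path string) []string {
	var out []string
	it := pathIter{rest: path}
	for seg, ok := it.next(); ok; seg, ok = it.next() {
		if seg == "items" {
			if idx, ok := it.next(); ok {
				seg += "[" + idx + "]"
			}
		}
		out = append(out, seg)
	}
	return out
}

func main() {
	fmt.Println(describe("items.0.name")) // prints [items[0] name]
}
```

strings.Cut has been available since go1.18, so a sketch like this does not need a newer go version.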

The existing benchmark in the package shows a speedup and a reduction in allocations, and a benchmark of my own workload, which makes heavy use of slice-of-struct fields, shows a larger improvement. That benchmark has been added to the package.

BenchmarkAll results for main:

goos: darwin
goarch: amd64
pkg: github.com/gorilla/schema
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkAll-16    	   21609	     55065 ns/op	   18390 B/op	     379 allocs/op
PASS
ok  	github.com/gorilla/schema	2.157s

BenchmarkAll results for this branch:

goos: darwin
goarch: amd64
pkg: github.com/gorilla/schema
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkAll-16    	   23038	     52747 ns/op	   16906 B/op	     332 allocs/op
PASS
ok  	github.com/gorilla/schema	2.135s
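For intuition about where savings in allocs/op of this kind can come from, here is a small standalone sketch, separate from the package benchmarks, that compares strings.Split with a strings.Cut loop using testing.AllocsPerRun. The function names and the sample path are made up for the example, and the numbers it prints say nothing about the package itself.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps the results alive so the compiler cannot discard the work.
var sink int

// samplePath is an arbitrary example path, not taken from the package.
const samplePath = "items.0.name"

// countCut counts "."-separated segments by re-slicing with strings.Cut.
func countCut(path string) int {
	n := 0
	rest := path
	for {
		_, tail, found := strings.Cut(rest, ".")
		n++
		if !found {
			return n
		}
		rest = tail
	}
}

// splitAllocs reports the average allocations per strings.Split call.
func splitAllocs() float64 {
	return testing.AllocsPerRun(100, func() {
		sink = len(strings.Split(samplePath, "."))
	})
}

// cutAllocs reports the average allocations per countCut call.
func cutAllocs() float64 {
	return testing.AllocsPerRun(100, func() {
		sink = countCut(samplePath)
	})
}

func main() {
	fmt.Println("strings.Split allocs/op:", splitAllocs())
	fmt.Println("strings.Cut loop allocs/op:", cutAllocs())
}
```

strings.Split has to allocate the slice it returns, while the Cut loop only produces substrings of the input.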

Looking at a flamegraph of parsePath from BenchmarkAll, before and after:
image
image

In my own workload the improvement is much greater: I re-use the Decoder, use far fewer structs, and have tens of input elements per Decode call. I have added the benchmark BenchmarkSliceOfStruct to represent this workload.

BenchmarkSliceOfStruct for main:

goos: darwin
goarch: amd64
pkg: github.com/gorilla/schema
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkSliceOfStruct-16    	   74332	     15562 ns/op	    5792 B/op	     176 allocs/op
PASS
ok  	github.com/gorilla/schema	1.932s

BenchmarkSliceOfStruct for this branch:

goos: darwin
goarch: amd64
pkg: github.com/gorilla/schema
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkSliceOfStruct-16    	   83041	     13373 ns/op	    4140 B/op	     124 allocs/op
PASS
ok  	github.com/gorilla/schema	1.807s

and of course, the before flamegraph:
image
and after:
image

As can be seen in this second flamegraph, there is definitely still an opportunity for more wins here, because the largest share of the time is now spent making the preallocated slices. A larger refactor could count the exact sizes needed for the paths and parts slices, instead of relying on my estimate based on the number of ".".

This would require repeating the first operation of the loop to count exactly how many paths and sliceOfStruct fields there are. And truth be told, as a first-time contributor, I don't feel comfortable making a larger refactor like that 😆. So I settled for possibly over-allocating, and clipping the slices to reduce the footprint outside of parsePath.
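The trade-off described above can be sketched roughly as follows. This is a simplified stand-in rather than the parsePath code: fieldNames is a hypothetical helper that skips numeric index segments, so the capacity estimated from the number of "." is only an upper bound. It clips with a full slice expression, which is my assumption about one way to do it on go1.20 (slices.Clip only joined the standard library in go1.21).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fieldNames is a hypothetical helper that returns the non-numeric segments
// of a "."-separated path. Numeric segments are treated as slice indexes and
// skipped, so the estimated capacity may be larger than what is needed.
func fieldNames(path string) []string {
	// Upper bound: a path with n separators has n+1 segments.
	names := make([]string, 0, strings.Count(path, ".")+1)
	rest := path
	for {
		seg, tail, found := strings.Cut(rest, ".")
		if _, err := strconv.Atoi(seg); err != nil {
			names = append(names, seg) // not an index, keep it
		}
		if !found {
			break
		}
		rest = tail
	}
	// Clip: cap the capacity at the length so the unused tail is not
	// exposed to callers that append to the result.
	return names[:len(names):len(names)]
}

func main() {
	names := fieldNames("items.0.name")
	fmt.Println(names, len(names), cap(names)) // prints [items name] 2 2
}
```

Note that a full slice expression only limits the capacity seen by callers; the backing array keeps its original size. Counting the exact sizes up front, as the larger refactor would, avoids the over-allocation at the cost of a second pass over the path.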

Related Tickets & Documents

  • Related Issue #
  • Closes #

Added/updated tests?

  • Yes
  • No, and this is why: please replace this line with details on why tests
    have not been included
  • I need help with writing tests

Run verifications and test

  • make verify is passing
    • golangci-lint is passing with a go1.20 supporting version, although I could not get a version working for gosec and govulncheck. However, make verify with the latest versions of go and the linters fails with issues unrelated to this change. The lints are: https://staticcheck.dev/docs/checks/#QF1008 and gosec G115 (integer conversions)
  • make test is passing

@brian-gavin brian-gavin force-pushed the bg/optimize-cache-parse-path branch from 50fa6c3 to d626cd3 Compare August 8, 2025 05:12
@brian-gavin brian-gavin force-pushed the bg/optimize-cache-parse-path branch from d626cd3 to 7d12f2a Compare August 8, 2025 05:24