Increase performance and reduce allocations in `hclsyntax.LexConfig` #490

inkel · 2021-10-20T13:43:00Z

Hello, team. First of all, thank you for sharing this library.

While working on an internal project that makes use of this library to parse .hcl files to retrieve a list of used modules, I added some benchmarks and found that one of the bottlenecks that I had was in the hclsyntax.LexConfig. Further inspection highlighted checkInvalidTokens as one of the culprits (the other being emitToken which is in a generated file and thus I didn't want to mess much with).

I took the liberty to add a benchmark for LexConfig with a rather small HCL file, but that serves the purposes of finding if I was able to improve the results. Then I proceeded to run the benchmarks in the original code and found this results (only showing one line for brevity):

$ go test -benchmem -bench=. -benchtime=200000x -count=10 -memprofile=mem.pprof
goos: darwin
goarch: amd64
pkg: github.com/hashicorp/hcl/v2/hclsyntax
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkLexConfig-12             200000              9298 ns/op            8936 B/op         37 allocs/op

After that I've started to find where most of the allocations were being made, and found a spot that was making most of the reported allocations. Then I proceeded to try and reduce the allocations count and bytes, and the last commit got me this benchmark results (again showing only one line for brevity):

$ go test -benchmem -bench=. -benchtime=200000x -count=10 -memprofile=mem.pprof
goos: darwin
goarch: amd64
pkg: github.com/hashicorp/hcl/v2/hclsyntax
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkLexConfig-12             200000              7799 ns/op            6056 B/op          7 allocs/op

According to benchstat, the difference between both benchmarks is:

name          old time/op    new time/op    delta
LexConfig-12    9.27µs ± 0%    7.83µs ± 1%  -15.55%  (p=0.000 n=9+10)

name          old alloc/op   new alloc/op   delta
LexConfig-12    8.94kB ± 0%    6.06kB ± 0%  -32.23%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
LexConfig-12      37.0 ± 0%       7.0 ± 0%  -81.08%  (p=0.000 n=10+10)

As you can see this reduced the number of allocations in ~80% using 32% less bytes while doing so. As a side effect, performance got increased in 15%.

I hope you find this useful and worth of merging, thanks!

hashicorp-cla · 2021-10-20T13:43:04Z

All committers have signed the CLA.

inkel · 2022-02-08T21:48:36Z

Hi! Any updates on this?

apparentlymart · 2022-02-15T18:52:29Z

Hi @inkel! Sorry for not responding sooner.

I had to do some tooling maintenance on this repository to get the per-commit checks working again before review, and that's now done and so we do have the tests running here again.

At the time I'm writing this there seems to be a test failure but my initial investigation suggests that it's a pre-existing issue unrelated to what you proposed here: there is a test that's currently relying on a particular iteration order for maps in Go, but Go does not guarantee that and so there's a 50% chance of the test failing.

I'd like to try to get that fixed first because the code you modified here was introduced specifically to fix an earlier bug around accidental reuse of values, and so although what you implemented does look correct to me as a human reviewer (the closure you introduced seems to allocate a new copy of the range for each use) I'd like to confirm that confidence by seeing the tests run cleanly before we move forward.

Thanks for working on this!

inkel · 2022-02-15T19:23:49Z

Hey @apparentlymart thanks for taking the time to review this!

What you say makes total sense to me. I'll be happy to help with anything if you need to.

Doing this reduces the memory used in ~11%, as the following benchstat comparison shows: name old time/op new time/op delta LexConfig-12 9.27µs ± 0% 9.03µs ± 1% -2.55% (p=0.000 n=9+10) name old alloc/op new alloc/op delta LexConfig-12 8.94kB ± 0% 7.98kB ± 0% -10.74% (p=0.000 n=10+10) name old allocs/op new allocs/op delta LexConfig-12 37.0 ± 0% 37.0 ± 0% ~ (all equal) Benchmarks were created using: go test -benchmem -benchtime=200000x -count=10 -bench=.

In a similar fashion as the parent commit, here instead of always copying the tok.Range for later use, we define a function to get this copied value, and thus we only allocate the copy if it's needed, otherwise don't. For the benchmark introduced earlier, the reduction in allocations and memory usage is outstanding: name old time/op new time/op delta LexConfig-12 9.05µs ± 1% 7.83µs ± 1% -13.54% (p=0.000 n=10+10) name old alloc/op new alloc/op delta LexConfig-12 7.98kB ± 0% 6.06kB ± 0% -24.07% (p=0.000 n=10+10) name old allocs/op new allocs/op delta LexConfig-12 37.0 ± 0% 7.0 ± 0% -81.08% (p=0.000 n=10+10) Benchmarks were created using: go test -benchmem -benchtime=200000x -count=10 -bench=.

apparentlymart · 2022-02-16T19:10:55Z

Hi again, @inkel!

Thanks for your patience. With the automated test runs now working we can see the existing tests passing, and for good measure I also tried some manual tests to convince myself that indeed this approach does result in each diagnostic pointing to a separate hcl.Range object, without sharing allocations between the loops.

This does mean that now if there are multiple diagnostics in a single iteration we will allocate a new hcl.Range for each, but that seems like a good tradeoff since we typically don't optimize the error return paths on the assumption that reasonable usage will produce only a few diagnostics at most.

Thanks again for working on this! I'm going to merge it now.

inkel · 2022-02-16T19:13:39Z

@apparentlymart thank you for merging this and for the detailed explanation!

inkel force-pushed the inkel/reduce-mem-usage branch from 2856001 to c4f9e5b Compare February 9, 2022 14:52

apparentlymart force-pushed the inkel/reduce-mem-usage branch 2 times, most recently from c31e897 to 818a4cf Compare February 15, 2022 18:38

apparentlymart force-pushed the inkel/reduce-mem-usage branch from 818a4cf to a2faa08 Compare February 15, 2022 21:54

apparentlymart added syntax/native v2 Relates to the v2 line of releases labels Feb 16, 2022

inkel added 3 commits February 16, 2022 11:03

hclsyntax: Add benchmark for LexConfig

7beec5f

apparentlymart force-pushed the inkel/reduce-mem-usage branch from a2faa08 to 3d38be3 Compare February 16, 2022 19:05

apparentlymart merged commit 895479c into hashicorp:main Feb 16, 2022

inkel deleted the inkel/reduce-mem-usage branch February 16, 2022 19:11

crazy-max mentioned this pull request Oct 19, 2023

vendor: update hcl dependencies docker/buildx#2088

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increase performance and reduce allocations in `hclsyntax.LexConfig` #490

Increase performance and reduce allocations in `hclsyntax.LexConfig` #490

inkel commented Oct 20, 2021

hashicorp-cla commented Oct 20, 2021 •

edited

Loading

inkel commented Feb 8, 2022

apparentlymart commented Feb 15, 2022 •

edited

Loading

inkel commented Feb 15, 2022

apparentlymart commented Feb 16, 2022

inkel commented Feb 16, 2022

Increase performance and reduce allocations in hclsyntax.LexConfig #490

Increase performance and reduce allocations in hclsyntax.LexConfig #490

Conversation

inkel commented Oct 20, 2021

hashicorp-cla commented Oct 20, 2021 • edited Loading

inkel commented Feb 8, 2022

apparentlymart commented Feb 15, 2022 • edited Loading

inkel commented Feb 15, 2022

apparentlymart commented Feb 16, 2022

inkel commented Feb 16, 2022

Increase performance and reduce allocations in `hclsyntax.LexConfig` #490

Increase performance and reduce allocations in `hclsyntax.LexConfig` #490

hashicorp-cla commented Oct 20, 2021 •

edited

Loading

apparentlymart commented Feb 15, 2022 •

edited

Loading