Increase performance and reduce allocations in hclsyntax.LexConfig
#490
Conversation
Hi! Any updates on this?
Hi @inkel! Sorry for not responding sooner. I had to do some tooling maintenance on this repository to get the per-commit checks working again before review; that's now done, so we do have the tests running here again.

At the time I'm writing this there seems to be a test failure, but my initial investigation suggests it's a pre-existing issue unrelated to what you proposed here: a test currently relies on a particular iteration order for maps in Go, but Go does not guarantee any order, so there's a 50% chance of the test failing. I'd like to get that fixed first, because the code you modified here was introduced specifically to fix an earlier bug around accidental reuse of values. Although what you implemented does look correct to me as a human reviewer (the closure you introduced seems to allocate a new copy of the range for each use), I'd like to confirm that confidence by seeing the tests run cleanly before we move forward.

Thanks for working on this!
Hey @apparentlymart thanks for taking the time to review this! What you say makes total sense to me. I'll be happy to help with anything if you need to.
Doing this reduces the memory used by ~11%, as the following benchstat comparison shows:

name          old time/op    new time/op    delta
LexConfig-12  9.27µs ± 0%    9.03µs ± 1%    -2.55%  (p=0.000 n=9+10)

name          old alloc/op   new alloc/op   delta
LexConfig-12  8.94kB ± 0%    7.98kB ± 0%    -10.74%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
LexConfig-12  37.0 ± 0%      37.0 ± 0%      ~  (all equal)

Benchmarks were created using: go test -benchmem -benchtime=200000x -count=10 -bench=.
In a similar fashion to the parent commit, here instead of always copying tok.Range for later use, we define a function that returns the copied value, so we only allocate the copy when it's actually needed. For the benchmark introduced earlier, the reduction in allocations and memory usage is outstanding:

name          old time/op    new time/op    delta
LexConfig-12  9.05µs ± 1%    7.83µs ± 1%    -13.54%  (p=0.000 n=10+10)

name          old alloc/op   new alloc/op   delta
LexConfig-12  7.98kB ± 0%    6.06kB ± 0%    -24.07%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
LexConfig-12  37.0 ± 0%      7.0 ± 0%       -81.08%  (p=0.000 n=10+10)

Benchmarks were created using: go test -benchmem -benchtime=200000x -count=10 -bench=.
Hi again, @inkel! Thanks for your patience. With the automated test runs now working we can see the existing tests passing, and for good measure I also tried some manual tests to convince myself that indeed this approach does result in each diagnostic pointing to a separate copy of the range.

This does mean that now, if there are multiple diagnostics in a single iteration, we will allocate a new copy of the range for each of them, but that is the rare case and the common path avoids the allocation entirely.

Thanks again for working on this! I'm going to merge it now.
@apparentlymart thank you for merging this and for the detailed explanation! |
Hello, team. First of all, thank you for sharing this library.
While working on an internal project that uses this library to parse `.hcl` files to retrieve a list of used modules, I added some benchmarks and found that one of the bottlenecks was in `hclsyntax.LexConfig`. Further inspection highlighted `checkInvalidTokens` as one of the culprits (the other being `emitToken`, which lives in a generated file and thus I didn't want to mess much with it).

I took the liberty of adding a benchmark for `LexConfig` with a rather small HCL file, but one that serves the purpose of showing whether I was able to improve the results. Then I ran the benchmarks against the original code and found these results (only showing one line for brevity):

After that I started looking for where most of the allocations were being made, and found a spot responsible for most of them. Then I tried to reduce the allocation count and bytes, and the last commit got me these benchmark results (again showing only one line for brevity):
According to `benchstat`, the difference between the two benchmarks is:

As you can see, this reduced the number of allocations by ~80% while using 32% fewer bytes. As a side effect, performance increased by 15%.
I hope you find this useful and worth merging, thanks!