zstd: Copy literal in 16 byte blocks when possible #592
Merged
Also reduces literal overalloc when full allocs are allowed.

```
benchmark  old ns/op  new ns/op  delta
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32  14572  13898  -4.63%
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32  3946  3682  -6.69%
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32  45150  43296  -4.11%
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32  33525  36679  +9.41%
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32  11952  10496  -12.18%
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32  14081  13339  -5.27%
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32  12111  11745  -3.02%
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32  1073  1037  -3.36%
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32  1759  1841  +4.66%
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32  43722  39755  -9.07%
BenchmarkDecoder_DecodeAllParallel/html.zst-32  4144  3756  -9.36%
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32  1240  1240  +0.00%
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32  250426  240012  -4.16%
BenchmarkDecoder_DecodeAll/geo.protodata.zst-32  71861  65548  -8.79%
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32  829878  736934  -11.20%
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32  609402  683505  +12.16%
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32  231636  189146  -18.34%
BenchmarkDecoder_DecodeAll/alice29.txt.zst-32  245022  226451  -7.58%
BenchmarkDecoder_DecodeAll/html_x_4.zst-32  229709  216421  -5.78%
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32  18400  17850  -2.99%
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32  9682  9801  +1.23%
BenchmarkDecoder_DecodeAll/urls.10K.zst-32  924472  796913  -13.80%
BenchmarkDecoder_DecodeAll/html.zst-32  77728  66831  -14.02%
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32  7985  7432  -6.93%
Benchmark_seqdec_execute/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32  130498  106559  -18.34%
Benchmark_seqdec_execute/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32  136475  121699  -10.83%
Benchmark_seqdec_execute/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32  43119  33598  -22.08%
Benchmark_seqdec_execute/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32  15723  14472  -7.96%
Benchmark_seqdec_execute/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32  25968  19734  -24.01%
Benchmark_seqdec_execute/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32  88906  79506  -10.57%
Benchmark_seqdec_execute/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32  7385  7269  -1.57%
Benchmark_seqdec_execute/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32  83133  64295  -22.66%
Benchmark_seqdec_execute/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32  2899  2881  -0.62%
Benchmark_seqdec_execute/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32  3951  3961  +0.25%
Benchmark_seqdec_execute/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32  7063  6809  -3.60%
Benchmark_seqdec_execute/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32  14045  14050  +0.04%
Benchmark_seqdec_execute/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32  19679  18611  -5.43%
Benchmark_seqdec_execute/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32  48841  45545  -6.75%
Benchmark_seqdec_decodeSync/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32  276464  273620  -1.03%
Benchmark_seqdec_decodeSync/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32  270905  269049  -0.69%
Benchmark_seqdec_decodeSync/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32  146061  145878  -0.13%
Benchmark_seqdec_decodeSync/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32  30686  27367  -10.82%
Benchmark_seqdec_decodeSync/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32  88493  87167  -1.50%
Benchmark_seqdec_decodeSync/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32  195326  195764  +0.22%
Benchmark_seqdec_decodeSync/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32  14081  13925  -1.11%
Benchmark_seqdec_decodeSync/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32  297178  298192  +0.34%
Benchmark_seqdec_decodeSync/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32  2935  2921  -0.48%
Benchmark_seqdec_decodeSync/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32  4856  4467  -8.01%
Benchmark_seqdec_decodeSync/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32  14059  14050  -0.06%
Benchmark_seqdec_decodeSync/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32  35636  33427  -6.20%
Benchmark_seqdec_decodeSync/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32  88618  85660  -3.34%
Benchmark_seqdec_decodeSync/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32  162282  160568  -1.06%
```

`lcet10.txt` doesn't like it, otherwise mostly positive.

Streams before/after:

```
BenchmarkDecoderEnwik9-32                         1  1288277200 ns/op   776.23 MB/s     59552 B/op   44 allocs/op
BenchmarkDecoderEnwik9/multithreaded-writer-32    1  1191034000 ns/op   839.61 MB/s  13993224 B/op  113 allocs/op
BenchmarkDecoderSilesia-32                        5   209913160 ns/op  1009.69 MB/s     46715 B/op   38 allocs/op
BenchmarkDecoderSilesia/multithreaded-writer-32   5   201394480 ns/op  1052.40 MB/s   5129462 B/op   77 allocs/op
```
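For readers unfamiliar with the idea in the title, here is a rough Go sketch of copying literals in fixed 16-byte blocks "when possible". It is not the code from this PR. The helper names `copyLiterals` and `roundUp16` are hypothetical, and the sketch assumes the block copy is only safe when both buffers have enough slack to absorb writing past the requested length; otherwise it falls back to an exact copy.

```go
package main

import "fmt"

// roundUp16 rounds n up to the next multiple of 16.
// Hypothetical helper for illustration only.
func roundUp16(n int) int {
	return (n + 15) &^ 15
}

// copyLiterals copies n literal bytes from src to dst.
// When both slices have room for n rounded up to 16, it copies whole
// 16-byte blocks (possibly writing up to 15 bytes past n) and reports true.
// Otherwise it falls back to an exact copy and reports false.
// This is an assumed shape, not the PR's implementation.
func copyLiterals(dst, src []byte, n int) bool {
	padded := roundUp16(n)
	if padded > len(dst) || padded > len(src) {
		copy(dst[:n], src[:n]) // not enough slack: exact copy
		return false
	}
	for i := 0; i < padded; i += 16 {
		// Fixed-size copies are easy for a compiler (or hand-written
		// assembly) to turn into a single wide load/store pair.
		copy(dst[i:i+16], src[i:i+16])
	}
	return true
}

func main() {
	src := []byte("hello, zstd literals!ABCDEFGHIJK") // 32 bytes, 21 wanted
	dst := make([]byte, 32)
	fast := copyLiterals(dst, src, 21)
	fmt.Println(fast, string(dst[:21]))
}
```

The trade-off is that the destination must tolerate the extra bytes written past `n`, which is why some over-allocation of the output buffer goes hand in hand with this kind of copy.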
Great improvement!
WojciechMula
approved these changes
May 12, 2022
A funny fact is that I wanted to pick that problem and planned to ask you where to start. :)