zstd: Improve "best" compression #677

Merged (2 commits) on Sep 25, 2022

nightwolfz
Contributor

@nightwolfz nightwolfz commented Sep 25, 2022

For reference, I also tried aligning the other structs one by one, but none of them had a measurable effect.

```
benchstat -delta-test none old2.txt new2.txt
name                                  old time/op    new time/op     delta
Encoder_EncodeAllXML-16                 13.3ms ± 0%     13.0ms ± 0%    -2.11%
Encoder_EncodeAllSimple/fastest-16       229µs ± 0%      227µs ± 0%    -1.08%
Encoder_EncodeAllSimple/default-16       343µs ± 0%      371µs ± 0%    +8.13%
Encoder_EncodeAllSimple/better-16        402µs ± 0%      393µs ± 0%    -2.33%
Encoder_EncodeAllSimple/best-16         6.41ms ± 0%     2.72ms ± 0%   -57.48%  <====
Encoder_EncodeAllSimple4K/fastest-16    2.70µs ± 0%     2.56µs ± 0%    -5.26%
Encoder_EncodeAllSimple4K/default-16    33.1µs ± 0%     33.5µs ± 0%    +1.30%
Encoder_EncodeAllSimple4K/better-16     39.3µs ± 0%     38.8µs ± 0%    -1.12%
Encoder_EncodeAllSimple4K/best-16        732µs ± 0%      360µs ± 0%   -50.90%   <====
Encoder_EncodeAllHTML-16                 213µs ± 0%      209µs ± 0%    -2.07%
Encoder_EncodeAllTwain-16               3.23ms ± 0%     3.23ms ± 0%    -0.04%
Encoder_EncodeAllPi-16                  1.12ms ± 0%     1.11ms ± 0%    -1.01%
Random4KEncodeAllFastest-16              988ns ± 0%      976ns ± 0%    -1.31%
Random10MBEncodeAllFastest-16           2.50ms ± 0%     2.48ms ± 0%    -0.70%
Random4KEncodeAllDefault-16             4.58µs ± 0%     4.56µs ± 0%    -0.31%
RandomEncodeAllDefault-16               2.58ms ± 0%     2.52ms ± 0%    -2.20%
Random10MBEncoderFastest-16             3.61ms ± 0%     3.61ms ± 0%    -0.04%
RandomEncoderDefault-16                 3.44ms ± 0%     3.44ms ± 0%    +0.03%

name                                  old speed      new speed       delta
Encoder_EncodeAllXML-16                402MB/s ± 0%    410MB/s ± 0%    +2.16%
Encoder_EncodeAllSimple/fastest-16     173MB/s ± 0%    175MB/s ± 0%    +1.10%
Encoder_EncodeAllSimple/default-16     116MB/s ± 0%    107MB/s ± 0%    -7.52%
Encoder_EncodeAllSimple/better-16     99.0MB/s ± 0%  101.4MB/s ± 0%    +2.38%
Encoder_EncodeAllSimple/best-16       6.21MB/s ± 0%  14.61MB/s ± 0%  +135.27%  <====
Encoder_EncodeAllSimple4K/fastest-16  1.52GB/s ± 0%   1.60GB/s ± 0%    +5.56%
Encoder_EncodeAllSimple4K/default-16   124MB/s ± 0%    122MB/s ± 0%    -1.29%
Encoder_EncodeAllSimple4K/better-16    104MB/s ± 0%    106MB/s ± 0%    +1.13%
Encoder_EncodeAllSimple4K/best-16     5.59MB/s ± 0%  11.39MB/s ± 0%  +103.76%  <====
Encoder_EncodeAllHTML-16               208MB/s ± 0%    213MB/s ± 0%    +2.11%
Encoder_EncodeAllTwain-16              120MB/s ± 0%    120MB/s ± 0%    +0.04%
Encoder_EncodeAllPi-16                89.0MB/s ± 0%   89.9MB/s ± 0%    +1.02%
Random4KEncodeAllFastest-16           4.14GB/s ± 0%   4.20GB/s ± 0%    +1.32%
Random10MBEncodeAllFastest-16         4.19GB/s ± 0%   4.22GB/s ± 0%    +0.71%
Random4KEncodeAllDefault-16            895MB/s ± 0%    897MB/s ± 0%    +0.31%
RandomEncodeAllDefault-16             4.06GB/s ± 0%   4.15GB/s ± 0%    +2.25%
Random10MBEncoderFastest-16           2.90GB/s ± 0%   2.90GB/s ± 0%    +0.04%
RandomEncoderDefault-16               3.05GB/s ± 0%   3.05GB/s ± 0%    -0.03%
```
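
The time and speed tables above describe the same measurements, so the two delta columns should agree: a relative time change of `d` corresponds to a throughput change of `1/(1+d) - 1`. The sketch below checks that relationship for the two highlighted "best" rows. The function name is made up for illustration, and small differences from the reported speed deltas are expected because benchstat works from unrounded values.

```go
package main

import "fmt"

// speedDeltaFromTimeDelta converts a relative change in time/op
// (for example -0.5748 for -57.48%) into the matching relative
// change in throughput. Hypothetical helper, not part of benchstat.
func speedDeltaFromTimeDelta(timeDelta float64) float64 {
	return 1/(1+timeDelta) - 1
}

func main() {
	// Time deltas copied from the two highlighted "best" rows.
	for _, d := range []float64{-0.5748, -0.5090} {
		fmt.Printf("time %+.2f%% -> speed %+.2f%%\n",
			d*100, speedDeltaFromTimeDelta(d)*100)
	}
}
```

This is why halving the time per operation shows up as roughly doubling the MB/s figure rather than as a 50% gain.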
@nightwolfz nightwolfz changed the title [zstd/enc] Cache align struct for big perf boost zstd: Cache align struct for big perf boost Sep 25, 2022
@nightwolfz nightwolfz changed the title zstd: Cache align struct for big perf boost zstd: Improve "best" compression Sep 25, 2022
@klauspost
Owner

Very nice! Could you add a comment explaining how you determined the size?

That way if the struct is changed whoever is looking at it will know how to adjust it.

@nightwolfz
Contributor Author

@klauspost All done :)

@klauspost klauspost merged commit 3822c7c into klauspost:master Sep 25, 2022