POC: optimize small values encoding #344
... using virtual table 'shards' and avoiding costly copy()
Interesting. I would never have expected randomly branching code to be faster than a straight memcpy, let alone once you count the extra writes needed to update the tables.
@tony2001 I rewrote it slightly for #345, mainly so that regular encodes don't have to allocate the extra space and waste time writing to the table. I added it to "better" as well. As I also noted in the PR, this would also make it possible to skip copying the history when starting a new encode, though that will require some rewriting of the matching code.
@tony2001 If you are able to test the other PR and it otherwise checks out, I will merge it.
Sure, I'll test it a bit later.
All credit goes to @tony2001. As shown in #344, the speed of small dictionary compression tasks (< 32K) can be improved significantly by keeping track of the state of the hash table. This effectively implements #344, but avoids a penalty for non-dictionary encodes and extends the functionality to the "better" compression mode as well. This change will also make it easier to [remove the copy of the literal dictionary](https://github.com/klauspost/compress/blob/a0dc84a8cf242dde7e21f8aba26126ca4621ff8c/zstd/enc_base.go#L171) that happens every time an encode starts, and to add specialized code to deal with this.

```
benchmark                                                                  old ns/op     new ns/op     delta
BenchmarkEncodeAllDict0_1024/length-19-level-fastest-dict-1-32             5729          870           -84.82%
BenchmarkEncodeAllDict0_1024/length-19-level-default-dict-1-32             59694         2115          -96.46%
BenchmarkEncodeAllDict0_1024/length-19-level-better-dict-1-32              197183        2454          -98.76%
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1-32              5596          600           -89.28%
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1-32              59342         1222          -97.94%
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1-32               194466        1958          -98.99%
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1-32            13343         13132         -1.58%
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1-32            72651         33988         -53.22%
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1-32             211509        22635         -89.30%
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1-32            12190         10318         -15.36%
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1-32            71443         28580         -60.00%
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1-32             213304        17914         -91.60%
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#01-32           5582          595           -89.33%
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#01-32           58721         1221          -97.92%
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#01-32            196875        1963          -99.00%
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#01-32         13260         13132         -0.97%
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#01-32         71944         33896         -52.89%
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#01-32          207200        22533         -89.12%
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#01-32         12218         10295         -15.74%
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#01-32         69490         28531         -58.94%
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#01-32          205039        18020         -91.21%
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#02-32           5579          599           -89.26%
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#02-32           60810         1228          -97.98%
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#02-32            198740        1953          -99.02%
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#02-32         13352         13128         -1.68%
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#02-32         72544         33887         -53.29%
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#02-32          213331        22516         -89.45%
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#02-32         12204         10299         -15.61%
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#02-32         69317         28421         -59.00%
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#02-32          207613        17917         -91.37%
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#03-32           5542          600           -89.17%
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#03-32           59132         1218          -97.94%
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#03-32            196451        1952          -99.01%
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#03-32         13319         13112         -1.55%
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#03-32         70234         33843         -51.81%
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#03-32          209384        22447         -89.28%
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#03-32         12285         10297         -16.18%
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#03-32         71972         28585         -60.28%
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#03-32          215483        17902         -91.69%
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1-32        16508         16221         -1.74%
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1-32        83569         41344         -50.53%
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1-32         220306        39384         -82.12%
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1-32        41125         40975         -0.36%
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1-32        163203        77122         -52.74%
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1-32         318789        137116        -56.99%
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#01-32     16586         16294         -1.76%
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#01-32     82607         41120         -50.22%
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#01-32      219278        39179         -82.13%
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#01-32     42267         41093         -2.78%
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#01-32     164353        76905         -53.21%
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#01-32      327857        136501        -58.37%
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#02-32     16554         16177         -2.28%
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#02-32     83337         41239         -50.52%
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#02-32      226392        39385         -82.60%
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#02-32     41175         40834         -0.83%
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#02-32     160614        77318         -51.86%
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#02-32      313359        136739        -56.36%
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#03-32     16413         16274         -0.85%
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#03-32     81907         41151         -49.76%
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#03-32      222585        39181         -82.40%
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#03-32     41232         40978         -0.62%
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#03-32     159086        77235         -51.45%
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#03-32      309822        136600        -55.91%
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1-32      55120         55056         -0.12%
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1-32      291966        132353        -54.67%
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1-32       467914        206802        -55.80%
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#01-32   53770         54785         +1.89%
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#01-32   291053        130230        -55.26%
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#01-32    476829        205292        -56.95%
BenchmarkEncodeAllDict8192_16384/length-9024-level-fastest-dict-1-32       31805         31891         +0.27%
BenchmarkEncodeAllDict8192_16384/length-9024-level-default-dict-1-32       116904        61027         -47.80%
BenchmarkEncodeAllDict8192_16384/length-9024-level-better-dict-1-32        260057        95128         -63.42%
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#02-32   54833         54341         -0.90%
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#02-32   291523        131595        -54.86%
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#02-32    467178        206408        -55.82%
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#03-32   54431         54289         -0.26%
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#03-32   291092        130441        -55.19%
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#03-32    476490        205606        -56.85%
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1-32     245211        243965        -0.51%
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1-32     817566        822310        +0.58%
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1-32      1258889       590281        -53.11%
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#01-32  242203        241662        -0.22%
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#01-32  812895        818005        +0.63%
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#01-32   1265187       590826        -53.30%
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#02-32  242602        241849        -0.31%
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#02-32  828540        819250        -1.12%
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#02-32   1286233       586918        -54.37%
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#03-32  245593        244559        -0.42%
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#03-32  813931        819203        +0.65%
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#03-32   1272813       581714        -54.30%
BenchmarkEncodeAllDict16384_65536/length-20000-level-fastest-dict-1-32     18972         18733         -1.26%
BenchmarkEncodeAllDict16384_65536/length-20000-level-default-dict-1-32     75984         39850         -47.55%
BenchmarkEncodeAllDict16384_65536/length-20000-level-better-dict-1-32      213173        27825         -86.95%
BenchmarkEncodeAllDict65536_0/length-210569-level-fastest-dict-1-32        1070089       1055243       -1.39%
BenchmarkEncodeAllDict65536_0/length-210569-level-default-dict-1-32        1780011       1819554       +2.22%
BenchmarkEncodeAllDict65536_0/length-210569-level-better-dict-1-32         2785437       1631976       -41.41%
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1-32        500568        499781        -0.16%
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1-32        1036024       1076927       +3.95%
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1-32         1740181       859317        -50.62%
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1-32         410671        405122        -1.35%
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1-32         1025429       1025611       +0.02%
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1-32          1584230       739134        -53.34%
BenchmarkEncodeAllDict65536_0/length-210569-level-fastest-dict-1#01-32     1054258       1048012       -0.59%
BenchmarkEncodeAllDict65536_0/length-210569-level-default-dict-1#01-32     1756825       1810346       +3.05%
BenchmarkEncodeAllDict65536_0/length-210569-level-better-dict-1#01-32      2816869       1659755       -41.08%
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#01-32     498201        500382        +0.44%
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#01-32     1045296       1075033       +2.84%
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#01-32      1772563       855280        -51.75%
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#01-32      411487        404032        -1.81%
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#01-32      1009682       1023147       +1.33%
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#01-32       1588776       728182        -54.17%
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#02-32     501487        498564        -0.58%
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#02-32     1037744       1074253       +3.52%
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#02-32      1753509       859959        -50.96%
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#02-32      407233        403579        -0.90%
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#02-32      1013906       1026835       +1.28%
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#02-32       1591512       731027        -54.07%
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#03-32     500983        495842        -1.03%
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#03-32     1046435       1075070       +2.74%
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#03-32      1760434       860257        -51.13%
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#03-32      409099        405108        -0.98%
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#03-32      1011372       1021036       +0.96%
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#03-32       1572944       731780        -53.48%

benchmark                                                                  old MB/s      new MB/s      speedup
BenchmarkEncodeAllDict0_1024/length-19-level-fastest-dict-1-32             3.32          21.84         6.58x
BenchmarkEncodeAllDict0_1024/length-19-level-default-dict-1-32             0.32          8.98          28.06x
BenchmarkEncodeAllDict0_1024/length-19-level-better-dict-1-32              0.10          7.74          77.40x
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1-32              0.89          8.34          9.37x
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1-32              0.08          4.09          51.12x
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1-32               0.03          2.55          85.00x
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1-32            49.39         50.18         1.02x
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1-32            9.07          19.39         2.14x
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1-32             3.12          29.11         9.33x
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1-32            14.27         16.86         1.18x
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1-32            2.44          6.09          2.50x
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1-32             0.82          9.71          11.84x
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#01-32           0.90          8.40          9.33x
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#01-32           0.09          4.10          45.56x
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#01-32            0.03          2.55          85.00x
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#01-32         49.70         50.18         1.01x
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#01-32         9.16          19.44         2.12x
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#01-32          3.18          29.25         9.20x
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#01-32         14.24         16.90         1.19x
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#01-32         2.50          6.10          2.44x
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#01-32          0.85          9.66          11.36x
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#02-32           0.90          8.35          9.28x
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#02-32           0.08          4.07          50.88x
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#02-32            0.03          2.56          85.33x
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#02-32         49.36         50.20         1.02x
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#02-32         9.08          19.45         2.14x
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#02-32          3.09          29.27         9.47x
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#02-32         14.26         16.90         1.19x
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#02-32         2.51          6.12          2.44x
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#02-32          0.84          9.71          11.56x
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#03-32           0.90          8.33          9.26x
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#03-32           0.08          4.11          51.38x
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#03-32            0.03          2.56          85.33x
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#03-32         49.48         50.26         1.02x
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#03-32         9.38          19.47         2.08x
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#03-32          3.15          29.36         9.32x
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#03-32         14.16         16.90         1.19x
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#03-32         2.42          6.09          2.52x
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#03-32          0.81          9.72          12.00x
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1-32        65.18         66.33         1.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1-32        12.88         26.03         2.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1-32         4.88          27.32         5.60x
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1-32        142.78        143.31        1.00x
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1-32        35.98         76.14         2.12x
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1-32         18.42         42.82         2.32x
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#01-32     64.88         66.04         1.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#01-32     13.03         26.17         2.01x
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#01-32      4.91          27.46         5.59x
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#01-32     138.93        142.90        1.03x
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#01-32     35.73         76.35         2.14x
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#01-32      17.91         43.02         2.40x
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#02-32     65.00         66.51         1.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#02-32     12.91         26.09         2.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#02-32      4.75          27.32         5.75x
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#02-32     142.61        143.80        1.01x
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#02-32     36.56         75.95         2.08x
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#02-32      18.74         42.94         2.29x
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#03-32     65.56         66.12         1.01x
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#03-32     13.14         26.15         1.99x
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#03-32      4.83          27.46         5.69x
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#03-32     142.41        143.30        1.01x
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#03-32     36.91         76.03         2.06x
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#03-32      18.95         42.99         2.27x
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1-32      220.08        220.34        1.00x
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1-32      41.55         91.66         2.21x
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1-32       25.93         58.66         2.26x
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#01-32   225.61        221.43        0.98x
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#01-32   41.68         93.15         2.23x
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#01-32    25.44         59.09         2.32x
BenchmarkEncodeAllDict8192_16384/length-9024-level-fastest-dict-1-32       283.73        282.97        1.00x
BenchmarkEncodeAllDict8192_16384/length-9024-level-default-dict-1-32       77.19         147.87        1.92x
BenchmarkEncodeAllDict8192_16384/length-9024-level-better-dict-1-32        34.70         94.86         2.73x
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#02-32   221.23        223.24        1.01x
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#02-32   41.61         92.18         2.22x
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#02-32    25.97         58.77         2.26x
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#03-32   222.87        223.45        1.00x
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#03-32   41.67         93.00         2.23x
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#03-32    25.46         59.00         2.32x
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1-32     243.44        244.69        1.01x
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1-32     73.02         72.59         0.99x
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1-32      47.42         101.13        2.13x
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#01-32  246.47        247.02        1.00x
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#01-32  73.44         72.98         0.99x
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#01-32   47.18         101.04        2.14x
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#02-32  246.06        246.83        1.00x
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#02-32  72.05         72.87         1.01x
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#02-32   46.41         101.71        2.19x
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#03-32  243.06        244.09        1.00x
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#03-32  73.34         72.87         0.99x
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#03-32   46.90         102.62        2.19x
BenchmarkEncodeAllDict16384_65536/length-20000-level-fastest-dict-1-32     1054.19       1067.64       1.01x
BenchmarkEncodeAllDict16384_65536/length-20000-level-default-dict-1-32     263.21        501.88        1.91x
BenchmarkEncodeAllDict16384_65536/length-20000-level-better-dict-1-32      93.82         718.77        7.66x
BenchmarkEncodeAllDict65536_0/length-210569-level-fastest-dict-1-32        196.78        199.55        1.01x
BenchmarkEncodeAllDict65536_0/length-210569-level-default-dict-1-32        118.30        115.73        0.98x
BenchmarkEncodeAllDict65536_0/length-210569-level-better-dict-1-32         75.60         129.03        1.71x
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1-32        204.98        205.30        1.00x
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1-32        99.04         95.28         0.96x
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1-32         58.96         119.40        2.03x
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1-32         165.61        167.88        1.01x
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1-32         66.33         66.31         1.00x
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1-32          42.93         92.02         2.14x
BenchmarkEncodeAllDict65536_0/length-210569-level-fastest-dict-1#01-32     199.73        200.92        1.01x
BenchmarkEncodeAllDict65536_0/length-210569-level-default-dict-1#01-32     119.86        116.31        0.97x
BenchmarkEncodeAllDict65536_0/length-210569-level-better-dict-1#01-32      74.75         126.87        1.70x
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#01-32     205.95        205.05        1.00x
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#01-32     98.16         95.44         0.97x
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#01-32      57.89         119.97        2.07x
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#01-32      165.29        168.34        1.02x
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#01-32      67.36         66.47         0.99x
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#01-32       42.81         93.40         2.18x
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#02-32     204.60        205.80        1.01x
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#02-32     98.87         95.51         0.97x
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#02-32      58.51         119.31        2.04x
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#02-32      167.01        168.52        1.01x
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#02-32      67.08         66.24         0.99x
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#02-32       42.73         93.04         2.18x
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#03-32     204.81        206.93        1.01x
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#03-32     98.05         95.44         0.97x
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#03-32      58.28         119.27        2.05x
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#03-32      166.25        167.89        1.01x
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#03-32      67.25         66.61         0.99x
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#03-32       43.24         92.94         2.15x
```
Replaced by #345.
This is not a pull request so much as a discussion request :)
We have a piece of software that accepts data packages of around 1K to 10K bytes and stores them for some time. To save some RAM we compress them, of course. The current implementation uses Snappy, but we thought Zstd with a dictionary would be a nice replacement, since the incoming data packages mostly contain repeating words.
I tried replacing Snappy with the Zstd Go implementation (EncodeAll() + a custom dict), but the overall performance was mediocre, even though the compression ratio was very impressive indeed.
After a short investigation I found that most of the time is spent copy()'ing the dictionary table into the one used by the encoder. I'm not really proficient in compression algorithms, but from what I saw in the C sources, this is simply part of the Zstd design, and there is no way to get rid of it completely. At the same time, it makes little sense to copy the whole 32K table (for the fastest level; it's 128K for default) just because some of its elements were changed. Quite obviously, this copy() call takes more time than compressing a 1K package itself.
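To put the copy() cost in perspective, here is a back-of-the-envelope calculation. It assumes, hypothetically, that "32K" and "128K" are entry counts and that each entry is 8 bytes (a `uint32` value plus an `int32` offset); neither assumption is confirmed in this thread, and `tableEntry`/`fullCopyBytes` are illustrative names only.

```go
package main

import (
	"fmt"
	"unsafe"
)

// tableEntry is a hypothetical hash-table slot (a hashed value plus the
// position where it was seen); the real encoder's layout may differ.
type tableEntry struct {
	val    uint32
	offset int32
}

// fullCopyBytes returns how many bytes a full restore of a table with the
// given number of entries moves before every encode.
func fullCopyBytes(entries int) int {
	return entries * int(unsafe.Sizeof(tableEntry{}))
}

func main() {
	const payload = 1024 // a small package, as in the use case above
	for _, tc := range []struct {
		level   string
		entries int
	}{
		{"fastest", 32 << 10},  // "32K table", read as 32K entries (assumption)
		{"default", 128 << 10}, // "128K", read the same way (assumption)
	} {
		n := fullCopyBytes(tc.entries)
		fmt.Printf("%s: %d bytes copied per encode, %dx the %d-byte payload\n",
			tc.level, n, n/payload, payload)
	}
}
```

Under those assumptions every 1K encode first moves 262144 bytes (fastest) or 1048576 bytes (default) of table data, i.e. 256x and 1024x the payload, which fits the observation that the copy dominates for small inputs.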
Hence the following patch: it introduces "virtual shards" in the table and marks the updated ones as dirty, so that only those shards need to be re-initialized with clean dictionary data later.
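The dirty-shard idea can be illustrated with a minimal Go sketch. This is not the patch itself: the shard size (64 entries), the `entry` layout and every name below (`shardedTable`, `put`, `reset`) are hypothetical. Each write marks its shard dirty, and preparing for the next encode copies back only the dirty shards from the pristine dictionary table.

```go
package main

import "fmt"

// Hypothetical sizes, not taken from the patch: a 32K-entry table split
// into shards of 64 entries each (512 shards).
const (
	tableBits = 15
	tableSize = 1 << tableBits
	shardBits = 6 // log2 of entries per shard (assumed)
	shardSize = 1 << shardBits
	shardCnt  = tableSize / shardSize
)

// entry is a hypothetical hash-table slot.
type entry struct {
	val    uint32
	offset int32
}

// shardedTable is the working hash table plus one dirty flag per shard.
type shardedTable struct {
	cur   [tableSize]entry
	dict  []entry // pristine table built from the dictionary; read-only
	dirty [shardCnt]bool
}

// newShardedTable pays for one full copy of the dictionary table up front.
func newShardedTable(dict []entry) *shardedTable {
	t := &shardedTable{dict: dict}
	copy(t.cur[:], dict)
	return t
}

// put stores e at index i and marks the shard that owns i as dirty.
func (t *shardedTable) put(i uint32, e entry) {
	t.cur[i] = e
	t.dirty[i>>shardBits] = true
}

// reset restores only the dirty shards from the dictionary table, ready
// for the next encode, and reports how many entries it copied.
func (t *shardedTable) reset() int {
	copied := 0
	for s, isDirty := range t.dirty {
		if !isDirty {
			continue
		}
		lo, hi := s*shardSize, (s+1)*shardSize
		copied += copy(t.cur[lo:hi], t.dict[lo:hi])
		t.dirty[s] = false
	}
	return copied
}

func main() {
	dict := make([]entry, tableSize)
	for i := range dict {
		dict[i] = entry{val: uint32(i), offset: int32(i)}
	}
	t := newShardedTable(dict)
	t.put(10, entry{val: 1})   // lands in shard 0
	t.put(5000, entry{val: 2}) // lands in shard 78
	fmt.Println("entries restored:", t.reset())
}
```

Running it prints `entries restored: 128`: two 64-entry shards instead of all 32768 entries. The price is an extra store to the dirty array on every `put`, which is why this kind of tracking is only worth paying for when a dictionary is in use.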
This is just a proof of concept. I don't like the modified code myself; it's quite dumb, and there has to be a cleaner way of doing it. However, every "cleaner" version I tried affected performance very noticeably, so let's just start with this one.
I also modified the benchmark suite a bit to make it more convenient (and shorter) by dividing the benchmarks into several categories by data size.
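The grouping by data size can be sketched with a small helper. The group names it produces (`EncodeAllDict0_1024`, `EncodeAllDict1024_8192`, `EncodeAllDict8192_16384`, `EncodeAllDict16384_65536`, `EncodeAllDict65536_0`) are taken from the #345 benchmark output, where a trailing 0 appears to mean "no upper bound". The helper `sizeCategory` is hypothetical, not the suite's actual code, and treating each lower bound as inclusive is an assumption.

```go
package main

import "fmt"

// categoryBounds are the lower bounds of the size groups seen in the
// benchmark names; a group's upper bound is the next lower bound, and the
// last group is open-ended (written as 0).
var categoryBounds = []int{0, 1024, 8192, 16384, 65536}

// sizeCategory returns a benchmark-group name such as
// "EncodeAllDict1024_8192" for an input of n bytes (n >= 0).
func sizeCategory(n int) string {
	for i := len(categoryBounds) - 1; i >= 0; i-- {
		if n < categoryBounds[i] {
			continue
		}
		upper := 0 // 0 means "no upper bound"
		if i+1 < len(categoryBounds) {
			upper = categoryBounds[i+1]
		}
		return fmt.Sprintf("EncodeAllDict%d_%d", categoryBounds[i], upper)
	}
	return "" // only reached for negative n
}

func main() {
	// Sample lengths taken from the benchmark results.
	for _, n := range []int{19, 659, 1076, 5872, 9024, 12131, 20000, 59695, 210569} {
		fmt.Printf("%7d bytes -> Benchmark%s\n", n, sizeCategory(n))
	}
}
```

With one `Benchmark` function per category, a run such as `go test -bench BenchmarkEncodeAllDict0_1024` can then target only the small-input cases.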
Apparently, this improvement mainly helps small data packages: bigger data takes more time to compress and leaves the table shards mostly 'dirty', so there's no way to avoid copy()'ing the whole table after all.
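Why larger inputs leave almost nothing to skip can be estimated with a simple occupancy model. Assuming (unrealistically) that table writes hit shards independently and uniformly at random, a given shard stays clean after k writes into N shards with probability (1 - 1/N)^k, so the expected dirty fraction is 1 - (1 - 1/N)^k, which is already about 95% at k = 3N. The shard count of 512 and the one-write-per-input-byte proxy below are illustrative assumptions, not measurements of the patch.

```go
package main

import (
	"fmt"
	"math"
)

// expectedDirtyFraction estimates the share of shards marked dirty after
// `writes` table updates, assuming every write picks one of `shards` shards
// independently and uniformly at random. A given shard stays clean with
// probability (1 - 1/shards)^writes.
func expectedDirtyFraction(shards, writes int) float64 {
	return 1 - math.Pow(1-1/float64(shards), float64(writes))
}

func main() {
	const shards = 512 // hypothetical: 32K entries / 64 entries per shard
	// Rough proxy (assumption): about one table write per input byte.
	for _, writes := range []int{19, 174, 659, 5872, 59695} {
		fmt.Printf("%6d writes -> %5.1f%% of shards dirty\n",
			writes, 100*expectedDirtyFraction(shards, writes))
	}
}
```

In this model a 19-byte input can dirty at most 19 of the 512 shards, while inputs in the tens of kilobytes dirty essentially all of them, at which point restoring dirty shards costs about as much as the full copy.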
Let me know what you think about it.
See some benchmarks below.
Data size less than 1024 bytes:
Data size from 1024 to 8192 bytes:
For some reason I can even see some speedup on larger data:
There seems to be some performance gain even with data larger than 16K, but with 64K and more there's no difference at all.
I have no idea why "better" shows a small degradation or improvement in some cases; my modifications shouldn't have affected it in any way.