zstd: faster next state update in BMI2 version of decode #593

Merged (1 commit) on May 12, 2022


WojciechMula

Use the same approach as the Go code: a single getBits call fetches the bits for all three bitfields, which are then split apart.
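
To illustrate the idea, here is a minimal Go sketch of reading three bitfields with one combined read instead of three separate ones. It is not the code from this PR (which changes the generated BMI2 assembly): `bitReader`, `splitThree`, the field names `ll`, `ml`, `of`, and the field widths are all hypothetical, and the reader is a simplified MSB-first reader over a single 64-bit word.

```go
package main

import "fmt"

// bitReader is a hypothetical, simplified MSB-first reader over one 64-bit word.
// It only illustrates the technique; the real decoder's reader differs.
type bitReader struct {
	value    uint64 // bits to read, most significant first
	bitsRead uint8  // number of bits already consumed
}

// getBits returns the next n bits (0 <= n <= 32) and consumes them.
func (b *bitReader) getBits(n uint8) uint32 {
	if n == 0 {
		return 0
	}
	v := uint32((b.value << (b.bitsRead & 63)) >> (64 - n))
	b.bitsRead += n
	return v
}

// splitThree splits one combined read into three fields.
// The first field occupies the high bits, the last field the low bits.
func splitThree(bits uint32, mlBits, ofBits uint8) (ll, ml, of uint32) {
	ll = bits >> (mlBits + ofBits)
	ml = (bits >> ofBits) & (uint32(1)<<mlBits - 1)
	of = bits & (uint32(1)<<ofBits - 1)
	return ll, ml, of
}

func main() {
	// Assumed field widths, chosen only for the example.
	const llBits, mlBits, ofBits = 5, 4, 7
	stream := uint64(0xDEADBEEFCAFEF00D)

	// Baseline: three separate reads.
	a := bitReader{value: stream}
	ll1 := a.getBits(llBits)
	ml1 := a.getBits(mlBits)
	of1 := a.getBits(ofBits)

	// Single read of the combined width, then shift and mask.
	b := bitReader{value: stream}
	ll2, ml2, of2 := splitThree(b.getBits(llBits+mlBits+ofBits), mlBits, ofBits)

	fmt.Println(ll1 == ll2, ml1 == ml2, of1 == of2) // prints "true true true"
}
```

The combined read touches the reader state once, and the remaining work is a few shifts and masks on a register, which is the likely source of the small gains reported below.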

Results from an Ice Lake machine. Not a big improvement, but still something.

benchmark                                                                 old ns/op     new ns/op     delta
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            3538269       3464588       -2.08%
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                        767535        762036        -0.72%
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                         12020124      11753140      -2.22%
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           9370142       9150443       -2.34%
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         2888949       2847418       -1.44%
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          3611946       3523666       -2.44%
BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                             1431293       1416360       -1.04%
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                       196917        191536        -2.73%
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       127251        125210        -1.60%
BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             11395615      11235287      -1.41%
BenchmarkDecoder_DecoderSmall/html.zst-16                                 944829        924662        -2.13%
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        64173         64119         -0.08%
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               282174        280770        -0.50%
BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                           66167         66359         +0.29%
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                            922002        932301        +1.12%
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              683173        684177        +0.15%
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            247548        248853        +0.53%
BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             274773        274472        -0.11%
BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                246462        246202        -0.11%
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                          20594         20605         +0.05%
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          11307         11306         -0.01%
BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                1012849       1003953       -0.88%
BenchmarkDecoder_DecodeAll/html.zst-16                                    71019         71430         +0.58%
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           8022          7990          -0.40%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      881685        881563        -0.01%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      745784        753593        +1.05%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       763830        769607        +0.76%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         723014        726033        +0.42%
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          9196          9197          +0.01%
BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                          249892        246681        -1.28%
BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                           214009        213535        -0.22%
BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                             148169        147211        -0.65%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              3679          3675          -0.11%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16              3151          3142          -0.29%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16               4038          4041          +0.07%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 11088         11087         -0.01%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 4295          4267          -0.65%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 6941          6938          -0.04%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  6944          6947          +0.04%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    6846          6814          -0.47%
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       49690         49631         -0.12%
BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       50346         50250         -0.19%
BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        48131         47985         -0.30%
BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          50142         50017         -0.25%
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         9196          9195          -0.01%
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                         248131        245482        -1.07%
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                          212266        211689        -0.27%
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                            147942        147473        -0.32%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                    28247         28132         -0.41%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                    33284         33307         +0.07%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                     25149         25136         -0.05%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                       34645         34499         -0.42%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     9195          9192          -0.03%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     9190          9190          +0.00%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      9193          9200          +0.08%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        9198          9192          -0.07%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     97922         96367         -1.59%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     94087         92098         -2.11%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      89399         87983         -1.58%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        90856         89094         -1.94%
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         1040          1049          +0.87%
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                         27730         27366         -1.31%
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                          23469         23280         -0.81%
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                            16864         16860         -0.02%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             520           449           -13.68%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16             558           429           -23.12%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16              490           460           -6.00%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                866           859           -0.82%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                570           567           -0.47%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                708           710           +0.35%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 706           705           -0.09%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   734           733           -0.12%
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      7398          7315          -1.12%
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      7520          7428          -1.22%
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       7188          7109          -1.10%
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         7542          7490          -0.69%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        1058          1042          -1.51%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                        28040         27632         -1.46%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                         23286         23142         -0.62%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                           16856         16860         +0.02%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                   3629          3614          -0.41%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                   3667          3651          -0.44%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                    2892          2880          -0.41%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                      3840          3833          -0.18%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    1038          1042          +0.39%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    1039          1043          +0.38%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     1036          1033          -0.29%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       1041          1035          -0.58%
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       37367         36741         -1.68%
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                   9301          9226          -0.81%
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                    121253        118642        -2.15%
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      91335         89550         -1.95%
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    31263         30758         -1.62%
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     39573         38752         -2.07%
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                        31527         31476         -0.16%
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                  2747          2740          -0.25%
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  1267          1254          -1.03%
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        109008        107497        -1.39%
BenchmarkDecoder_DecodeAllParallel/html.zst-16                            9900          9803          -0.98%
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   1090          1084          -0.55%

benchmark                                                                 old MB/s     new MB/s     speedup
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            416.75       425.61       1.02x
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                        1236.04      1244.96      1.01x
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                         320.70       327.99       1.02x
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           364.35       373.10       1.02x
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         346.64       351.70       1.01x
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          336.86       345.30       1.03x
BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                             2289.40      2313.54      1.01x
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                       4160.13      4277.00      1.03x
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       7738.58      7864.74      1.02x
BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             492.88       499.92       1.01x
BenchmarkDecoder_DecoderSmall/html.zst-16                                 867.04       885.95       1.02x
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        508.13       508.56       1.00x
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               653.21       656.48       1.01x
BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                           1792.24      1787.07      1.00x
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                            522.62       516.85       0.99x
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              624.66       623.75       1.00x
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            505.68       503.02       0.99x
BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             553.51       554.12       1.00x
BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                1661.92      1663.68      1.00x
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                          4972.43      4969.68      1.00x
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          10886.31     10887.11     1.00x
BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                693.18       699.32       1.01x
BenchmarkDecoder_DecodeAll/html.zst-16                                    1441.86      1433.57      0.99x
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           508.09       510.11       1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      440.03       440.09       1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      520.21       514.82       0.99x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       507.92       504.11       0.99x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         536.59       534.36       1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          10875.00     10872.87     1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                          400.18       405.39       1.01x
BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                           467.28       468.32       1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                             674.92       679.32       1.01x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              1118.87      1119.99      1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16              1306.44      1309.88      1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16               1019.38      1018.48      1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 371.21       371.24       1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 360.39       362.77       1.01x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 223.01       223.11       1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  222.94       222.84       1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    226.10       227.19       1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       895.09       896.16       1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       883.42       885.12       1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        924.08       926.89       1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          887.02       889.25       1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         10874.41     10875.76     1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                         403.03       407.37       1.01x
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                          471.12       472.41       1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                            675.96       678.11       1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                    1812.57      1819.96      1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                    1538.27      1537.22      1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                     2035.91      2036.92      1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                       1477.86      1484.12      1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     10876.32     10879.64     1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     10882.29     10881.82     1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      10878.19     10869.60     1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        10872.67     10879.41     1.00x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     3961.96      4025.90      1.02x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     4123.46      4212.50      1.02x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      4339.69      4409.56      1.02x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        4270.11      4354.55      1.02x
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         96150.10     95315.78     0.99x
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                         3606.34      3654.30      1.01x
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                          4261.03      4295.69      1.01x
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                            5930.07      5931.46      1.00x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             7918.33      9173.05      1.16x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16             7381.36      9601.93      1.30x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16              8401.98      8938.65      1.06x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                4750.45      4789.99      1.01x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                2715.61      2728.85      1.00x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                2187.89      2180.39      1.00x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 2193.74      2195.81      1.00x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   2108.53      2111.09      1.00x
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      6012.03      6080.29      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      5914.58      5987.87      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       6187.51      6256.08      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         5897.13      5937.99      1.01x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        94559.93     95962.77     1.01x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                        3566.45      3619.13      1.01x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                         4294.64      4321.36      1.01x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                           5932.61      5931.45      1.00x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                   14107.13     14169.07     1.00x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                   13964.23     14025.04     1.00x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                    17703.73     17776.93     1.00x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                      13333.43     13358.75     1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    96306.48     95976.00     1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    96238.15     95896.84     1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     96558.62     96776.02     1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       96102.84     96635.70     1.01x
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       4932.70      5016.74      1.02x
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                   12750.37     12853.04     1.01x
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                    3974.00      4061.49      1.02x
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      4672.40      4765.53      1.02x
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    4004.09      4069.75      1.02x
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     3843.28      3924.71      1.02x
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                        12992.11     13013.04     1.00x
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                  37276.90     37370.52     1.00x
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  97177.32     98180.77     1.01x
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        6440.67      6531.22      1.01x
BenchmarkDecoder_DecodeAllParallel/html.zst-16                            10343.44     10446.00     1.01x
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   3737.84      3760.76      1.01x

@klauspost merged commit 348514c into klauspost:master on May 12, 2022
@klauspost deleted the zstd-change-update-state branch on May 12, 2022 12:23