zstd: Add branchless filling #550

Merged: 4 commits into master from branchless-fills, Apr 5, 2022

@klauspost (Owner) commented Apr 4, 2022

Do bit-reader fills without unpredictable branches.

This avoids one fill, since a fill is guaranteed to leave at least 56 bits available.

Also add a specialized version for when a loop iteration needs at most 56 bits, so that it does only one fill. About 9% faster when that path is hit.
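For readers unfamiliar with the technique, here is a minimal Go sketch of a branchless fill. It is not the code from this PR, which reads zstd bitstreams backwards in amd64 assembly. The `bitReader` type, its fields, and `readFields` are hypothetical names, and the reader is a simple forward, LSB-first one. The point is only to show how a fill can guarantee at least 56 bits without branching on the current bit count, and how a loop that needs at most 56 bits per iteration gets by with a single fill.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bitReader is a hypothetical forward, LSB-first bit reader used only for
// illustration. It is not the reader changed in this PR.
type bitReader struct {
	in    []byte // input, padded so an 8-byte load never runs past the end
	pos   int    // index of the next byte that is not yet counted in bits
	value uint64 // unread bits, next bit to read at bit 0
	bits  uint   // lower bound on the number of valid bits in value (0..63)
}

// fill tops the buffer up to between 56 and 63 valid bits.
// There is no branch on bits: the load, the pointer advance and the
// counter update are all plain arithmetic.
func (b *bitReader) fill() {
	// Bytes that overlap already-loaded bits carry identical data, so OR is safe.
	b.value |= binary.LittleEndian.Uint64(b.in[b.pos:]) << b.bits
	b.pos += int((63 - b.bits) >> 3) // whole bytes that now fit
	b.bits |= 56                     // same as adding 8 * bytes advanced
}

// get returns the next n bits. The caller must ensure n <= b.bits.
func (b *bitReader) get(n uint) uint64 {
	v := b.value & (uint64(1)<<n - 1)
	b.value >>= n
	b.bits -= n
	return v
}

// readFields decodes `rounds` groups of fields with the given bit widths.
// Assumption: the widths sum to at most 56, so one fill per round is enough.
func readFields(data []byte, widths []uint, rounds int) []uint64 {
	padded := append(append([]byte{}, data...), make([]byte, 8)...)
	r := bitReader{in: padded}
	out := make([]uint64, 0, rounds*len(widths))
	for i := 0; i < rounds; i++ {
		r.fill() // single fill per iteration
		for _, w := range widths {
			out = append(out, r.get(w))
		}
	}
	return out
}

func main() {
	data := []byte{0x21, 0x43, 0x65, 0x87, 0xEF, 0xCD, 0xAB, 0x89}
	fmt.Printf("%#x\n", readFields(data, []uint{4, 12, 16}, 2))
}
```

The sketch relies on the input being padded and on the caller never asking for more rounds than the data holds. A real decoder has to handle the end of the stream separately.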

```
λ benchcmp before.txt after2.txt
benchmark                                                                                      old ns/op     new ns/op     delta
Benchmark_seqdec_decode/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32        89698         71085         -20.75%
Benchmark_seqdec_decode/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32       90916         64945         -28.57%
Benchmark_seqdec_decode/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32               79227         85803         +8.30%
Benchmark_seqdec_decode/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32        8632          8509          -1.42%
Benchmark_seqdec_decode/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32        21501         21002         -2.32%
Benchmark_seqdec_decode/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32         59676         49136         -17.66%
Benchmark_seqdec_decode/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                  5503          6079          +10.47%
Benchmark_seqdec_decode/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32     116600        94914         -18.60%
Benchmark_seqdec_decode/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                   45.2          51.2          +13.40%
Benchmark_seqdec_decode/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                    537           526           -2.14%
Benchmark_seqdec_decode/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                5280          6082          +15.19%
Benchmark_seqdec_decode/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                   17068         17798         +4.28%
Benchmark_seqdec_decode/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32              27212         24872         -8.60%
Benchmark_seqdec_decode/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                    58125         47181         -18.83%
```

Better, but not too clear a win.
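As a reading aid for the benchcmp output, the delta column is the relative change in ns/op, so negative values mean the new code is faster. A small sketch of that arithmetic, where `delta` is a hypothetical helper and not part of benchcmp:

```go
package main

import "fmt"

// delta returns the relative change from oldNs to newNs in percent.
// Negative means the new measurement is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// First benchmark row: 89698 ns/op before, 71085 ns/op after.
	fmt.Printf("%+.2f%%\n", delta(89698, 71085))
}
```

For rows with very small timings the printed ns/op values are rounded, so recomputing from the table may not match the reported delta exactly.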

Do bit-reader fills without unpredictable branches.

This avoids one fill, since a fill is guaranteed to leave at least 56 bits available.

```
λ benchcmp before.txt after.txt
benchmark                                                                                      old ns/op     new ns/op     delta
Benchmark_seqdec_decode/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32        88467         69674         -21.24%
Benchmark_seqdec_decode/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32       89402         72286         -19.14%
Benchmark_seqdec_decode/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32               80374         88479         +10.08%
Benchmark_seqdec_decode/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32        8823          9632          +9.17%
Benchmark_seqdec_decode/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32        22177         21437         -3.34%
Benchmark_seqdec_decode/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32         60001         49542         -17.43%
Benchmark_seqdec_decode/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                  5676          6190          +9.06%
Benchmark_seqdec_decode/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32     116433        96040         -17.51%
Benchmark_seqdec_decode/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                   45.4          45.6          +0.40%
Benchmark_seqdec_decode/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                    536           582           +8.59%
Benchmark_seqdec_decode/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                5307          5830          +9.85%
Benchmark_seqdec_decode/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                   17262         17753         +2.84%
Benchmark_seqdec_decode/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32              26685         25027         -6.21%
Benchmark_seqdec_decode/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                    57524         46995         -18.30%
```

Better, but not too clear a win.
# Conflicts:
#	zstd/seqdec_amd64.s
@klauspost klauspost merged commit 8c016bf into master Apr 5, 2022
@klauspost klauspost deleted the branchless-fills branch April 5, 2022 14:38