make the fallback implementation of slidehash vectorize better #296

folkertdev · 2025-02-20T14:48:42Z

I'm picking a chunk size of 16 because 16 `u16` elements fill 256 bits: on SSE (128-bit registers) the loop is just unrolled, and targets with 256-bit wide registers can actually use the full width.

The table size is always a power of 2: either 1 << 16 for state.head, or between 1 << 8 and 1 << 15 for state.prev (it's based on the window size). Every possible length is therefore a multiple of the chunk size, so it's safe to use chunks here, and there is no risk of ignoring elements in a remainder.
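To make the idea concrete, here is a minimal sketch of a chunked fallback loop. It is an illustration under assumptions, not the code from this PR: the name `slide_hash_chain_sketch`, the `CHUNK` constant, and the `u16` element type are chosen for the example, and the subtraction is written as a saturating subtract, which matches the usual "subtract the window size, clamp at zero" behaviour of slide hash.

```rust
/// Chunk size assumed for this sketch: 16 x u16 = 256 bits.
const CHUNK: usize = 16;

/// Hypothetical stand-in for the fallback loop; not the actual zlib-rs function.
/// Subtracts `wsize` from every entry, clamping at zero.
fn slide_hash_chain_sketch(table: &mut [u16], wsize: u16) {
    // Table lengths are powers of two (at least 1 << 8), so they split into
    // whole chunks and `chunks_exact_mut` leaves no remainder behind.
    debug_assert_eq!(table.len() % CHUNK, 0);

    for chunk in table.chunks_exact_mut(CHUNK) {
        // The inner loop has a fixed trip count, which gives the compiler
        // a good chance to turn it into straight-line vector code.
        for m in chunk.iter_mut() {
            *m = m.saturating_sub(wsize);
        }
    }
}
```

Calling it on a 256-element table with `wsize = 32_768` turns an entry of `40_000` into `7_232` and clamps an entry of `5` to `0`.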

codecov · 2025-02-20T14:50:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files with missing lines | Coverage Δ | |
|---|---|---|
| zlib-rs/src/deflate/slide_hash.rs | `99.14% <100.00%> (+0.03%)` | ⬆️ |

folkertdev · 2025-02-20T15:06:38Z

On NEON our custom version appears to generate better code (hard to judge, but it has fewer instructions in the hot loop):

https://godbolt.org/z/sd3Me8fnG

zlib-rs/src/deflate/slide_hash.rs

folkertdev · 2025-02-20T16:03:09Z

Turns out a chunk size of 32 generates less code, because that appears to be how far LLVM wants to unroll on aarch64 and x86_64 (probably based on the number of concurrent loads that the CPU can do).
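A chunk size of 32 stays safe for the same reason as 16: every table size mentioned in the description is a power of two of at least 1 << 8. A small sketch of that arithmetic check, where the helper name `all_sizes_divisible` is made up for illustration:

```rust
/// Hypothetical helper: returns true when every table size from the PR
/// description splits into whole chunks of `chunk` elements.
fn all_sizes_divisible(chunk: usize) -> bool {
    // state.prev ranges over 1 << 8 ..= 1 << 15; state.head is 1 << 16.
    (8..=16).all(|bits| (1usize << bits) % chunk == 0)
}
```

Both `all_sizes_divisible(16)` and `all_sizes_divisible(32)` hold, while a size that is not a power of two, such as 48, fails already on the smallest table.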

folkertdev requested a review from bjorn3 February 20, 2025 14:54

bjorn3 reviewed Feb 20, 2025

zlib-rs/src/deflate/slide_hash.rs (outdated)

bjorn3 approved these changes Feb 20, 2025

make the fallback implementation of slidehash vectorize better

5d795ec

folkertdev force-pushed the vectorize-slide-hash branch from 1894353 to 5d795ec February 20, 2025 16:00

folkertdev merged commit e53c1a6 into main Feb 20, 2025
20 checks passed

folkertdev deleted the vectorize-slide-hash branch February 20, 2025 16:08

BrewTestBot mentioned this pull request Apr 1, 2025

zlib-rs 0.5.0 Homebrew/homebrew-core#217552

Merged
