Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate whether we can avoid a 2k lookup table in init_tree without reducing performance #153

Open
oyvindln opened this issue May 17, 2024 · 2 comments

Comments

@oyvindln
Copy link
Collaborator

As noted in this PR, this lookup table used to increase performance is very large - maybe there is a better way of doing this?

#152

@oyvindln oyvindln changed the title Investigate whether we can avoid a 4k lookuptable without reducing performance Investigate whether we can avoid a 4k lookup table in init_tree without reducing performance May 17, 2024
@fintelia
Copy link

fintelia commented Dec 8, 2024

If anyone is interested in working on this, fdeflate uses a different strategy based on libdeflate. It keeps the codeword in bit-reversed form, and uses more complicated arithmetic to increment it to the next value.

@oyvindln oyvindln changed the title Investigate whether we can avoid a 4k lookup table in init_tree without reducing performance Investigate whether we can avoid a 2k lookup table in init_tree without reducing performance Dec 9, 2024
@oyvindln
Copy link
Collaborator Author

oyvindln commented Dec 9, 2024

Yeah it might be worth looking into. Already tinkering a bit with some other tweaks to the huffman tree stuff in inflate.

Should note it's now a 2K table instead of 4k as I found that using 512-size one and bit reverse on larger numbers seemed to not hurt performance.

Also I haven't pushed it yet but also found that it could be avoided entirely on aarch64 and loongarch (and technically armv7 and newer 32-bit arm but didn't find an easy way to differentiate between different arm feature levels at compile time with just the standard lib) with no issue as those architectures have a bit reverse instruction and at least at least when testing on my aarch64 raspberry pi 3B it didn't seem to be any slower to use that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants