oyvindln changed the title from "Investigate whether we can avoid a 4k lookuptable without reducing performance" to "Investigate whether we can avoid a 4k lookup table in init_tree without reducing performance" on May 17, 2024
If anyone is interested in working on this, fdeflate uses a different strategy based on libdeflate. It keeps the codeword in bit-reversed form, and uses more complicated arithmetic to increment it to the next value.
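For illustration, here is a rough sketch of what incrementing a codeword kept in bit-reversed form can look like. It is written from the description above, not copied from fdeflate or libdeflate, and the function name `next_reversed_codeword` is made up for this example. The idea: a normal increment sets the lowest zero bit and clears everything below it, so the bit-reversed version sets the highest zero bit (within `len` bits) and clears everything above it.

```rust
/// Hypothetical helper (not the fdeflate or libdeflate API): given a codeword
/// of `len` bits stored in bit-reversed form, return the next codeword of the
/// same length, also bit-reversed. Returns `None` when `codeword` is already
/// the last one (all `len` bits set).
fn next_reversed_codeword(codeword: u32, len: u32) -> Option<u32> {
    debug_assert!(len >= 1 && len <= 15);
    let mask = (1u32 << len) - 1;
    // Bits that are zero in `codeword`, restricted to the low `len` bits.
    let zeros = codeword ^ mask;
    if zeros == 0 {
        return None;
    }
    // Highest zero bit of the reversed codeword, which corresponds to the
    // lowest zero bit of the codeword in normal bit order.
    let bit = 1u32 << (31 - zeros.leading_zeros());
    // Keep the bits below `bit`, clear `bit` and everything above it, then
    // set `bit`. This mirrors the carry of a normal `+ 1`.
    Some((codeword & (bit - 1)) | bit)
}
```

Starting from 0 with `len = 3`, repeated calls yield 4, 2, 6, 1, 5, 3, 7 and then `None`, which are the values 1 through 7 with their three bits reversed.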
oyvindln changed the title from "Investigate whether we can avoid a 4k lookup table in init_tree without reducing performance" to "Investigate whether we can avoid a 2k lookup table in init_tree without reducing performance" on Dec 9, 2024
Yeah, it might be worth looking into. I'm already tinkering a bit with some other tweaks to the Huffman tree code in inflate.
I should note it's now a 2K table instead of 4K, as I found that using a 512-entry table and doing a bit reverse for larger numbers didn't seem to hurt performance.
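A minimal sketch of that table-plus-fallback idea, under assumptions: the names `REVERSED_BITS` and `reverse_code` and the `u32` entry type are chosen for this example (512 entries of 4 bytes gives 2K) and may not match the actual code in `init_tree`.

```rust
/// Build a 512-entry table of full 32-bit reversals at compile time.
/// The entry type and layout are assumptions made for this sketch.
const fn build_reversed_bits() -> [u32; 512] {
    let mut table = [0u32; 512];
    let mut i = 0;
    while i < 512 {
        table[i] = (i as u32).reverse_bits();
        i += 1;
    }
    table
}

static REVERSED_BITS: [u32; 512] = build_reversed_bits();

/// Reverse the low `len` bits of `code`. Small codes hit the table,
/// larger ones fall back to a generic bit reverse.
fn reverse_code(code: u32, len: u32) -> u32 {
    debug_assert!(len >= 1 && len <= 32);
    let full = if (code as usize) < REVERSED_BITS.len() {
        REVERSED_BITS[code as usize]
    } else {
        code.reverse_bits()
    };
    // The reversed bits sit at the top of the 32-bit word; shift them down.
    full >> (32 - len)
}
```

For example, `reverse_code(0b001, 3)` takes the table path and `reverse_code(600, 10)` takes the fallback path; both return the same result that `reverse_bits` alone would give.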
Also, I haven't pushed it yet, but I found that the table could be avoided entirely on aarch64 and loongarch with no issue, as those architectures have a bit reverse instruction, and at least when testing on my aarch64 Raspberry Pi 3B it didn't seem to be any slower to use that. (Technically the same goes for armv7 and newer 32-bit ARM, but I didn't find an easy way to differentiate between different ARM feature levels at compile time with just the standard library.)
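Roughly, the per-architecture selection could be sketched with `cfg` attributes like this. This is not the unpushed change itself: the function name `reverse_code` is hypothetical, and the non-native branch uses a plain loop as a stand-in for the table-based path to keep the example short.

```rust
/// On architectures with a bit reverse instruction, rely on
/// `u32::reverse_bits` directly and skip the lookup table.
#[cfg(any(target_arch = "aarch64", target_arch = "loongarch64"))]
#[inline]
fn reverse_code(code: u32, len: u32) -> u32 {
    debug_assert!(len >= 1 && len <= 32);
    code.reverse_bits() >> (32 - len)
}

/// Everywhere else, use a software path. A real implementation would use
/// the lookup table here; this loop is only a placeholder for the sketch.
#[cfg(not(any(target_arch = "aarch64", target_arch = "loongarch64")))]
#[inline]
fn reverse_code(code: u32, len: u32) -> u32 {
    debug_assert!(len >= 1 && len <= 32);
    let mut rest = code;
    let mut out = 0u32;
    for _ in 0..len {
        // Move the lowest remaining bit of `rest` onto the bottom of `out`.
        out = (out << 1) | (rest & 1);
        rest >>= 1;
    }
    out
}
```

Callers use `reverse_code` the same way on every target, so the choice between instruction and table stays a compile-time detail.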
As noted in this PR, this lookup table used to increase performance is very large - maybe there is a better way of doing this?
#152