-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zstd: translate fseDecoder.buildDtable into asm #598
zstd: translate fseDecoder.buildDtable into asm #598
Conversation
07789a8
to
f8322cd
Compare
I paused development but finally, get back to this translation. (Some tests still don't pass and for now, I can't figure out why.) |
@WojciechMula Great to have you back! What is the failure? |
@klauspost A funny thing - when reviewing diff on github minutes ago I immediately spotted a mistake. :) I put wrong condition in loop (JLE rather JL). |
@WojciechMula Having some time away from code will often do that :) |
f8322cd
to
c2f7314
Compare
I have many years of experience, but this property of our brain still amazes me. :) |
MOVWQZX(ptr, nextState) | ||
|
||
// symbolNext[symbol] = nextState + 1 | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe simple INCQ(ptr)
won't be bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick benchmarks showed more regressions, but maybe it's just noise.
zstd/_generate/gen_fse.go
Outdated
} | ||
|
||
// newState := (nextState << nBits) - tableSize | ||
newState := GP64() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Copy64
zstd/_generate/gen_fse.go
Outdated
SHLQ(reg.CL, newState) | ||
SUBQ(b.tableSize, newState) | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove, some remnants from debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Speedup is what you can expect, since overall usage is so low.
Regression from #598 causing excessive heap allocations.
Regression from #598 causing excessive heap allocations.
In our tests,
buildDtable
takes 6-7% of the total time, so seems it's worth trying to make it a bit faster.Below are the results from a Skylake machine (my IceLake AWS instance is temporarily unavailable). No significantly faster, but overall a faster and just a few regressions.