iq2_tn: slightly faster PP on Zen4 #43

Merged

ikawrakow merged 1 commit into main from ik/iq2_tn_faster_pp

Sep 8, 2024

Owner

ikawrakow commented Sep 8, 2024

With this change, PP512 (prompt processing of a 512-token prompt) reaches 494 t/s with flash attention enabled, up from 468 t/s (~5% improvement), running on a Ryzen-7950X CPU.

Compared to the initial IQ2_TN PR #13, the cumulative improvement is 15%.

Compared to TQ2_0 in llama.cpp, which has since been merged, we are now 80% faster.
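
The percentages above are plain throughput ratios. As a minimal sketch of that arithmetic (the function names `speedup_percent` and `implied_baseline` are illustrative and not part of this repository), the snippet below reproduces the ~5% figure from the two PP512 numbers quoted in this PR. The second helper only inverts a stated percentage; since the 15% and 80% figures are rounded, anything it returns is a rough estimate, not a measurement.

```python
def speedup_percent(new_tps: float, old_tps: float) -> float:
    """Relative throughput gain of new_tps over old_tps, in percent."""
    return (new_tps / old_tps - 1.0) * 100.0


def implied_baseline(new_tps: float, gain_percent: float) -> float:
    """Throughput that would yield new_tps after a gain of gain_percent.

    Hypothetical helper: it only inverts a rounded percentage, so the
    result is an estimate, not a benchmark value.
    """
    return new_tps / (1.0 + gain_percent / 100.0)


# PP512 figures quoted in this PR (tokens per second, Ryzen-7950X).
pp512_before = 468.0
pp512_after = 494.0

print(f"{speedup_percent(pp512_after, pp512_before):.1f}%")  # prints "5.6%"
```

The unrounded gain is about 5.6%, consistent with the "~5%" quoted above.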


iq2_tn: slightly faster PP

b7f7eed

ikawrakow merged commit bf4b19b into main
