convert : enable expert group selection for all models with it #16691
Conversation
@jeffbolznv @am17an It would be great to have fusion for this as well. :) (lines 953 to 976 in 84bf3c6)

Agreed. Let's catch up with the clamping change and get these common code utilities in; then we should do this.

@ggerganov gentle ping
Kimi-K2 also has it, but it's just 1 group: `"n_group": 1, "topk_group": 1`. I've patched my and it's reduced the tg by around 5-6% and pp by around 10%.
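For intuition, here is a minimal NumPy sketch of grouped top-k expert selection (an assumption about the DeepSeek-style scheme under discussion, not llama.cpp code). With `n_group = 1` and `topk_group = 1`, the group step keeps everything, so the routing result is identical to a plain top-k; only the fused-kernel speed is lost, not correctness:

```python
import numpy as np

# Sketch of DeepSeek-style grouped expert routing (an assumption about the
# scheme being discussed, not llama.cpp code): score each group, keep the
# best topk_group groups, then take top-k over the surviving experts.
def grouped_topk(scores, n_group, topk_group, top_k):
    n_expert = scores.shape[-1]
    per_group = n_expert // n_group
    groups = scores.reshape(n_group, per_group)
    # group score = sum of the top-2 expert scores in the group (V3-style)
    group_scores = np.sort(groups, axis=-1)[:, -2:].sum(axis=-1)
    keep = np.argsort(group_scores)[-topk_group:]
    mask = np.zeros(n_group, dtype=bool)
    mask[keep] = True
    masked = np.where(np.repeat(mask, per_group), scores, -np.inf)
    return np.sort(np.argsort(masked)[-top_k:])

# with a single group the group step is a no-op: routing matches plain top-k
scores = np.random.rand(384)
plain = np.sort(np.argsort(scores)[-8:])
assert np.array_equal(grouped_topk(scores, 1, 1, 8), plain)
```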
My guess is that this doesn't take the topk-moe path because of this change |
Yes, effectively disabling it.
To be expected, should be recoverable by fusion.

Also, I didn't expect such a drastic drop in PP due to missing fusion. @jukofyork what is your setup?
Yeah, I think I may have been wrong about this, as it's been a while since I compiled a new version of
Getting a segfault that doesn't occur when I override the keys: This version of

It seems that the crash occurs when The only limit I can see to do with so wonder if it is to do with

Provided this version has the recent

Ah, yes, that would be it.
I guess we need a check and fallback to CUB. |
There is another test against llama.cpp/ggml/src/ggml-cuda/argsort.cu (line 191 in 3464bda).

Sure, but that's

I think we can swap the parameters instead of falling back to CUB.
Yep, worked, making a PR.
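For background on the limit being discussed: my understanding is that the CUDA argsort kernel is a bitonic sort that pads each row to the next power of two and keeps the index array in shared memory, so very wide rows overflow the per-block budget. A pure-Python model of that pattern (a sketch, not the actual kernel; the 48 KiB cap and 4-byte index size are assumed defaults):

```python
# Pure-Python model of a pad-to-power-of-two bitonic argsort with a shared
# memory cap (a sketch of the pattern the CUDA kernel is believed to use;
# the 48 KiB default and 4-byte index size are assumptions).

def next_pow2(n: int) -> int:
    p = 1
    while p < n:
        p *= 2
    return p

def bitonic_argsort(row, shared_mem_limit=48 * 1024):
    n = len(row)
    n_pad = next_pow2(n)
    # the kernel keeps one index per padded column in shared memory, so the
    # padded width bounds how wide a row the kernel can handle
    if n_pad * 4 > shared_mem_limit:
        raise ValueError("row too wide for shared memory; needs a CUB fallback")
    idx = list(range(n_pad))
    # padded slots compare as +inf so they always sort to the end
    key = lambda i: row[i] if i < n else float("inf")
    k = 2
    while k <= n_pad:  # bitonic merge stages
        j = k // 2
        while j >= 1:
            for i in range(n_pad):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (key(idx[i]) > key(idx[partner])) == ascending:
                        idx[i], idx[partner] = idx[partner], idx[i]
            j //= 2
        k *= 2
    return idx[:n]  # argsort (ascending) of the original row
```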
My issue was fixed by #16849.
Enable expert group selection for all models with it (requires reconversion or metadata editing).
Specifically, these models (maybe more):
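For reference, the group-selection parameters come from each model's Hugging Face `config.json`, as with the `n_group`/`topk_group` keys quoted above. A representative fragment (the values shown are DeepSeek-V3-style and are an assumption here; check the actual model config):

```json
{
  "n_routed_experts": 256,
  "n_group": 8,
  "topk_group": 4,
  "num_experts_per_tok": 8
}
```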