Support models with merged up/gate experts #1408

Merged
ikawrakow merged 5 commits into main from ik/pre_merged_up_gate on Mar 12, 2026
Conversation

@ikawrakow (Owner)

OK, my first reaction was not to support mainline nonsense, but that would be a losing battle, so here is the support for the models now popping up on HF with merged ffn_up_exps and ffn_gate_exps.

Obviously, mainline developers had to do it the other way around from what pre-existed in ik_llama.cpp: they put the ffn_gate_exps data first, while in ik_llama.cpp the ffn_up_exps data came first. Hence, I also had to change how ik_llama.cpp merges on-the-fly with the -muge command line option, so the PR became bigger than it should have been.

I think it works, but it is getting late here, so I'll check more tomorrow. In the meantime, you can try it as well.

Important

These pre-merged models will not work with split mode graph, so don't download them if you have 2 or more GPUs. I may or may not decide to add split mode graph support for merged up/gate experts.

Haha, mainline has elected to arrange the merged tensors the other way around compared to what I had done in the on-the-fly merge.
ikawrakow merged commit 5713d3b into main on Mar 12, 2026
ikawrakow added a commit that referenced this pull request Mar 12, 2026