model: add Mellum architecture #23966
Undo the formatting changes please. :) Also fill in the AI disclosure in the OP.
Sigh, old
Edit: Scratch that, just remove
Yep, I'll do that. I checked
Also in
383 tokens/sec generation on an RTX 6000 Pro using … Btw, I replicated the conversion to GGUF (f16 => Q8_0). Nice work, @Xarbirus!
The failing CI jobs are not relevant. GTG.
* model: support for Mellum architecture
* model: improve mellum.py formatting
* model: improve mellum.py formatting once again
* deps: downgrade transformers to 4.57.6 (to fix CI)
* deps: remove huggingface_hub dependency
* deps: remove huggingface_hub from test requirements

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Does this PR add support for the Mellum2 MTP described in the Mellum2 paper?
@coder543 Not yet. It will need to be implemented in transformers first, but it's on the roadmap.
Can anyone report success using tools with Mellum2 and llama.cpp? First, we need to apply the official Jinja template to get tool calling working at all. Even then, "Failed to parse" errors are frequent, e.g.:
The model is claimed to have been trained for agentic operations, so the high failure rate (~60%) is unexpected. Update: Setting My full
Thanks @Xarbirus 💌 It works great with "Mellum2-12B-A2.5B-Thinking-Q6_K.gguf" on an RTX 3060 12 GB with opencode.ai:
@Cyrille37 How is tool calling working for you? It's problematic for me with the same model file. Tools work, but with an unusually high failure rate.
I only tried with opencode: tools like
I'm also using OpenCode. Would you be willing to share your llama.cpp launch command for comparison? Mine is above.
llama-server compiled with CUDA 12.9 (compute capability 86)
@chris-hatton We uploaded a GGUF collection yesterday whose defaults should work for both tool calling and reasoning. Please let us know if the issue persists!
llama-server -hf JetBrains/Mellum2-12B-A2.5B-Thinking-GGUF-Q8_0 --tools all

Overview
This PR adds support for the new Mellum architecture (see hf).
Additional information
The transformers version has been updated in this PR because the converter does not work without a fix for a particular bug.
Requirements