model: add Mellum architecture by Xarbirus · Pull Request #23966 · ggml-org/llama.cpp

Xarbirus · 2026-06-01T11:29:46Z

Overview

This PR adds support for the new Mellum architecture (see hf).

Additional information

It is important to note that the transformers version has been updated in this PR. This is because the converter does not work without the fix for one bug.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES for
- analysis of current implementation
- test execution and results analysis

CISC · 2026-06-01T12:54:02Z

Undo the formatting changes please. :)

Also fill in the AI disclosure in OP.

CISC · 2026-06-02T10:17:43Z

Sigh, old transformers limit huggingface_hub versions, ~~try changing requirements/requirements-tool_bench.txt and tools/server/tests/requirements.txt lower requirement to >=0.34.0~~.

Edit: Scratch that, just remove huggingface_hub altogether, not sure why it's a dependency at all?

Xarbirus · 2026-06-02T11:04:00Z

Sigh, old transformers limit huggingface_hub versions, ~~try changing requirements/requirements-tool_bench.txt and tools/server/tests/requirements.txt lower requirement to >=0.34.0~~.

Edit: Scratch that, just remove huggingface_hub altogether, not sure why it's a dependency at all?

Yep, I'll do that. I checked pyproject.toml, and everything seemed correct in there:( I'll remove the dependency from requirements.txt now.

CISC · 2026-06-02T11:08:27Z

Sigh, old transformers limit huggingface_hub versions, ~~try changing requirements/requirements-tool_bench.txt and tools/server/tests/requirements.txt lower requirement to >=0.34.0~~.
Edit: Scratch that, just remove huggingface_hub altogether, not sure why it's a dependency at all?

Yep, I'll do that. I checked pyproject.toml, and everything seemed correct in there:( I'll remove the dependency from requirements.txt now.

Also in tools/server/tests/requirements.txt.

g0t4 · 2026-06-02T11:32:39Z

383 tokens/sec generation on RTX 6000 Pro using Q8_0 from JetBrains/Mellum2-12B-A2.5B-Thinking

btw I replicated the conversion to GGUF f16 => Q8_0

nice work @Xarbirus

CISC · 2026-06-02T11:46:26Z

Failing CIs are not relevant, GTG.

* model: support for Mellum architecture * model: improve mellum.py formatting * model: improve mellum.py formatting once again * deps: downgrade transformers to 4.57.6 (to fix CI) * deps: remove huggingface_hub dependency * deps: remove huggingface_hub from test requirements --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

coder543 · 2026-06-03T14:11:31Z

Does this PR add support for Mellum2's MTP that is described in the Mellum2 paper?

pilot7747 · 2026-06-03T15:30:03Z

@coder543 Not yet, will need to implement it in transformers first. But it's on the roadmap

chris-hatton · 2026-06-04T12:46:39Z

Can anyone report success using Tools with Mellum2 and llama.cpp?

Firstly; we need to apply the official Jinja template to get tools calling at all. But even then, 'Failed to parse' failures are frequent e.g:

"Failed to parse input at pos 239: <tool_call>\n{"name": "explore", "arguments": {"path": "/", "pattern": "App.js", "depth": 2}}\n</tool_call>"

The model is claimed as having been trained for Agentic operations; the high failure rate ~60% is therefore unexpected.

Update: Setting reasoning-format deepseek improves things further but still throwing Failed to parse input fairly often

My full llama-server command:

./llama.cpp/build/bin/llama-server \
  --model ./Models/Mellum2-12B-A2.5B-Thinking-Q6_K.gguf \
  --host 0.0.0.0 \
  -fa on \
  --ctx-size 100000 \
  --n-gpu-layers 999 \
  --ubatch-size 256 \
  --threads 8 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --presence-penalty 0.0 \
  --repeat-penalty 1.0 \
  --chat-template-file ./Templates/mellum2.jinja \
  --jinja \
  --reasoning-format deepseek

Cyrille37 · 2026-06-05T08:41:37Z

Thanks @Xarbirus 💌

Works great with "Mellum2-12B-A2.5B-Thinking-Q6_K.gguf" on a RTX 3060 12Go with opencode.ai :

prompt eval time = 9777.32 ms / 10596 tokens ( 0.92 ms per token, 1083.73 t/s)
eval time = 26929.82 ms / 934 tokens ( 28.83 ms per token, 34.68 t/s)
total time = 36707.14 ms / 11530 tokens

chris-hatton · 2026-06-05T09:09:17Z

Thanks @Xarbirus 💌

Works great with "Mellum2-12B-A2.5B-Thinking-Q6_K.gguf" on a RTX 3060 12Go with opencode.ai :

prompt eval time = 9777.32 ms / 10596 tokens ( 0.92 ms per token, 1083.73 t/s)

eval time = 26929.82 ms / 934 tokens ( 28.83 ms per token, 34.68 t/s)

total time = 36707.14 ms / 11530 tokens

@Cyrille37 How is the tool calling for you? That's problematic for me, using the same named model file. Tools work but unusually high failure rate.

Cyrille37 · 2026-06-05T09:17:43Z

How is the tool calling for you?

I only tried with opencode : tools like glob, grep, read, edit ... work fine.

chris-hatton · 2026-06-05T09:19:19Z

How is the tool calling for you?

I only tried with opencode : tools like glob, grep, read, edit ... work fine.

I'm also using OpenCode, are you willing to share your llama.cpp launch command for comparison? Mine's above.

Cyrille37 · 2026-06-05T09:22:49Z

llama.cpp launch command

llama-server compiled with CUDA 12.9 capability 86

llama-server -m Mellum2-12B-A2.5B-Thinking-Q6_K.gguf \
    --host 0.0.0.0 --port 8012
    --verbosity 3 \
    --threads-http 2 \
    --no-mmap \
    --flash-attn on \
    -sm row \
    --cache-type-k q8_0 --cache-type-v q8_0 \
    --repeat-penalty 1.0 --presence-penalty 0.0 \
    --temp 0.2 \
    --jinja \
    --reasoning-format deepseek \
    -c 0

Ar4l · 2026-06-05T16:40:10Z

@chris-hatton we've uploaded a GGUF collection yesterday for which defaults should work for both tools and reasoning. But please let us know if the issue persists!

llama-server -hf JetBrains/Mellum2-12B-A2.5B-Thinking-GGUF-Q8_0 --tools all

model: support for Mellum architecture

8f46149

Xarbirus requested review from CISC, JohannesGaessler and ggerganov as code owners June 1, 2026 11:29

github-actions Bot added model Model specific testing Everything test related python python script changes labels Jun 1, 2026

This comment was marked as resolved.

Sign in to view

Xarbirus added 2 commits June 1, 2026 13:38

Merge branch 'master' into mellum2

44f72cc

model: improve mellum.py formatting

2232e35

model: improve mellum.py formatting once again

f44f38c

CISC approved these changes Jun 1, 2026

View reviewed changes

Comment thread requirements/requirements-convert_legacy_llama.txt Outdated

deps: downgrade transformers to 4.57.6 (to fix CI)

a91b45f

deps: remove huggingface_hub dependency

f77578b

deps: remove huggingface_hub from test requirements

c7a95c7

github-actions Bot added examples server labels Jun 2, 2026

CISC added the merge ready A maintainer can use this label to indicate that they consider the changes final and ready to merge. label Jun 2, 2026

CISC removed testing Everything test related examples server labels Jun 2, 2026

Merge branch 'master' into mellum2

ce5fdcf

ggerganov merged commit 4fb16ec into ggml-org:master Jun 2, 2026
30 of 32 checks passed

Conversation

Xarbirus commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Additional information

Requirements

Uh oh!

This comment was marked as resolved.

CISC commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

CISC commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Xarbirus commented Jun 2, 2026

Uh oh!

CISC commented Jun 2, 2026

Uh oh!

g0t4 commented Jun 2, 2026

Uh oh!

CISC commented Jun 2, 2026

Uh oh!

Uh oh!

coder543 commented Jun 3, 2026

Uh oh!

pilot7747 commented Jun 3, 2026

Uh oh!

chris-hatton commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Cyrille37 commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chris-hatton commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Cyrille37 commented Jun 5, 2026

Uh oh!

chris-hatton commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Cyrille37 commented Jun 5, 2026

Uh oh!

Ar4l commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

Xarbirus commented Jun 1, 2026 •

edited

Loading

CISC commented Jun 1, 2026 •

edited

Loading

CISC commented Jun 2, 2026 •

edited

Loading

chris-hatton commented Jun 4, 2026 •

edited

Loading

Cyrille37 commented Jun 5, 2026 •

edited

Loading

chris-hatton commented Jun 5, 2026 •

edited

Loading

chris-hatton commented Jun 5, 2026 •

edited

Loading

Ar4l commented Jun 5, 2026 •

edited

Loading