
Add Olmo3 implementation #16015

Merged
CISC merged 6 commits into ggml-org:master from 2015aroras:shanea/olmo3 on Sep 17, 2025

Conversation

2015aroras (Contributor) commented Sep 15, 2025

This PR adds the upcoming Olmo 3. The main architectural differences from Olmo 2 are:

  • Sliding window attention is used for 3 out of 4 layers. RoPE scaling is not applied to sliding window attention layers.

Since the architecture is very similar to Olmo 2's, this PR merges the Olmo 3 changes into the Olmo 2 implementation (similar to vllm-project/vllm#24534). I can create a separate Olmo 3 implementation instead if preferred.
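The layer pattern described above can be sketched as follows. This is a minimal illustration, not the actual llama.cpp code, and the assumption that every fourth layer is the full-attention one is mine — the PR only states the 3-out-of-4 ratio:

```python
# Hypothetical sketch of the Olmo 3 attention pattern described above.
# Assumption: every fourth layer is a full-attention layer; the PR only
# specifies that 3 out of 4 layers use sliding-window attention (SWA).
SWA_PATTERN = 4  # one full-attention layer per group of 4

def is_swa_layer(layer_idx: int) -> bool:
    """True for the 3-of-4 layers that use sliding-window attention."""
    return (layer_idx + 1) % SWA_PATTERN != 0

def applies_rope_scaling(layer_idx: int) -> bool:
    # Per the PR description, RoPE scaling is skipped on SWA layers.
    return not is_swa_layer(layer_idx)

if __name__ == "__main__":
    # Three SWA layers followed by one full-attention layer, repeating.
    print(["swa" if is_swa_layer(i) else "full" for i in range(8)])
```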

@github-actions github-actions bot added the python python script changes label Sep 15, 2025
@2015aroras 2015aroras marked this pull request as ready for review September 15, 2025 20:08
2015aroras (Contributor, Author) commented Sep 15, 2025

I used the model conversion example for testing. I got the following results with bf16 on shanearora/2025-sep-a-base-model, modified to have YaRN RoPE scaling enabled.

📈 METRICS
==============================
MSE (Mean Squared Error):     1.592396e-02
Reference Variance:           6.831117e+00
NMSE:                         2.331092e-03
Max Absolute Error:           0.438750
Mean Absolute Error:          0.116665
NMSE (dB):                    -26.32 dB

🎯 INTERPRETATION
==============================
👍 Good match

📋 GUIDANCE
==============================
👍 GOOD: Your GGML conversion is working well.
   Small differences are likely due to precision/quantization.

📚 NMSE BENCHMARKS
==============================
✅ RESULT: PASS (NMSE = 2.33e-03)

Also, below are the results for allenai/OLMo-2-0425-1B with fp32.

📈 METRICS
==============================
MSE (Mean Squared Error):     1.594746e-03
Reference Variance:           9.219801e+00
NMSE:                         1.729697e-04
Max Absolute Error:           0.168732
Mean Absolute Error:          0.033951
NMSE (dB):                    -37.62 dB

🎯 INTERPRETATION
==============================
👍 Very good match

📋 GUIDANCE
==============================
✅ EXCELLENT: Your GGML conversion is working very well!
   The differences are negligible for practical use.

📚 NMSE BENCHMARKS
==============================
✅ RESULT: PASS (NMSE = 1.73e-04)
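For context, the NMSE in the reports above is the MSE normalized by the reference output's variance, with the dB value as 10·log10(NMSE). Here is a minimal sketch of that relationship — my reconstruction from the reported numbers, not the conversion tool's actual code:

```python
import math

def nmse(mse: float, ref_variance: float) -> tuple[float, float]:
    """Return (NMSE, NMSE in dB) given MSE and reference variance."""
    value = mse / ref_variance
    return value, 10.0 * math.log10(value)

# Numbers from the bf16 Olmo 3 run above:
value, db = nmse(1.592396e-02, 6.831117e+00)
print(f"NMSE = {value:.6e}, {db:.2f} dB")  # matches 2.331092e-03 and -26.32 dB
```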

2015aroras and others added 2 commits September 15, 2025 15:13
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
pwilkin (Member) commented Sep 16, 2025

@2015aroras What tool are you using to compare the conversion?

2015aroras (Contributor, Author) commented

@pwilkin I am using the model conversion tools inside this repo. They were created to help verify that HF to llama.cpp conversion is accurate. The logs above are from the model logits verification step.

pwilkin (Member) commented Sep 16, 2025

Ah, that's nice, I haven't used that specific one yet :)

2015aroras (Contributor, Author) commented Sep 16, 2025

All the check failures seem to be unrelated to this change; before merging master again, an iOS check was failing instead. So IMO this is ready to merge.

CISC (Member) commented Sep 16, 2025

> All the check failures seem to be unrelated to this change. Before merging master again, an ios check was failing instead. So imo ready to merge.

Yes, sorry for the delay, just a minor cosmetic change and we'll merge. :)

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@CISC CISC merged commit 85286f3 into ggml-org:master Sep 17, 2025
52 checks passed
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* Add HF to gguf conversion logic for Olmo3

* Add Olmo3 implementation

* Update rope comment

* Fix indentation

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Apply suggestion from @CISC

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Labels

python python script changes


3 participants