Conversation
Hi there,
Thanks mate! Currently on vacation, will resume working on this PR once I'm back.
Performed some fixes today, now test_model passes for Olmo2. |
seems like almost all tests are failing on: |
lantiga left a comment:
Looks good! Left a few comments
    config.norm_class(config.n_embd, eps=config.norm_eps) if config.post_attention_norm else nn.Identity()
)
self.norm_2 = None if config.shared_attention_norm else config.norm_class(config.n_embd, eps=config.norm_eps)
self.norm_2 = (
Maybe a less special-casey way of doing this would be to avoid introducing the boolean norm_1 and norm_2 configs, and instead just have Identity as the norm class itself.
For reference, this is how the block currently handles it in this PR:
self.norm_1 = nn.Identity() if not config.norm_1 else config.norm_class(config.n_embd, eps=config.norm_eps)
self.attn = CausalSelfAttention(config, block_idx)
self.post_attention_norm = (
config.norm_class(config.n_embd, eps=config.norm_eps) if config.post_attention_norm else nn.Identity()
)
self.norm_2 = (
nn.Identity()
if not config.norm_2
else (None if config.shared_attention_norm else config.norm_class(config.n_embd, eps=config.norm_eps))
)
self.mlp = config.mlp_class(config)
self.post_mlp_norm = (
config.norm_class(config.n_embd, eps=config.norm_eps) if config.post_mlp_norm else nn.Identity()
)
The issue is that OLMo 2 selectively uses RMSNorm for post_attention_norm and post_mlp_norm, but Identity for norm_1 and norm_2.
Perhaps a way to get rid of the booleans would be to treat it as a special case for OLMo 2:
self.norm_1 = nn.Identity() if config.name.lower().startswith("olmo-2-") else ...
IMO that's the easiest workaround for getting rid of the norm_1 and norm_2 booleans.
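For illustration, the full name-based workaround might look roughly like this (a sketch only; the is_olmo2 helper variable is made up here, and the norm_2 branch mirrors the snippet above):

# Sketch of the name-based special case, not the PR's actual code:
is_olmo2 = config.name.lower().startswith("olmo-2-")
self.norm_1 = nn.Identity() if is_olmo2 else config.norm_class(config.n_embd, eps=config.norm_eps)
self.norm_2 = (
    nn.Identity()
    if is_olmo2
    else (None if config.shared_attention_norm else config.norm_class(config.n_embd, eps=config.norm_eps))
)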
How about norm_1_class and norm_2_class as overrides to norm_class in the config file?
Then reading the config could set up norm_1_class and norm_2_class either from the names in the config file or from norm_class.
This would move the cases from the model to the config. Ideally, we'd also subsume the shared_attention_norm, which seems to be very similar to not norm_2.
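As a rough sketch of that idea (norm_1_class and norm_2_class are the proposed names; defaulting them in the config rather than in the model is an assumption), the block could then use the overrides unconditionally:

# Sketch only: hypothetical handling of the proposed overrides.
# The config would default norm_1_class / norm_2_class to norm_class unless the
# checkpoint's config file overrides them, e.g. with nn.Identity for OLMo 2
# (nn.Identity ignores constructor arguments, so the calls below still work).
norm_1_class = config.norm_1_class or config.norm_class
norm_2_class = config.norm_2_class or config.norm_class
self.norm_1 = norm_1_class(config.n_embd, eps=config.norm_eps)
self.norm_2 = None if config.shared_attention_norm else norm_2_class(config.n_embd, eps=config.norm_eps)

This would keep the model code free of per-model branches; OLMo 2's config entry would just point norm_1_class and norm_2_class at Identity.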
That's a good idea, and it will be advantageous in the future. wdyt @ysjprojects?
We can also do it in a follow-up PR, it doesn't need to be this one.
@lantiga I have addressed the issues and left some follow-up comments.
@Borda Changes should be ready to merge.
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Jirka B <j.borovec+github@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <luca@lightning.ai>
Co-authored-by: shijie.yu <shijie@tensorplex.ai>
https://huggingface.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
https://arxiv.org/abs/2501.00656
Version 2 of OLMo, released by Ai2.
Comes in 7B and 13B Base and Instruct variants, plus additional SFT and DPO models.