
OLMo 2 #1897

Merged

Borda merged 33 commits into Lightning-AI:main from ysjprojects:olmo2 on Jun 4, 2025

Conversation

@ysjprojects (Collaborator)

https://huggingface.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
https://arxiv.org/abs/2501.00656

Version 2 of OLMo released by Ai2.

Comes in 7B and 13B Base and Instruct variants, plus additional SFT and DPO models.

> First, we find that OLMo 2 7B and 13B are the best fully-open models to-date, often outperforming open weight models of equivalent size. Not only do we observe a dramatic improvement in performance across all tasks compared to our earlier OLMo 0424 model but, notably, OLMo 2 7B outperforms Llama-3.1 8B and OLMo 2 13B outperforms Qwen 2.5 7B despite its lower total training FLOPs. The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance (see the figure in the paper).
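
For context, a minimal usage sketch via LitGPT's Python API once this PR lands; the checkpoint ID here is an assumption based on the Hugging Face collection linked above, not something this PR pins down:

    # Hypothetical usage sketch; checkpoint ID assumed from the HF collection above.
    from litgpt import LLM

    llm = LLM.load("allenai/OLMo-2-1124-7B-Instruct")
    print(llm.generate("What is OLMo 2?", max_new_tokens=64))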

ysjprojects changed the title from OLMo 2 to OLMo 2 (WIP) on Jan 4, 2025
@rasbt (Contributor) commented Jan 8, 2025

Hi there,
just wanted to say thanks for taking on this PR (I know this is a lot of work)! The OLMo models are awesome, and it'd be great to have OLMo 2 in LitGPT.

@ysjprojects (Collaborator, Author)

> Hi there, just wanted to say thanks for taking on this PR (I know this is a lot of work)! The OLMo models are awesome, and it'd be great to have OLMo 2 in LitGPT.

Thanks mate!

Currently on vacation, will resume working on this PR once I'm back.

t-vi self-requested a review as a code owner, January 30, 2025 08:17
ysjprojects changed the title from OLMo 2 (WIP) to OLMo 2 on Feb 26, 2025
@ysjprojects (Collaborator, Author)

Performed some fixes today; test_model now passes for Olmo2.

Borda added the enhancement (New feature or request) label on Mar 12, 2025
@Borda (Collaborator) commented Mar 20, 2025

Seems like almost all tests are failing on:

FAILED tests/test_model.py::test_sdpa_choice_kv_cache[SmolLM2-1.7B-Instruct] - RuntimeError: Deterministic behavior was enabled with either `torch.use_deterministic_algorithms(True)` or `at::Context::setDeterministicAlgorithms(true)`, but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8. For more information, go to https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility
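
For reference, the error message itself names the fix: set the CUBLAS_WORKSPACE_CONFIG environment variable before CUDA initializes. A minimal sketch, assuming the tests opt into deterministic algorithms in a single process (e.g. at the top of conftest.py):

    import os

    # Must be set before the first cuBLAS call, per the error above (CUDA >= 10.2).
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

    import torch

    # Deterministic mode is what triggers the RuntimeError without the env var.
    torch.use_deterministic_algorithms(True)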

lantiga mentioned this pull request on Apr 2, 2025
Borda enabled auto-merge (squash) April 3, 2025 01:54
@lantiga (Contributor) left a comment:

Looks good! Left a few comments

            config.norm_class(config.n_embd, eps=config.norm_eps) if config.post_attention_norm else nn.Identity()
        )
-       self.norm_2 = None if config.shared_attention_norm else config.norm_class(config.n_embd, eps=config.norm_eps)
+       self.norm_2 = (
Contributor:

Maybe a less special-casey way of doing this would be to avoid introducing the boolean norm_1 and norm_2 configs, and instead just have Identity as the norm class itself.

Collaborator (Author):

        self.norm_1 = nn.Identity() if not config.norm_1 else config.norm_class(config.n_embd, eps=config.norm_eps)
        self.attn = CausalSelfAttention(config, block_idx)
        self.post_attention_norm = (
            config.norm_class(config.n_embd, eps=config.norm_eps) if config.post_attention_norm else nn.Identity()
        )
        self.norm_2 = (
            nn.Identity()
            if not config.norm_2
            else (None if config.shared_attention_norm else config.norm_class(config.n_embd, eps=config.norm_eps))
        )
        self.mlp = config.mlp_class(config)
        self.post_mlp_norm = (
            config.norm_class(config.n_embd, eps=config.norm_eps) if config.post_mlp_norm else nn.Identity()
        )

The issue is that OLMo 2 selectively uses RMSNorm for post_attention_norm and post_mlp_norm but Identity for norm_1 and norm_2.

Perhaps a way to get rid of the booleans would be to specify it as a special case for olmo2:

self.norm_1 = nn.Identity() if config.name.lower().startswith("olmo-2-") else ...

IMO that's the easiest workaround for getting rid of the norm_1 and norm_2 booleans.

Collaborator:

How about norm_1_class and norm_2_class as overrides to norm_class in the config file?
Then reading the config could set up norm_1_class, norm_2_class either from the config names or from norm_class.
This would move the cases from the model to the config. Ideally, we'd also subsume the shared_attention_norm, which seems to be very similar to not norm_2.
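
A minimal sketch of what such overrides might look like; the field names, defaults, and fallback logic here are assumptions for illustration, not what this PR ships:

    from dataclasses import dataclass
    from typing import Optional, Type

    import torch.nn as nn


    @dataclass
    class Config:
        n_embd: int = 4096
        norm_eps: float = 1e-5
        norm_class: Type[nn.Module] = nn.LayerNorm
        # Hypothetical per-site overrides; None falls back to norm_class.
        norm_1_class: Optional[Type[nn.Module]] = None
        norm_2_class: Optional[Type[nn.Module]] = None

        def __post_init__(self) -> None:
            self.norm_1_class = self.norm_1_class or self.norm_class
            self.norm_2_class = self.norm_2_class or self.norm_class


    # An OLMo 2 config entry would then set both overrides to nn.Identity,
    # which accepts and ignores the (n_embd, eps=...) constructor arguments.
    cfg = Config(norm_1_class=nn.Identity, norm_2_class=nn.Identity)
    norm_1 = cfg.norm_1_class(cfg.n_embd, eps=cfg.norm_eps)

This would let the block construct every norm site uniformly and drop the norm_1/norm_2 booleans.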

Contributor:

That's a good idea, it will be advantageous in the future. Wdyt @ysjprojects?

Contributor:

We can also do it in a follow-up PR; it doesn't need to be this one.

auto-merge was automatically disabled April 23, 2025 08:06

Head branch was pushed to by a user without write access

@ysjprojects (Collaborator, Author)

@lantiga I have addressed the issues and left some follow-up comments.

Borda enabled auto-merge (squash) May 22, 2025 12:14
auto-merge was automatically disabled June 3, 2025 16:22

Head branch was pushed to by a user without write access

@ysjprojects (Collaborator, Author)

@Borda Changes should be ready to merge

Borda enabled auto-merge (squash) June 3, 2025 22:06
@Borda (Collaborator) commented Jun 3, 2025

> @Borda Changes should be ready to merge

@t-vi mind having a look as codeowner, pls ^^

Borda merged commit d19df7a into Lightning-AI:main on Jun 4, 2025
24 checks passed
mseeger pushed a commit to mseeger/litgpt that referenced this pull request Jul 4, 2025
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Jirka B <j.borovec+github@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <luca@lightning.ai>
Co-authored-by: shijie.yu <shijie@tensorplex.ai>

Labels

enhancement New feature or request