Conversation
Hi there,
Thanks mate! Currently on vacation, will resume working on this PR once I'm back.
Performed some fixes today, now test_model passes for Olmo2. |
seems like almost all tests are failing on: |
lantiga left a comment:
Looks good! Left a few comments
    config.norm_class(config.n_embd, eps=config.norm_eps) if config.post_attention_norm else nn.Identity()
)
self.norm_2 = None if config.shared_attention_norm else config.norm_class(config.n_embd, eps=config.norm_eps)
self.norm_2 = (
Maybe a less special-casey way of doing this would be to avoid introducing the boolean norm_1 and norm_2 configs, and instead just have Identity as the norm class itself.
For reference, this is how the block currently handles it in this PR:
self.norm_1 = nn.Identity() if not config.norm_1 else config.norm_class(config.n_embd, eps=config.norm_eps)
self.attn = CausalSelfAttention(config, block_idx)
self.post_attention_norm = (
config.norm_class(config.n_embd, eps=config.norm_eps) if config.post_attention_norm else nn.Identity()
)
self.norm_2 = (
nn.Identity()
if not config.norm_2
else (None if config.shared_attention_norm else config.norm_class(config.n_embd, eps=config.norm_eps))
)
self.mlp = config.mlp_class(config)
self.post_mlp_norm = (
config.norm_class(config.n_embd, eps=config.norm_eps) if config.post_mlp_norm else nn.Identity()
)
The issue is that OLMo 2 selectively uses RMSNorm for post_attention_norm and post_mlp_norm, but Identity for norm_1 and norm_2.
Perhaps a way to get rid of the booleans would be to treat it as a special case for OLMo 2:
self.norm_1 = nn.Identity() if config.name.lower().startswith("olmo-2-") else ...
IMO that's the easiest workaround for getting rid of the norm_1 and norm_2 booleans.
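For illustration, the full name-based workaround might look roughly like this (a sketch only; the is_olmo2 helper variable is made up here, and the norm_2 branch mirrors the snippet above):

# Sketch of the name-based special case, not the PR's actual code:
is_olmo2 = config.name.lower().startswith("olmo-2-")
self.norm_1 = nn.Identity() if is_olmo2 else config.norm_class(config.n_embd, eps=config.norm_eps)
self.norm_2 = (
    nn.Identity()
    if is_olmo2
    else (None if config.shared_attention_norm else config.norm_class(config.n_embd, eps=config.norm_eps))
)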
How about norm_1_class and norm_2_class as overrides to norm_class in the config file?
Then reading the config could set up norm_1_class and norm_2_class either from the names in the config file or from norm_class.
This would move the cases from the model to the config. Ideally, we'd also subsume the shared_attention_norm, which seems to be very similar to not norm_2.
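As a rough sketch of that idea (norm_1_class and norm_2_class are the proposed names; defaulting them in the config rather than in the model is an assumption), the block could then use the overrides unconditionally:

# Sketch only: hypothetical handling of the proposed overrides.
# The config would default norm_1_class / norm_2_class to norm_class unless the
# checkpoint's config file overrides them, e.g. with nn.Identity for OLMo 2
# (nn.Identity ignores constructor arguments, so the calls below still work).
norm_1_class = config.norm_1_class or config.norm_class
norm_2_class = config.norm_2_class or config.norm_class
self.norm_1 = norm_1_class(config.n_embd, eps=config.norm_eps)
self.norm_2 = None if config.shared_attention_norm else norm_2_class(config.n_embd, eps=config.norm_eps)

This would keep the model code free of per-model branches; OLMo 2's config entry would just point norm_1_class and norm_2_class at Identity.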
That's a good idea, and it will be advantageous in the future. wdyt @ysjprojects?
We can also do it in a follow-up PR, it doesn't need to be this one.
@lantiga I have addressed the issues and left some follow-up comments.
@Borda Changes should be ready to merge.
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Jirka B <j.borovec+github@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <luca@lightning.ai>
Co-authored-by: shijie.yu <shijie@tensorplex.ai>
https://huggingface.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
https://arxiv.org/abs/2501.00656
Version 2 of OLMo, released by Ai2.
Comes in 7B and 13B Base and Instruct variants, plus additional SFT and DPO models.