Implement rotary embeddings #7

Conversation
```diff
 import torch
-from megatron.model.enums import AttnMaskType
+from megatron.enums import AttnMaskType
```
This move surprises me. Did you move enums up a folder and if so, why?
Basically this caused a circular dependency when I added my enum in that file and imported it in arguments.py. The reason is that when you run `from megatron.model.enums import AttnMaskType`, Python executes `megatron/model/__init__.py`, which means you're importing a bunch of code, some of which imports arguments.py.
https://stackoverflow.com/questions/24302754/python-submodule-imports-using-init-py/24303380
In order to remove this dependency I've moved enums outside of model, as it's safe to say that importing enums should not be tied to model. Please see the Changes section in the PR description.
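For anyone who wants to see the failure mode concretely, here is a self-contained sketch that reproduces the cycle with a throwaway package. The module names mirror the megatron layout, but the file contents (e.g. the transformer.py stand-in) are made up for illustration:

```python
# Reproduce the circular import described above with a temporary package.
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "mypkg"
(pkg / "model").mkdir(parents=True)

(pkg / "__init__.py").write_text("")
# model/__init__.py eagerly imports the model code ...
(pkg / "model" / "__init__.py").write_text("from mypkg.model import transformer\n")
(pkg / "model" / "enums.py").write_text(
    "import enum\n"
    "class AttnMaskType(enum.Enum):\n"
    "    padding = 1\n"
)
# ... some of which depends on arguments.py ...
(pkg / "model" / "transformer.py").write_text("from mypkg.arguments import get_args\n")
# ... and arguments.py imports the enum back out of model/, closing the cycle.
(pkg / "arguments.py").write_text(
    "from mypkg.model.enums import AttnMaskType\n"
    "def get_args():\n"
    "    return AttnMaskType.padding\n"
)

sys.path.insert(0, str(root))
try:
    import mypkg.arguments
except ImportError as err:
    # "cannot import name 'get_args' from partially initialized module ..."
    print("circular import reproduced:", err)
```

Moving enums.py up to mypkg/enums.py (the equivalent of megatron/enums.py in this PR) means the import in arguments.py never executes model/__init__.py, which breaks the cycle.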
… checkpointing and replace args.max_position_embeddings with an upper bound on the sequence sizes
- Rename max-absolute-embeddings back to max-absolute-embeddings
- Make absolute position embeddings the default
Force-pushed from 193180e to 605f585
Force-pushed from 7b1b5ba to 99be67e
```python
@torch.jit.script
def apply_rotary_pos_emb(q, k, cos, sin, offset: int = 0):
```
I like having helper functions - but is there a reason why we can't run them from the forward?
So I'm not familiar with torch.jit.script, but I guess it compiles the function? I admit that the code was copy-pasted from EleutherAI with some small modifications. I'm okay with removing this helper if you want.
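For reference, `torch.jit.script` compiles the decorated function (and the functions it calls) to TorchScript, which lets the chain of small elementwise ops be fused rather than dispatched one by one. Here is a minimal sketch of the helper in the EleutherAI/GPT-NeoX style referenced above; the signature matches the diff, but the body and the assumed layout (cos/sin caches indexed by sequence position along their first dimension) are a sketch, not necessarily what this PR does byte for byte:

```python
import torch


def rotate_half(x):
    # Split the last (head) dimension in half and swap the halves with a sign
    # flip: (x1, x2) -> (-x2, x1). This is the "permuted" pairing mentioned in
    # the PR description; it avoids interleaving even/odd dimensions.
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat([-x2, x1], dim=-1)


@torch.jit.script
def apply_rotary_pos_emb(q, k, cos, sin, offset: int = 0):
    # Slice the precomputed cos/sin caches to the current sequence window;
    # `offset` supports incremental decoding, where q/k only cover the newly
    # generated positions.
    cos = cos[offset : q.shape[0] + offset, ...]
    sin = sin[offset : q.shape[0] + offset, ...]
    # Elementwise rotation: x' = x * cos + rotate_half(x) * sin
    return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)
```

If the decorator turns out to be more trouble than it's worth, the same two lines of elementwise math can live inline in the attention forward; the scripted helper mainly buys some operator fusion.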
This broke the possibility of loading our checkpoints into Megatron-LM or the original Megatron-DeepSpeed, because they don't have deepspeedai/Megatron-DeepSpeed#14 (comment). My initial inclination was to rename it back to how it was, but I don't think the rename will help at all, since it'll still lack the new enum, and now it finds both for some reason. So probably the correct solution here is to figure out how to have both clones show up in
Fixes: #6
Description
Mostly copy work already done by EleutherAI (I've left a link at the top of positional_embeddings.py). It slightly differs from the paper implementation: in particular, it permutes the ordering of the dimensions in order to have an easier-to-compute rotary matrix multiplication.
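For context on the "permuted ordering": in the RoFormer paper, position $m$ rotates each consecutive pair of dimensions $(2i, 2i+1)$ of the query/key vector by the angle $m\theta_i$:

$$
\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix}
=
\begin{pmatrix} \cos m\theta_i & -\sin m\theta_i \\ \sin m\theta_i & \cos m\theta_i \end{pmatrix}
\begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix},
\qquad
\theta_i = 10000^{-2i/d}.
$$

The EleutherAI-style implementation instead pairs dimension $i$ with dimension $i + d/2$, i.e. it permutes which coordinates form a rotation pair. The relative-position property (attention scores depending only on $m - n$) is preserved as long as queries and keys use the same pairing, and the rotation reduces to two elementwise products:

$$
x' = x \odot \cos(m\theta) + \operatorname{rotate\_half}(x) \odot \sin(m\theta).
$$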
Also:
TODO:
Changes
- `--position-embedding-type rotary`
- `--position-embedding-type absolute --max-absolute-position-embeddings 512`
- Moved `enums.py` from `megatron/model/` to `megatron/`. The practical reason is that it provoked a circular dependency due to `megatron/model/__init__.py`. The "good" reason is that, IMO, this shouldn't live in `model`, as the whole project should be able to access those enums. I can revert and put my custom enum in `arguments.py` if you disagree with me.