
Conversation

@thomasw21 (Member) commented Jul 21, 2021

Fixes: #6

Description

Mostly copies work already done by EleutherAI (I've left a link at the top of positional_embeddings.py).

It differs slightly from the paper's implementation: in particular, it permutes the ordering of the dimensions to make the rotary matrix multiplication easier to compute (a sketch of the two layouts follows).
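For illustration, here is a minimal sketch of the two pairings, the paper's interleaved one and the half-split one used here. This is not the PR's exact code; names and tensor layout are assumptions.

```python
import torch

def rotate_every_two(x):
    # Paper's layout: pair dimension 2i with dimension 2i + 1.
    x_even, x_odd = x[..., ::2], x[..., 1::2]
    return torch.stack((-x_odd, x_even), dim=-1).flatten(-2)

def rotate_half(x):
    # Permuted layout used here: pair dimension i with dimension i + d/2,
    # which keeps the two halves contiguous and avoids interleaved indexing.
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

# In both cases the rotation is applied as q * cos + rotate(q) * sin (with the
# cos/sin tables laid out to match the chosen pairing). The two schemes differ
# only by a fixed permutation of the head dimensions, so they are equivalent
# parameterisations, but checkpoints trained with one are not directly
# compatible with the other.
```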

Also:

  • I've slightly modified the caching mechanism for positional embeddings compared to EleutherAI, since there seems to be an easy way to cache them better (a sketch of one such scheme follows).
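A minimal sketch of one such caching scheme; the names, shapes, and exact invalidation rule are assumptions, not the PR's code. The cos/sin tables are only rebuilt when a longer sequence (or a different device) is requested, otherwise the cached tensors are sliced.

```python
import torch

class RotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)
        self._seq_len_cached = 0
        self._cos_cached = None
        self._sin_cached = None

    def forward(self, x, seq_len):
        # Grow-only cache: rebuild the tables when a longer sequence (or a new
        # device) is seen, otherwise just slice the cached tensors.
        if (self._cos_cached is None
                or seq_len > self._seq_len_cached
                or self._cos_cached.device != x.device):
            self._seq_len_cached = seq_len
            t = torch.arange(seq_len, device=x.device, dtype=self.inv_freq.dtype)
            freqs = torch.einsum("i,j->ij", t, self.inv_freq.to(x.device))
            emb = torch.cat((freqs, freqs), dim=-1)
            self._cos_cached = emb.cos()[:, None, None, :]
            self._sin_cached = emb.sin()[:, None, None, :]
        return self._cos_cached[:seq_len], self._sin_cached[:seq_len]
```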

TODO:

  • Implement rotary
  • Be backward compatible with absolute position embeddings (we don't handle it for ViT ...)
  • Test that the code works.
  • Check loss goes down.

Changes

  • To enable rotary embeddings, use --position-embedding-type rotary
  • The previous absolute embeddings are enabled with the following arguments: --position-embedding-type absolute --max-absolute-position-embeddings 512
  • We move enums.py from megatron/model/ to megatron/. The practical reason is that it provoked a circular dependency through megatron/model/__init__.py. The "good" reason is that, IMO, this shouldn't live in model, as the whole project should be able to access those enums (a sketch of the file follows). I can revert and put my custom enum in arguments.py if you disagree.
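For reference, the moved file boils down to an enum along the lines of the sketch below. Member names are inferred from the CLI flags above; the exact values are an assumption, and the pre-existing enums (e.g. AttnMaskType) move along with it unchanged.

```python
# megatron/enums.py (sketch, not the exact file contents)
import enum

class PositionEmbeddingType(enum.Enum):
    rotary = 1
    absolute = 2
```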

@thomasw21 thomasw21 changed the title WIP: Implement rotary embeddings Implement rotary embeddings Jul 22, 2021
@thomasw21 thomasw21 requested a review from TevenLeScao July 22, 2021 23:49
@thomasw21 thomasw21 marked this pull request as ready for review July 23, 2021 00:00

import torch
-from megatron.model.enums import AttnMaskType
+from megatron.enums import AttnMaskType
Collaborator

This move surprises me. Did you move enums up a folder and if so, why?

Member Author

Basically this caused a circular dependency when I added my enum to that file and imported it in arguments.py. The reason is that when you run from megatron.model.enums import AttnMaskType, Python executes megatron/model/__init__.py, which imports a bunch of code, some of which imports arguments.py.
https://stackoverflow.com/questions/24302754/python-submodule-imports-using-init-py/24303380

To break the cycle I've moved enums outside of model, as it's safe to say that importing the enums should not require importing model (a simplified sketch of the cycle is below). Please see the Changes section in the PR description.
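A simplified sketch of the cycle; the file contents are heavily reduced and the real chain goes through more modules, but only the import order matters here.

```python
# megatron/model/__init__.py
from .transformer import ParallelTransformer          # importing megatron.model pulls in model code ...

# megatron/model/transformer.py (simplified)
from megatron import get_args                          # ... which, via megatron/__init__.py, ends up importing arguments.py

# megatron/arguments.py
from megatron.model.enums import PositionEmbeddingType  # closes the loop: megatron.model is still only partially initialised
```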

@thomasw21 thomasw21 requested a review from TevenLeScao July 23, 2021 22:19
@thomasw21 thomasw21 force-pushed the rotary_embeddings branch from 193180e to 605f585 on July 23, 2021 22:28
@thomasw21 thomasw21 force-pushed the rotary_embeddings branch from 7b1b5ba to 99be67e on July 23, 2021 22:31


@torch.jit.script
def apply_rotary_pos_emb(q, k, cos, sin, offset: int = 0):
Collaborator

I like having helper functions - but is there a reason why we can't run them from the forward?

Member Author

So I'm not familiar with torch.jit.script, but I guess it compiles the function? I admit that the code was copy-pasted from EleutherAI with some small modifications. I'm okay with removing this helper if you want.
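For context, a sketch of the helper in question (close to, but not exactly, the PR's code): torch.jit.script compiles the function to TorchScript ahead of time, and keeping it as a small module-level function means only this hot path has to be script-compatible; unannotated arguments default to Tensor, so only the non-tensor offset needs a type annotation.

```python
import torch

def rotate_half(x):
    # Called from the scripted function below, so it is compiled recursively.
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

@torch.jit.script
def apply_rotary_pos_emb(q, k, cos, sin, offset: int = 0):
    # offset shifts into the cached cos/sin tables, e.g. during incremental
    # decoding where only the newest positions of q/k are passed in.
    cos = cos[offset : offset + q.shape[0]]
    sin = sin[offset : offset + q.shape[0]]
    return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)
```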

@thomasw21 thomasw21 merged commit dc4e0cb into bigscience-workshop:main Jul 28, 2021
@stas00 (Contributor) commented Sep 10, 2021

This broke the possibility to load our checkpoints into Megatron-LM or the original Megatron-DeepSpeed, because they don't have megatron.enums and torch.load breaks.

deepspeedai/Megatron-DeepSpeed#14 (comment)

My initial inclination was to rename it back to how it was, but I don't think the rename will help at all, since it'll still lack the new enum PositionEmbeddingType. I hacked around it with a symlink into the other tree:

ln -s /hf/Megatron-DeepSpeed-master/megatron/enums.py /hf/Megatron-LM/megatron/

and now it finds both megatron.enums and megatron.model.enums.

For some reason PYTHONPATH with both clones in it wasn't helping. I think Megatron may be messing with sys.path, so it gets ignored.

So probably the correct solution here is to figure out how to have both clones show up in sys.path.
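If adding megatron/enums.py to the other clone (symlink or copy) isn't an option, one alternative is to register a stub megatron.enums module before calling torch.load so that unpickling can resolve the class. The sketch below assumes the checkpoint only needs PositionEmbeddingType from that module, and that the member names and values match the branch that wrote the checkpoint; neither file nor names are part of either repo.

```python
import enum
import sys
import types

import torch

# Hypothetical shim (not part of either repo): a stand-in megatron.enums module.
shim = types.ModuleType("megatron.enums")

class PositionEmbeddingType(enum.Enum):
    # Assumed to mirror the enum in Megatron-DeepSpeed's megatron/enums.py;
    # names and values must match the checkpoint being loaded.
    rotary = 1
    absolute = 2

shim.PositionEmbeddingType = PositionEmbeddingType
sys.modules["megatron.enums"] = shim

# Now unpickling references to megatron.enums.PositionEmbeddingType can resolve:
# state_dict = torch.load("path/to/model_optim_rng.pt", map_location="cpu")
```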
