Implement rotary embeddings #7

Conversation
```diff
 import torch
-from megatron.model.enums import AttnMaskType
+from megatron.enums import AttnMaskType
```
This move surprises me. Did you move enums up a folder and if so, why?
Basically this caused a circular dependency when I added my enum in that file and imported it in arguments.py. The reason is that when you run `from megatron.model.enums import AttnMaskType`, Python executes `megatron/model/__init__.py`, which means you're importing a bunch of code, some of which imports arguments.py.
https://stackoverflow.com/questions/24302754/python-submodule-imports-using-init-py/24303380
In order to remove this dependency I've moved enums outside of model, as it's safe to say that importing enums should not be tied to model. Please see the Changes section in the PR description.
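For anyone who wants to see the failure mode concretely, here is a self-contained sketch that reproduces the cycle with a throwaway package. The module names mirror the megatron layout, but the file contents (e.g. the transformer.py stand-in) are made up for illustration:

```python
# Reproduce the circular import described above with a temporary package.
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "mypkg"
(pkg / "model").mkdir(parents=True)

(pkg / "__init__.py").write_text("")
# model/__init__.py eagerly imports the model code ...
(pkg / "model" / "__init__.py").write_text("from mypkg.model import transformer\n")
(pkg / "model" / "enums.py").write_text(
    "import enum\n"
    "class AttnMaskType(enum.Enum):\n"
    "    padding = 1\n"
)
# ... some of which depends on arguments.py ...
(pkg / "model" / "transformer.py").write_text("from mypkg.arguments import get_args\n")
# ... and arguments.py imports the enum back out of model/, closing the cycle.
(pkg / "arguments.py").write_text(
    "from mypkg.model.enums import AttnMaskType\n"
    "def get_args():\n"
    "    return AttnMaskType.padding\n"
)

sys.path.insert(0, str(root))
try:
    import mypkg.arguments
except ImportError as err:
    # "cannot import name 'get_args' from partially initialized module ..."
    print("circular import reproduced:", err)
```

Moving enums.py up to mypkg/enums.py (the equivalent of megatron/enums.py in this PR) means the import in arguments.py never executes model/__init__.py, which breaks the cycle.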
… checkpointing and replace args.max_position_embeddings with an upper bound on the sequence sizes
- Rename max-absolute-embeddings back to max-absolute-embeddings
- Make absolute position embeddings the default
Force-pushed from 193180e to 605f585
Force-pushed from 7b1b5ba to 99be67e
```python
@torch.jit.script
def apply_rotary_pos_emb(q, k, cos, sin, offset: int = 0):
```
I like having helper functions - but is there a reason why we can't run them from the forward?
So I'm not familiar with torch.jit.script, but I guess it compiles the function? I admit that the code was copy-pasted from EleutherAI with some small modifications. I'm okay with removing this helper if you want.
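For reference, `torch.jit.script` compiles the decorated function (and the functions it calls) to TorchScript, which lets the chain of small elementwise ops be fused rather than dispatched one by one. Here is a minimal sketch of the helper in the EleutherAI/GPT-NeoX style referenced above; the signature matches the diff, but the body and the assumed layout (cos/sin caches indexed by sequence position along their first dimension) are a sketch, not necessarily what this PR does byte for byte:

```python
import torch


def rotate_half(x):
    # Split the last (head) dimension in half and swap the halves with a sign
    # flip: (x1, x2) -> (-x2, x1). This is the "permuted" pairing mentioned in
    # the PR description; it avoids interleaving even/odd dimensions.
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat([-x2, x1], dim=-1)


@torch.jit.script
def apply_rotary_pos_emb(q, k, cos, sin, offset: int = 0):
    # Slice the precomputed cos/sin caches to the current sequence window;
    # `offset` supports incremental decoding, where q/k only cover the newly
    # generated positions.
    cos = cos[offset : q.shape[0] + offset, ...]
    sin = sin[offset : q.shape[0] + offset, ...]
    # Elementwise rotation: x' = x * cos + rotate_half(x) * sin
    return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)
```

If the decorator turns out to be more trouble than it's worth, the same two lines of elementwise math can live inline in the attention forward; the scripted helper mainly buys some operator fusion.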
This broke the possibility of loading our checkpoints into Megatron-LM or the original Megatron-DeepSpeed, because they don't have deepspeedai/Megatron-DeepSpeed#14 (comment). My initial inclination was to rename it back to how it was, but I don't think the rename will help at all, since it'll still lack the new enum, and now it finds both for some reason. So probably the correct solution here is to figure out how to have both clones show up in
Fixes: #6
Description
Mostly copy work already done by EleutherAI (I've left a link at the top of positional_embeddings.py). It slightly differs from the paper implementation: in particular, it permutes the ordering of the dimensions in order to have an easier-to-compute rotary matrix multiplication.
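For context on the "permuted ordering": in the RoFormer paper, position $m$ rotates each consecutive pair of dimensions $(2i, 2i+1)$ of the query/key vector by the angle $m\theta_i$:

$$
\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix}
=
\begin{pmatrix} \cos m\theta_i & -\sin m\theta_i \\ \sin m\theta_i & \cos m\theta_i \end{pmatrix}
\begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix},
\qquad
\theta_i = 10000^{-2i/d}.
$$

The EleutherAI-style implementation instead pairs dimension $i$ with dimension $i + d/2$, i.e. it permutes which coordinates form a rotation pair. The relative-position property (attention scores depending only on $m - n$) is preserved as long as queries and keys use the same pairing, and the rotation reduces to two elementwise products:

$$
x' = x \odot \cos(m\theta) + \operatorname{rotate\_half}(x) \odot \sin(m\theta).
$$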
Also:
TODO:
Changes
- `--position-embedding-type rotary`
- `--position-embedding-type absolute --max-absolute-position-embeddings 512`
- Moved `enums.py` from `megatron/model/` to `megatron/`. The practical reason is that it provoked a circular dependency due to `megatron/model/__init__.py`. The "good" reason is that, IMO, this shouldn't live in `model`, as the whole project should be able to access those enums. I can revert and put my custom enum in `arguments.py` if you disagree with me.