
Conversation


@thomasw21 thomasw21 commented Aug 25, 2021

We suspect that the rotary implementation is wrong, which could explain the following observations:

  • it's not beating absolute position embeddings
  • training is unstable

PR responsible for the implementation: bigscience-workshop/Megatron-DeepSpeed#7
Tensorboard for 1B3 run: https://huggingface.co/bigscience/tr4-1B3-rotary-tensorboard
Default implementation for rotary: https://github.com/EleutherAI/gpt-neox/blob/main/megatron/model/positional_embeddings.py

We run a 350M model on OSCAR using a config similar to tr3h.
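For context, rotary position embeddings (RoPE) work by rotating each pair of query/key dimensions by an angle proportional to the token position, so attention scores depend only on relative offsets. Below is a minimal NumPy sketch of the pairwise-rotation formulation; note the linked GPT-NeoX `positional_embeddings.py` uses the equivalent "rotate-half" variant, and the function name here is illustrative, not from either codebase.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even.

    Consecutive dimension pairs (2i, 2i+1) are rotated by pos * theta_i,
    where theta_i = base^(-2i/dim), as in the RoPE formulation.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "rotary requires an even head dimension"
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))   # (dim/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)           # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is rotated (an orthogonal transform), vector norms are preserved and position 0 is left unchanged; a buggy implementation typically violates one of these invariants, which makes them a cheap sanity check when debugging runs like the one above.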

@thomasw21 thomasw21 requested a review from TevenLeScao August 25, 2021 16:32
@thomasw21 thomasw21 changed the title Add config for rotary 350M debug Add config for rotary 350M oscar Sep 16, 2021
@thomasw21 thomasw21 merged commit 6b3b3a7 into bigscience-workshop:master Sep 16, 2021