
[Bug] RoPE positional embedding doesn't work when sequence_parallel is enabled for GPT models #6153

Closed
barry-jin opened this issue Mar 8, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@barry-jin

Describe the bug

When the rope position_embedding_type and sequence_parallel are both enabled for NeMo GPT pretraining, training fails with a runtime error:

  File "/workspace/NeMo/nemo/collections/nlp/modules/common/megatron/rotary_pos_embedding.py", line 59, in apply_rotary_pos_emb
    t = (t * freqs.cos()) + (_rotate_half(t) * freqs.sin())
RuntimeError: The size of tensor a (2048) must match the size of tensor b (1024) at non-singleton dimension 0
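
This failure mode can be illustrated with a small standalone sketch (not NeMo code). The shapes below are assumptions chosen to match the 2048-vs-1024 mismatch in the traceback: the rotary frequencies cover only a 1024-position shard while the activations still cover the full 2048-position sequence.

import torch

def _rotate_half(x):
    # Standard RoPE helper: split the last dimension in half and rotate.
    x1, x2 = torch.chunk(x, 2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(t, freqs):
    # Both tensors are laid out [seq, batch, heads, dim]; dim 0 must broadcast.
    return (t * freqs.cos()) + (_rotate_half(t) * freqs.sin())

t = torch.randn(2048, 1, 8, 64)      # activations for the full sequence
freqs = torch.randn(1024, 1, 1, 64)  # frequencies for only a local shard
apply_rotary_pos_emb(t, freqs)       # RuntimeError: 2048 vs 1024 at dimension 0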

Steps/Code to reproduce bug

Follow the tutorial to train a GPT model with model.position_embedding_type=rope and model.sequence_parallel=True.
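
For reference, a pretraining command along these lines reproduces the setup; the script path and the parallelism overrides are assumptions (sequence parallelism only takes effect with tensor parallelism > 1), and only the two overrides named above come from this report:

python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
    trainer.devices=2 \
    model.tensor_model_parallel_size=2 \
    model.position_embedding_type=rope \
    model.sequence_parallel=True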

Expected behavior

Training should run without a runtime error when both options are enabled.

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud (specify cloud provider: AWS, Azure, GCP, Colab)]
  • Method of NeMo install: [pip install or from source]. Please specify exact commands you used to install.
  • If method of install is [Docker], provide docker pull & docker run commands used

Environment details

If an NVIDIA Docker image is used, you don't need to specify these.
Otherwise, please provide:

  • OS version
  • PyTorch version
  • Python version

Additional context

Add any other context about the problem here.
Example: GPU model

@okuchaiev (Member)

Can you post your model config?

@MaximumEntropy (Contributor)

Thanks! This was just fixed in #6178.
