
fix(ppo_gpt): prevent position_ids being None #451

Merged: 5 commits, Apr 26, 2023

Conversation

li-plus (Contributor) commented Apr 24, 2023

This PR fixes the error below during PPO training by generating the correct position_ids when they are None.

The PPO training was launched by:

accelerate launch --num_processes 7 --config_file ../../configs/accelerate/zero2-bf16.yaml ppo_hh.py

The below error was raised:

Traceback (most recent call last):
  File "/data00/home/lijiahao.plus/deepspeed/trlx/examples/hh/ppo_hh.py", line 227, in <module>
    main(hparams)
  File "/data00/home/lijiahao.plus/deepspeed/trlx/examples/hh/ppo_hh.py", line 216, in main
    trlx.train(
  File "/data00/home/lijiahao.plus/deepspeed/trlx/trlx/trlx.py", line 103, in train
    trainer.make_experience(config.method.num_rollouts)
  File "/data00/home/lijiahao.plus/deepspeed/trlx/trlx/trainer/accelerate_ppo_trainer.py", line 408, in make_experience
    ref_logits = self.model.forward_hydra(
  File "/data00/home/lijiahao.plus/deepspeed/trlx/trlx/models/modeling_ppo.py", line 387, in forward_hydra
    hydra_outputs = self.frozen_head(input_hidden_state, output_shape, **forward_kwargs)
  File "/data00/home/lijiahao.plus/miniconda3/envs/mlir/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data00/home/lijiahao.plus/deepspeed/trlx/trlx/models/modeling_ppo.py", line 515, in forward
    outputs = block(
  File "/data00/home/lijiahao.plus/miniconda3/envs/mlir/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data00/home/lijiahao.plus/miniconda3/envs/mlir/lib/python3.9/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 320, in forward
    attention_layer_outputs = self.attention(
  File "/data00/home/lijiahao.plus/miniconda3/envs/mlir/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data00/home/lijiahao.plus/miniconda3/envs/mlir/lib/python3.9/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 139, in forward
    query, key = apply_rotary_pos_emb(query_rot, key_rot, cos, sin, position_ids)
  File "/data00/home/lijiahao.plus/miniconda3/envs/mlir/lib/python3.9/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 278, in apply_rotary_pos_emb
    gather_indices = position_ids[:, None, :, None]  # [bs, 1, seq_len, 1]
TypeError: 'NoneType' object is not subscriptable
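The fix boils down to constructing position_ids whenever the caller passes None. A minimal sketch of the common Hugging Face pattern (a cumulative sum over the attention mask, which also handles left padding correctly) is shown below; this is illustrative and not necessarily the exact code merged in this PR:

```python
import torch

def make_position_ids(attention_mask: torch.Tensor) -> torch.Tensor:
    """Derive position_ids from an attention_mask of shape [batch, seq_len],
    where 1 marks a real token and 0 marks (left) padding."""
    # Count only non-padded tokens; the first real token gets position 0.
    position_ids = attention_mask.long().cumsum(-1) - 1
    # Clamp padded slots to 0 so rotary embeddings index a valid row.
    position_ids.masked_fill_(attention_mask == 0, 0)
    return position_ids

# Left-padded example: the first two slots are padding.
mask = torch.tensor([[0, 0, 1, 1, 1]])
print(make_position_ids(mask).tolist())  # [[0, 0, 0, 1, 2]]
```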

maxreciprocate (Collaborator) commented Apr 25, 2023

Hi @li-plus, thanks for fixing this! For context: the newest transformers==4.28.1 fixed the left-padding issue in NeoX modeling, but it made position_ids a required argument (it must not be None). I also resolved a few failing tests caused by the fact that GPTModelBranch is shared between gpt-neox and gpt2 type models, while the latter's GPT2Block.forward does not accept a position_ids argument. Tests now pass with both transformers==4.27.1 and transformers==4.28.1.
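Since GPTModelBranch is shared between model families whose block forward signatures differ, one way to handle this is to filter kwargs by each block's signature before calling it. A hedged sketch (the class names below are stand-ins, not the actual trlx code):

```python
import inspect

class NeoXLikeBlock:
    # Stand-in for a GPTNeoXLayer, whose forward accepts position_ids.
    def forward(self, hidden_states, position_ids=None):
        return ("neox", position_ids)
    __call__ = forward

class GPT2LikeBlock:
    # Stand-in for a GPT2Block, whose forward has no position_ids parameter.
    def forward(self, hidden_states):
        return ("gpt2",)
    __call__ = forward

def call_block(block, hidden_states, **kwargs):
    # Pass only the kwargs the block's forward actually accepts.
    accepted = inspect.signature(block.forward).parameters
    filtered = {k: v for k, v in kwargs.items() if k in accepted}
    return block(hidden_states, **filtered)

print(call_block(NeoXLikeBlock(), "h", position_ids=[0, 1]))  # ('neox', [0, 1])
print(call_block(GPT2LikeBlock(), "h", position_ids=[0, 1]))  # ('gpt2',)
```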

maxreciprocate (Collaborator) commented Apr 26, 2023

It fixes #457 on transformers==4.28.1 as well.

@maxreciprocate maxreciprocate merged commit 7331d63 into CarperAI:main Apr 26, 2023
jovany-wang (Contributor)

So should we enable transformers==4.28.1?

olliestanley

@jovany-wang It would be great if you could enable transformers==4.28.1 if this PR has fixed the underlying issue. >=4.28.1 is required to use LLaMA or any derived models, for example, so the current pin can be quite inconvenient for some use cases.

maxreciprocate (Collaborator)

@olliestanley This will be resolved shortly with #465

olliestanley

@reciprocated That's great, thank you for your fast response!

maxreciprocate (Collaborator)

@olliestanley it is fixed now
