fix RLHF llama rewarding modeling backward issue #612

Merged: regisss merged 2 commits into main from reward_llama on Jan 16, 2024

@sywangyi
Collaborator

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@sywangyi sywangyi requested a review from a user December 25, 2023 09:37
@sywangyi
Collaborator Author

I hit this issue while enabling DDP fine-tuning of the PPO reward model.
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [HPUBFloat16Type []] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:251: UserWarning: Error detected in MulBackward0. Traceback of forward call that caused the error:
File "/intel-extension-for-transformers/optimum-habana/examples/trl/stack_llama/reward_modeling.py", line 304, in <module>
trainer.train(script_args.resume_from_checkpoint)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 491, in train
return inner_training_loop(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 852, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 1374, in training_step
loss = self.compute_loss(model, inputs)
File "/intel-extension-for-transformers/optimum-habana/examples/trl/stack_llama/reward_modeling.py", line 271, in compute_loss
rewards_j = model(input_ids=inputs["input_ids_j"], attention_mask=inputs["attention_mask_j"])[0]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1530, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1519, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1530, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 816, in forward
return self.base_model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1530, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 107, in forward
return self.model.forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1177, in forward
transformer_outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1571, in _call_impl
result = forward_call(*args, **kwargs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 571, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1571, in _call_impl
result = forward_call(*args, **kwargs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 371, in forward
output_pre_attn, self_attn_weights, present_key_value = self.pre_attn(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 413, in pre_attn
output_attn, attn_weights, present_key_value = self.self_attn.pre_attn_forward(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 250, in pre_attn_forward
attn_weights = self.matmul_qk(query_states, key_states.transpose(2, 3)) * self.norm_factor
(Triggered internally at /npu-stack/pytorch-fork/torch/csrc/autograd/python_anomaly_mode.cpp:114.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
File "/intel-extension-for-transformers/optimum-habana/examples/trl/stack_llama/reward_modeling.py", line 304, in <module>
trainer.train(script_args.resume_from_checkpoint)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 491, in train
return inner_training_loop(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 852, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 1382, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1989, in backward
loss.backward(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 502, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [HPUBFloat16Type []] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
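
For readers unfamiliar with this error, here is a minimal CPU-only sketch of the same failure mode. It assumes the saved variable is a 0-dim tensor scale factor like the one multiplied in the `pre_attn_forward` line of the traceback. The in-place write is simulated by hand, because the log does not say which operation modified the tensor in the real run, and the function names are hypothetical.

```python
import torch


def scaled_scores(q, k, norm_factor):
    # Same shape of computation as the failing line: a matmul result times a scale.
    return torch.matmul(q, k.transpose(-1, -2)) * norm_factor


def try_backward(norm_factor):
    q = torch.randn(2, 4, requires_grad=True)
    k = torch.randn(2, 4, requires_grad=True)
    out = scaled_scores(q, k, norm_factor)
    if isinstance(norm_factor, torch.Tensor):
        # Simulated in-place write: the value is unchanged, but the tensor's
        # version counter is bumped after autograd saved it for MulBackward0.
        norm_factor.mul_(1.0)
    try:
        out.sum().backward()
        return "ok"
    except RuntimeError as err:
        return "error" if "inplace operation" in str(err) else "other"


print(try_backward(torch.tensor(0.5)))  # tensor scale factor
print(try_backward(0.5))                # plain Python float
```

With a tensor scale factor, autograd keeps a reference to it for the backward pass and checks its version counter, so any later in-place write invalidates the graph. A Python float leaves nothing for a later in-place write to invalidate.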

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sywangyi sywangyi mentioned this pull request Dec 28, 2023
3 tasks
@regisss
Collaborator

regisss commented Dec 29, 2023

Reading "one of the variables needed for gradient computation has been modified by an inplace operation", I would expect the fix to simply modify this operation so that the change is no longer made in place. Or is that not possible?

@regisss
Collaborator

regisss commented Jan 3, 2024

It looks good to me, but I'll wait for my Gaudi2 instance to be fixed before merging so that I can check that training and inference throughput are not impacted.

@sywangyi
Collaborator Author

Any update on your side? Have you got your Gaudi2 card? @regisss

@regisss
Collaborator

regisss commented Jan 12, 2024

> Any update on your side? Have you got your Gaudi2 card? @regisss

Yes, I'll check this PR today or tomorrow

@sywangyi
Collaborator Author

Thanks, glad to hear that.

@regisss
Collaborator

regisss commented Jan 15, 2024

Hmm, I see a 3% throughput regression on Llama2-70b generation with this fix.
Maybe you can re-push your first version of the fix so that I can test it and see if there is a regression too?

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@sywangyi
Collaborator Author

> Hmm, I see a 3% throughput regression on Llama2-70b generation with this fix. Maybe you can re-push your first version of the fix so that I can test it and see if there is a regression too?

Done. Do you know the reason for the regression? Does it break something like static shapes?

@regisss
Collaborator

regisss commented Jan 15, 2024

> > Hmm, I see a 3% throughput regression on Llama2-70b generation with this fix. Maybe you can re-push your first version of the fix so that I can test it and see if there is a regression too?
>
> Done. Do you know the reason for the regression? Does it break something like static shapes?

Thanks, I'm going to try it.
It doesn't seem to break anything; adding .clone() is clearly what led to the regression. That said, I haven't profiled it to get lower-level information about what is going on under the hood.
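
As an illustration of why a `.clone()` can work around the error, here is a minimal sketch. It assumes the clone was applied to the scale factor; the thread does not show that version of the diff, and the function name is hypothetical.

```python
import torch


def backward_with_clone():
    norm_factor = torch.tensor(0.5)
    x = torch.randn(3, requires_grad=True)

    # Autograd saves the clone for the backward pass, not `norm_factor` itself.
    y = x * norm_factor.clone()

    # In-place write to the original; the saved clone has its own version counter.
    norm_factor.add_(0.0)

    y.sum().backward()  # succeeds because the saved tensor was never modified
    return x.grad


print(backward_with_clone())
```

The trade-off is an extra copy on every forward call. Whether that copy alone explains the measured regression is not established in this thread, since no profile was taken.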

Collaborator

@regisss regisss left a comment


After taking a closer look at it and reading this thread, I think the best option here is to define self.norm_factor as a non-tensor float:

  • a variable defined with register_buffer will be moved to the target device when calling model.to(device), which is not the case if it is defined as a regular tensor
  • we have persistent=False, which means that this variable will not be part of the state dict anyway (same as defining it as a float)

The current implementation with torch.tensor leads to a tiny speed regression because the tensor will always be on CPU, even after calling model.to(device). We could easily live with that, but it seems that just switching from a float tensor to a regular float gives a small speedup for the exact same behavior, so let's do it.

@sywangyi Can you just check that your script still works with the change I'm suggesting?
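
The three ways of storing the scale factor compared above can be sketched as follows. This is an illustration only: the class names are hypothetical, the value `head_dim ** -0.5` is an assumed standard attention scale, and a dtype cast with `.to(torch.float64)` stands in for `model.to(device)` so that the sketch runs without an HPU.

```python
import torch
from torch import nn


class WithBuffer(nn.Module):
    def __init__(self, head_dim):
        super().__init__()
        # Non-persistent buffer: follows model.to(...), excluded from state_dict.
        self.register_buffer(
            "norm_factor", torch.tensor(head_dim ** -0.5), persistent=False
        )


class WithPlainTensor(nn.Module):
    def __init__(self, head_dim):
        super().__init__()
        # Regular tensor attribute: nn.Module does not track it, so .to() skips it.
        self.norm_factor = torch.tensor(head_dim ** -0.5)


class WithFloat(nn.Module):
    def __init__(self, head_dim):
        super().__init__()
        # Python float: no device, no autograd version counter, not in state_dict.
        self.norm_factor = head_dim ** -0.5


buffered = WithBuffer(64).to(torch.float64)
plain = WithPlainTensor(64).to(torch.float64)
scalar = WithFloat(64).to(torch.float64)

print(buffered.norm_factor.dtype)  # follows the cast
print(plain.norm_factor.dtype)     # left as created
print(type(scalar.norm_factor))
print(list(buffered.state_dict().keys()))
```

The same tracking rule applies to device moves: only parameters and registered buffers are visited by `.to()`, which is why the plain tensor attribute stays where it was created.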

Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
@sywangyi
Collaborator Author

> After taking a closer look at it and reading this thread, I think the best option here is to define self.norm_factor as a non-tensor float:
>
>   • a variable defined with register_buffer will be moved to the target device when calling model.to(device), which is not the case if it is defined as a regular tensor
>   • we have persistent=False, which means that this variable will not be part of the state dict anyway (same as defining it as a float)
>
> The current implementation with torch.tensor leads to a tiny speed regression because the tensor will always be on CPU, even after calling model.to(device). We could easily live with that, but it seems that just switching from a float tensor to a regular float gives a small speedup for the exact same behavior, so let's do it.
>
> @sywangyi Can you just check that your script still works with the change I'm suggesting?

Works on my side.

@regisss regisss merged commit e3c02cf into main Jan 16, 2024
@regisss regisss deleted the reward_llama branch January 16, 2024 09:37
jychen21 pushed a commit to jychen21/optimum-habana that referenced this pull request Feb 27, 2024