fix RLHF llama rewarding modeling backward issue by sywangyi · Pull Request #612 · huggingface/optimum-habana

sywangyi · 2023-12-25T09:37:08Z

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

sywangyi · 2023-12-25T09:39:08Z

I hit this issue while enabling DDP fine-tuning of the PPO reward model.
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [HPUBFloat16Type []] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:251: UserWarning: Error detected in MulBackward0. Traceback of forward call that caused the error:
File "/intel-extension-for-transformers/optimum-habana/examples/trl/stack_llama/reward_modeling.py", line 304, in
trainer.train(script_args.resume_from_checkpoint)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 491, in train
return inner_training_loop(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 852, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 1374, in training_step
loss = self.compute_loss(model, inputs)
File "/intel-extension-for-transformers/optimum-habana/examples/trl/stack_llama/reward_modeling.py", line 271, in compute_loss
rewards_j = model(input_ids=inputs["input_ids_j"], attention_mask=inputs["attention_mask_j"])[0]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1530, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1519, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1530, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 816, in forward
return self.base_model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1530, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 107, in forward
return self.model.forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1177, in forward
transformer_outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1571, in _call_impl
result = forward_call(*args, **kwargs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 571, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1521, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1571, in _call_impl
result = forward_call(*args, **kwargs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 371, in forward
output_pre_attn, self_attn_weights, present_key_value = self.pre_attn(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 413, in pre_attn
output_attn, attn_weights, present_key_value = self.self_attn.pre_attn_forward(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 250, in pre_attn_forward
attn_weights = self.matmul_qk(query_states, key_states.transpose(2, 3)) * self.norm_factor
(Triggered internally at /npu-stack/pytorch-fork/torch/csrc/autograd/python_anomaly_mode.cpp:114.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
File "/intel-extension-for-transformers/optimum-habana/examples/trl/stack_llama/reward_modeling.py", line 304, in
trainer.train(script_args.resume_from_checkpoint)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 491, in train
return inner_training_loop(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 852, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 1382, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1989, in backward
loss.backward(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 502, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [HPUBFloat16Type []] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
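For readers unfamiliar with this error, the snippet below is a minimal sketch that triggers the same message on CPU. It is not the code path from the traceback: `scale` is a hypothetical stand-in for a tensor-valued factor used in a multiplication, and the in-place `mul_` stands in for whatever modified the real tensor between the forward and backward passes (the log above does not say what that was).

```python
import torch


def backward_after_inplace_edit() -> str:
    """Return the RuntimeError message raised by backward(), or "" if none."""
    x = torch.tensor(2.0, requires_grad=True)
    scale = torch.tensor(0.125)  # hypothetical stand-in, not the real norm factor

    y = x * scale      # autograd saves `scale` so it can compute dy/dx later
    scale.mul_(2.0)    # in-place edit bumps the version counter of `scale`

    try:
        y.backward()   # the saved version no longer matches the current one
    except RuntimeError as err:
        return str(err)
    return ""


message = backward_after_inplace_edit()
print(message)
```

The multiplication records the tensor's version at forward time, and `backward()` refuses to run once that version has changed, which is why the message reports an expected and an actual version number.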

HuggingFaceDocBuilderDev · 2023-12-25T09:42:27Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss · 2023-12-29T15:22:50Z

Reading

one of the variables needed for gradient computation has been modified by an inplace operation

I would expect the fix to simply modify this operation so that changes are not done inplace. Or is that not possible?

regisss · 2024-01-03T21:04:56Z

It looks good to me, but I'll wait for my Gaudi2 instance to be fixed before merging so that I can check that training and inference throughput are not impacted.

sywangyi · 2024-01-11T10:35:57Z

Any update on your side? Have you got your Gaudi2 card? @regisss

regisss · 2024-01-12T13:40:05Z

Any update on your side? Have you got your Gaudi2 card? @regisss

Yes, I'll check this PR today or tomorrow

sywangyi · 2024-01-13T01:20:52Z

Thanks, glad to hear that.

regisss · 2024-01-15T09:38:33Z

Hmm, I see a 3% throughput regression on Llama2-70b generation with this fix.
Maybe you can re-push the first version of your fix so that I can test it and see if there is a regression there too?

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

sywangyi · 2024-01-15T10:28:20Z

Hmm, I see a 3% throughput regression on Llama2-70b generation with this fix. Maybe you can re-push the first version of your fix so that I can test it and see if there is a regression there too?

Done. Do you know the reason for the regression? Does it break something like static shapes?

regisss · 2024-01-15T13:09:27Z

Hmm, I see a 3% throughput regression on Llama2-70b generation with this fix. Maybe you can re-push the first version of your fix so that I can test it and see if there is a regression there too?

Done. Do you know the reason for the regression? Does it break something like static shapes?

Thanks, I'm going to try it.
It doesn't seem to break anything; adding .clone() is clearly what caused the regression. I haven't profiled it, so I don't have lower-level information about what is going on under the hood.
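To illustrate the trade-off being discussed, here is a small CPU-only sketch of two ways to keep a later in-place edit from invalidating the backward pass: multiplying by a `.clone()` of the tensor, or multiplying by a plain Python float. The names `x` and `scale` are hypothetical, and the exact placement of `.clone()` in the PR is not shown in this thread, so treat this as an assumption about the general idea rather than the actual diff.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
scale = torch.tensor(0.125)  # hypothetical tensor-valued factor

# Option A: autograd saves the clone, so later edits to `scale` are harmless.
# The cost is an extra copy on every forward pass.
y_clone = x * scale.clone()

# Option B: a plain float is stored as a constant; no tensor is saved at all.
y_float = x * 0.125

scale.mul_(2.0)  # the in-place edit that would otherwise break backward()

y_clone.backward()
grad_clone = x.grad.clone()

x.grad = None
y_float.backward()
grad_float = x.grad.clone()

print(grad_clone, grad_float)
```

Both variants give the same gradient; the float version simply avoids the additional copy.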

regisss

After taking a closer look at it and reading this thread, I think the best option here is to define self.norm_factor as a non-tensor float:

a variable defined with register_buffer will be moved to the target device when calling model.to(device), which is not the case if it is defined as a regular tensor
we have persistent=False, which means that this variable will not be part of the state dict anyway (same as defining it as a float)

The current implementation with torch.tensor leads to a tiny speed regression because the tensor will always be on CPU, even after calling model.to(device). We could easily live with that, but it seems that just switching from a float tensor to a regular float gives a small speedup for the exact same behavior so let's do it.
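The three options compared above can be sketched side by side. `ScaleVariants` is a hypothetical module, not the optimum-habana attention class, and the value `1 / sqrt(head_dim)` is an assumption made for illustration. To keep the sketch runnable on CPU, a dtype cast stands in for `model.to(device)`: `Module.to` applies to parameters and buffers in both cases, and skips plain tensor attributes.

```python
import math

import torch
from torch import nn


class ScaleVariants(nn.Module):
    """Hypothetical module holding one scale factor in three different ways."""

    def __init__(self, head_dim: int = 64):
        super().__init__()
        scale = 1.0 / math.sqrt(head_dim)
        # 1) Non-persistent buffer: follows model.to(...), absent from state_dict.
        self.register_buffer("scale_buffer", torch.tensor(scale), persistent=False)
        # 2) Plain tensor attribute: ignored by model.to(...).
        self.scale_tensor = torch.tensor(scale)
        # 3) Plain Python float: nothing to move and nothing to save.
        self.scale_float = scale


module = ScaleVariants()
module.to(torch.float64)  # stand-in for moving the model to an accelerator

print(module.scale_buffer.dtype)         # converted along with the module
print(module.scale_tensor.dtype)         # left untouched
print(list(module.state_dict().keys()))  # none of the three is saved
```

With the float, there is no tensor that can be left behind on the CPU or modified in place, which matches the reasoning given for the suggested change.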

@sywangyi Can you just check that your script still works with the change I'm suggesting?

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

sywangyi · 2024-01-16T01:55:45Z

After taking a closer look at it and reading this thread, I think the best option here is to define self.norm_factor as a non-tensor float:

a variable defined with register_buffer will be moved to the target device when calling model.to(device), which is not the case if it is defined as a regular tensor

we have persistent=False, which means that this variable will not be part of the state dict anyway (same as defining it as a float)

The current implementation with torch.tensor leads to a tiny speed regression because the tensor will always be on CPU, even after calling model.to(device). We could easily live with that, but it seems that just switching from a float tensor to a regular float gives a small speedup for the exact same behavior so let's do it.

@sywangyi Can you just check that your script still works with the change I'm suggesting?

It works on my side.

sywangyi requested review from libinta and mandy-li as code owners December 25, 2023 09:37

sywangyi requested a review from a user December 25, 2023 09:37

sywangyi mentioned this pull request Dec 28, 2023

add PPO and stack_llama support #615

Merged

3 tasks

sywangyi force-pushed the reward_llama branch from d3a2f1c to 5208139 January 1, 2024 12:45

fix RLHF llama rewarding modeling backward issue

2c50ae9

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

sywangyi force-pushed the reward_llama branch from 5208139 to 2c50ae9 January 15, 2024 10:26

regisss reviewed Jan 15, 2024

Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated

Update optimum/habana/transformers/models/llama/modeling_llama.py

8ec4fdb

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

regisss approved these changes Jan 16, 2024

regisss merged commit e3c02cf into main Jan 16, 2024

regisss deleted the reward_llama branch January 16, 2024 09:37

jychen21 pushed a commit to jychen21/optimum-habana that referenced this pull request Feb 27, 2024

Fix RLHF llama rewarding modeling backward issue (huggingface#612)

78fc45f

regisss merged 2 commits into main from reward_llama


3 participants
