Need to explicitly set use_reentrant when calling checkpoint #26969
Comments
cc @fxmarty would you like to have a look at this? 😉 |
Seems like @younesbelkada also needs this in #26917 |
You can set it explicitly in training_args by using the gradient_checkpointing_kwargs argument:
training_args = TrainingArguments(
    # Arguments
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={'use_reentrant': False},  # OR gradient_checkpointing_kwargs={'use_reentrant': True}
    # Arguments
) |
FYI, this solution does not work when using SFTTrainer() from trl as the parameter is not exposed. |
@GrahamEckel can you elaborate on the issue you face with TRL SFTTrainer? Ideally with a small reproducer 🙏 |
Are we able to fix this when NOT using the trainer? I tried passing I'm currently on Transformers 4.35.2. |
@LuciferianInk which model are you using? model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False}) should work for all standard transformers models. We also have CI tests for that: https://github.com/huggingface/transformers/blob/main/tests/test_modeling_common.py#L575 and transformers/tests/test_modeling_common.py, line 626 at commit ac97507. |
Oops, syntax error. Sorry for the false alarm. With your example, I was able to fix that! |
Awesome, thanks ! |
I am trying to fine-tune Mistral 7B using SFT and PEFT, but I get the following error when I have I have tried These are the versions I have: Here is my code:
|
Hi @manmax31 |
Thank you. Is this fix not in pypi yet? |
cc @ArthurZucker @amyeroberts would it make sense to do a patch release to include #28031? It fixes a regression, i.e. users were able to train as usual with PEFT and GC before the attention refactor was introduced, and #28031 restores that. |
That would be great. I am currently back on 4.35.2. |
@younesbelkada If it's a regression, then yes, I think we should do a patch release (also including #28043 and #28061) cc @ArthurZucker WDYT? |
Yes 👍🏻 |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
Was fixed and released so closing |
* fix bug: generate_args-do_sample
* fix gradient_checkpointing_kwargs bug, see: huggingface/trl#912 and huggingface/transformers#26969
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@ArthurZucker is this issue fixed? I am still facing the same issue even with a fresh install from source. |
Could you open a new issue, with a fresh reproducer, the output of |
System Info
windows
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
According to recent PyTorch releases, you now need to set use_reentrant explicitly, because its default will change from use_reentrant=True to use_reentrant=False in the near future.
transformers.models.llama.modeling_llama
def forward...
Expected behavior
use_reentrant needs to be set explicitly when calling checkpoint.
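For reference, here is a minimal sketch of what passing the flag explicitly looks like at the PyTorch level. It assumes a recent PyTorch release in which torch.utils.checkpoint.checkpoint accepts the use_reentrant keyword. The toy Linear layer and tensor shapes are illustrative assumptions only and are not taken from modeling_llama.

```python
import torch
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

# Toy stand-in for a decoder layer (assumption, not the real Llama layer).
layer = torch.nn.Linear(4, 4)
x = torch.randn(2, 4, requires_grad=True)

# Pass use_reentrant explicitly instead of relying on the default,
# which PyTorch has announced will change from True to False.
out = checkpoint(layer, x, use_reentrant=False)

# Activations inside `layer` are recomputed during the backward pass.
out.sum().backward()
```

Passing use_reentrant=True instead keeps the older behaviour; either way the point is that the caller chooses the value rather than depending on the default.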