Enable grad checkpointing after get_peft_model #2398
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for the PR @SunMarc. For my better understanding, under what circumstances would this come up? So far, I thought we had subsumed this functionality under the code referenced here (Lines 96 to 181 in 3dd2668).
Maybe that's not a smart idea and having a separate method is preferable, I just need to understand.
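The referenced lines are presumably PEFT's existing preparation helper, prepare_model_for_kbit_training (an assumption, since the code embed is not reproduced here), which already wires up gradient checkpointing as long as it is applied to the base model before get_peft_model. A minimal sketch of that existing flow, under that assumption:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-13b-Instruct-hf")

# Assumed existing path: the helper enables gradient checkpointing and makes the
# input embeddings require grad *before* the model is wrapped into a PeftModel.
base = prepare_model_for_kbit_training(base, use_gradient_checkpointing=True)

model = get_peft_model(base, LoraConfig(r=8, task_type=TaskType.CAUSAL_LM))
```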
I've linked the issue here: huggingface/transformers#35826
Ah yes, I see, sorry for missing that. Indeed, it makes sense to enable the possibility like this, since users may not use […]. I checked the existing […]. However, when I tried this, there was no error, even when moving […].
Could you try with the following script?

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from peft import LoraConfig, TaskType, get_peft_model


def main():
    train_data = {"input": "input test", "output": "output test"}
    model_name = "codellama/CodeLlama-13b-Instruct-hf"
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = model.config.eos_token_id

    # Build a single training example: prompt tokens + target tokens + EOS
    input_ids = tokenizer.encode(train_data["input"])
    output_ids = tokenizer.encode(train_data["output"])
    model_inputs_output = input_ids + output_ids + [tokenizer.eos_token_id]
    model_inputs_output = torch.tensor(model_inputs_output, dtype=torch.int64)

    labels = copy.deepcopy(model_inputs_output)
    labels[: len(input_ids)] = -1  # temporarily mark prompt tokens; set to -100 below
    example_mask = model_inputs_output.ge(0)
    label_mask = labels.ge(0)
    model_inputs_output[~example_mask] = 0
    labels[~label_mask] = -100

    train_dataset = {
        "input_ids": model_inputs_output.unsqueeze(0).to("cuda"),
        "attention_mask": example_mask.unsqueeze(0).to("cuda"),
        "labels": labels.unsqueeze(0).to("cuda"),
    }

    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "gate_proj", "v_proj", "o_proj", "up_proj", "k_proj", "down_proj"],  # same as llama-factory
        lora_dropout=0.05,
        task_type=TaskType.CAUSAL_LM,
    )
    model = get_peft_model(model, lora_config)
    model.train()
    model.print_trainable_parameters()
    model.to("cuda")

    # Gradient checkpointing is enabled *after* get_peft_model
    model.gradient_checkpointing_enable()

    output = model(**train_dataset)
    loss = output["loss"]
    print(f"loss: {loss.requires_grad}")


if __name__ == "__main__":
    main()
```
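The print at the end is the check in question: the usual culprit in this setup is that, with every base parameter frozen and checkpointing enabled only after wrapping, the inputs to the checkpointed blocks don't require grad, so the loss comes out detached. A workaround sketch with the existing APIs, continuing from the script above (the ordering here is an assumption, not taken from the thread):

```python
# Option 1: enable checkpointing on the base model *before* get_peft_model, e.g.
#   model.gradient_checkpointing_enable()
#   model = get_peft_model(model, lora_config)

# Option 2: if the model is already a PeftModel, force the embedding outputs to
# require grad so the checkpointed blocks have a graph to backpropagate through.
model.enable_input_require_grads()
model.gradient_checkpointing_enable()

output = model(**train_dataset)
print(f"loss requires grad: {output['loss'].requires_grad}")  # expected: True
```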
Yes, I can confirm that […] (Lines 1214 to 1246 in 3dd2668).
(As mentioned, I moved the […].) I checked if it could be the model, or if the loss needs to be calculated by the […].
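One way to narrow down where the graph breaks (a suggestion, reusing model and train_dataset from the reproduction script above) is to check whether the logits are already detached, which would rule out the loss computation itself:

```python
output = model(**train_dataset)
# If the logits have no grad history, the checkpointed forward already ran without
# building a graph, so the problem lies before the loss computation.
print("logits require grad:", output.logits.requires_grad)
print("loss requires grad:", output.loss.requires_grad)
```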
That's very strange, I don't know either :/
I'll continue to investigate tomorrow then!
What does this PR do?
Fixes huggingface/transformers#35826
This PR enables grad checkpointing even if the model has already been converted to a PeftModel. I can add a test if needed, but where should I put it? I see that you have _test_training_gradient_checkpointing, but it is a common test.

The following works:
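The inline snippet from the description is not reproduced in this excerpt; based on the PR title, it presumably enables checkpointing on the base model before wrapping, roughly as follows (reusing model_name and lora_config from the script above):

```python
# Presumed "working" ordering (a reconstruction, not the author's exact snippet)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
model.gradient_checkpointing_enable()       # enable on the base model first
model = get_peft_model(model, lora_config)  # then wrap with PEFT
```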
but not this (with this PR, it should now work):
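Again the snippet itself is missing here; presumably it is the reverse ordering, which this PR adds support for:

```python
# Presumed "failing" ordering before this PR (a reconstruction)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
model = get_peft_model(model, lora_config)
model.gradient_checkpointing_enable()  # previously left the loss detached; enabled by this PR
```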