[FEATURE] Compatible with 🤗 peft #103
Hopefully it doesn't break https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/monkeypatch/peft_tuners_lora_monkey_patch.py, so both can be used together, depending on which loading method is chosen.
It won't; I built a stand-alone util module for this.
So to use this, I just have to pass `trainable` when loading the model?
Yes.
`trainable` is only required when using a quantized model to train adapters.
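A minimal sketch of what that load call might look like on this branch (this assumes `trainable` is a keyword argument of `AutoGPTQForCausalLM.from_quantized`; the model path is a placeholder):

```python
from auto_gptq import AutoGPTQForCausalLM

# Plain inference: no extra flag is needed.
model = AutoGPTQForCausalLM.from_quantized("path/to/quantized-model", device="cuda:0")

# Training adapters on top of the quantized model: pass trainable=True.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model",  # placeholder path
    device="cuda:0",
    trainable=True,
)
```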
I did manage a training run that was ridiculously fast: 2 hours for 1 epoch on 7B with alpaca-clean, when I'm used to 5-9 hours with other projects. And training played nicely with the model loaded across two 3090 cards. But I'm getting the below error when running inference on the model + LoRA. This is on the current commit of the peft_integration branch.
For now, add this line:
I'm assuming that has to be done via model.generate() and pipeline() can't use it:

    inputs = self.tokenizer.encode(prompt_text, return_tensors='pt').to('cuda')
    with torch.cuda.amp.autocast():
        outputs = self.model.generate(
            input_ids=inputs,
            ...

Still seeing the same error.
Hi, did you use …?
I open this issue for #102, a PR that aims to make `auto_gptq` usable with 🤗 peft for both inference and fine-tuning. I will continually update development progress in this main post. Anyone who is interested in this new feature, has other suggestions relevant to it, or has tried this PR's branch and encountered problems is welcome to comment below in this issue. 🥂
Currently, for training, the quantized model must be loaded with triton enabled; fused_attention and fused_mlp injection have to be disabled for both inference and training.
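Concretely, a load call that respects those constraints might look like the sketch below (keyword names assume the current `from_quantized` API; the model path is a placeholder):

```python
from auto_gptq import AutoGPTQForCausalLM

# Under the current limitations of this branch: triton must be enabled for
# training, and fused attention / fused MLP injection must stay disabled for
# both inference and training.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model",      # placeholder
    device="cuda:0",
    use_triton=True,                # required when training adapters
    inject_fused_attention=False,   # must be disabled
    inject_fused_mlp=False,         # must be disabled
    trainable=True,                 # only needed for adapter training
)
```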
Dev progress

- support `ADAPTION_PROMPT` peft type (the peft implementation of llama-adapter), `llama` only; you can also use the `ADAPTION_PROMPT_V2` peft type that supports multi-modal fine-tuning, a relevant example project is [MMAdaptionPromptV2](https://github.com/PanQiWei/MMAdaptionPromptV2). `auto_gptq` with `peft` will be added as soon as possible!
- support `LORA` peft type
- support `ADALORA` peft type
- add examples in `examples/peft` that demonstrate how to use a gptq quantized model to train Lora, AdaLora and AdaptionPrompt/AdaptionPromptV2 adapters (a rough LoRA training sketch follows below)
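As a rough companion to the checklist above, here is a sketch of setting up LoRA training on a GPTQ-quantized model; it assumes the util module added by this PR exposes `GPTQLoraConfig` and `get_gptq_peft_model` under `auto_gptq.utils.peft_utils`, and the hyperparameters are illustrative only (the scripts in `examples/peft` are the authoritative reference):

```python
from auto_gptq import AutoGPTQForCausalLM
from auto_gptq.utils.peft_utils import GPTQLoraConfig, get_gptq_peft_model  # assumed module from this PR
from peft import TaskType

# Load the quantized base model with training enabled (triton backend,
# fused injection disabled, as noted above).
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-llama-7b",  # placeholder path
    use_triton=True,
    inject_fused_attention=False,
    inject_fused_mlp=False,
    trainable=True,
)

# Wrap the model with a LoRA adapter; hyperparameters here are illustrative only.
peft_config = GPTQLoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)
model = get_gptq_peft_model(model, peft_config=peft_config, train_mode=True)
model.print_trainable_parameters()
```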