[FEATURE] Compatible with 🤗 peft #103

Closed
PanQiWei opened this issue May 25, 2023 · 9 comments · Fixed by #102
Labels
enhancement New feature or request

Comments

@PanQiWei
Collaborator

PanQiWei commented May 25, 2023

I'm opening this issue for #102, a PR that aims to make auto_gptq usable with 🤗 peft for both inference and fine-tuning.

I will keep updating the development progress in this main post. Anyone who is interested in this feature, has suggestions relevant to it, or has tried the branch of this PR and run into problems is welcome to comment below. 🥂


Currently, for training, the quantized model must be loaded with triton enabled; fused_attention and fused_mlp injection have to be disabled for both inference and training.
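
The constraints above can be captured in a small helper. This is a sketch, not code from the PR: `gptq_load_kwargs` and `load_quantized` are hypothetical names, and the keyword names (`use_triton`, `inject_fused_attention`, `inject_fused_mlp`, `trainable`) are assumed to match the `from_quantized` signature on the PR branch.

```python
def gptq_load_kwargs(training: bool) -> dict:
    """Build from_quantized keyword arguments that respect the current limits."""
    return {
        # Training currently requires the triton backend; inference does not.
        "use_triton": training,
        # Fused-module injection must be off for both inference and training.
        "inject_fused_attention": False,
        "inject_fused_mlp": False,
        # Only needed when the quantized model is used to train adapters.
        "trainable": training,
    }


def load_quantized(model_dir: str, training: bool = False):
    """Load a GPTQ-quantized model (assumed API); auto_gptq is imported lazily."""
    from auto_gptq import AutoGPTQForCausalLM

    return AutoGPTQForCausalLM.from_quantized(model_dir, **gptq_load_kwargs(training))
```

Keeping the flags in one place makes it harder to enable a fused module by accident when switching between training and inference.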

Dev progress

  • [2023-05-25] support inference and fine-tuning with peft using the ADAPTION_PROMPT peft type (the peft implementation of llama-adapter).
  • [2023-05-25] support inference and fine-tuning with peft using the LORA peft type.
  • [2023-05-28] support inference and fine-tuning with peft using the ADALORA peft type.
  • [2023-05-28] added instruction-tuning example scripts under examples/peft that demonstrate how to use a GPTQ-quantized model to train Lora, AdaLora and AdaptionPrompt/AdaptionPromptV2 adapters.
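
For the LORA case, wrapping a quantized model for training might look like the sketch below. It assumes that `GPTQLoraConfig` and `get_gptq_peft_model` are exported by `auto_gptq.utils.peft_utils` and accept the keyword arguments shown. The hyperparameter values and the helper names `lora_hyperparams` and `wrap_for_lora_training` are illustrative and are not taken from the example scripts.

```python
def lora_hyperparams(r: int = 8, lora_alpha: int = 16, lora_dropout: float = 0.05) -> dict:
    """Illustrative LoRA settings; tune these for your own task."""
    return {"r": r, "lora_alpha": lora_alpha, "lora_dropout": lora_dropout}


def wrap_for_lora_training(model, **overrides):
    """Attach a trainable LoRA adapter to a model loaded with trainable=True."""
    # Imported lazily so this module loads even without peft/auto_gptq installed.
    from peft import TaskType
    from auto_gptq.utils.peft_utils import GPTQLoraConfig, get_gptq_peft_model

    config = GPTQLoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        **lora_hyperparams(**overrides),
    )
    # train_mode=True is assumed to prepare the wrapped model for fine-tuning.
    return get_gptq_peft_model(model, peft_config=config, train_mode=True)
```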
@PanQiWei added the enhancement label on May 25, 2023
@Ph0rk0z
Contributor

Ph0rk0z commented May 25, 2023

Hopefully it doesn't break https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/monkeypatch/peft_tuners_lora_monkey_patch.py, so that both can be used together depending on which loading method is chosen.

@PanQiWei
Collaborator Author

> Hopefully it doesn't break https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/monkeypatch/peft_tuners_lora_monkey_patch.py, so that both can be used together depending on which loading method is chosen.

It won't. I built a stand-alone utility module (auto_gptq.utils.peft_utils) to make auto_gptq compatible with peft, so the core modules of auto_gptq are not affected and no changes are needed on the peft side.

@PanQiWei linked a pull request on May 26, 2023 that will close this issue
@Ph0rk0z
Contributor

Ph0rk0z commented May 28, 2023

So to use this, I just have to pass trainable when loading the model?

@qwopqwop200
Collaborator

> So to use this, I just have to pass trainable when loading the model?

yes

@PanQiWei
Collaborator Author

> So to use this, I just have to pass trainable when loading the model?

trainable is only required when using a quantized model to train adapters.

@mmealman

I did manage a training run, which was ridiculously fast: 2 hours for 1 epoch on 7B with alpaca-clean, when I'm used to 5-9 hours with other projects. The run also played nicely with the model loaded across two 3090 cards. But I'm getting the error below when running inference on the model + LoRA. This is on the current commit of the peft_integration branch.

self and mat2 must have the same dtype

@PanQiWei
Collaborator Author

> I did manage a training run, which was ridiculously fast: 2 hours for 1 epoch on 7B with alpaca-clean, when I'm used to 5-9 hours with other projects. The run also played nicely with the model loaded across two 3090 cards. But I'm getting the error below when running inference on the model + LoRA. This is on the current commit of the peft_integration branch.
>
> self and mat2 must have the same dtype

For now, adding the line with torch.cuda.amp.autocast(): before inference should solve the problem. I will try to improve this internally later.

@mmealman

I'm assuming that has to be done via model.generate() and pipeline() can't use it:

        inputs = self.tokenizer.encode(prompt_text, return_tensors='pt').to('cuda')
        with torch.cuda.amp.autocast():
            outputs = self.model.generate(
                input_ids=inputs, 
                ....

Still seeing the same error.

@PanQiWei
Collaborator Author

Hi, did you use the auto_gptq.utils.peft_utils.get_gptq_peft_model helper function provided in auto-gptq to wrap the model at inference? Without this function, you can't use Lora or AdaLora from peft.
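
Putting the two suggestions in this thread together, inference with a trained adapter might look like the sketch below. It assumes `get_gptq_peft_model` accepts `model_id` (the saved adapter directory) and `train_mode`. The names `attach_adapter`, `generate_with_autocast` and the `wrap_fn` parameter are hypothetical, and this has not been verified to resolve the dtype error reported above.

```python
def attach_adapter(model, adapter_dir: str, wrap_fn=None):
    """Wrap a quantized model with a saved adapter for inference."""
    if wrap_fn is None:
        # Default to the auto_gptq helper (assumed import path), loaded lazily.
        from auto_gptq.utils.peft_utils import get_gptq_peft_model as wrap_fn
    return wrap_fn(model, model_id=adapter_dir, train_mode=False)


def generate_with_autocast(model, input_ids, **generate_kwargs):
    """Run generate() under autocast, the workaround suggested earlier."""
    import torch

    with torch.cuda.amp.autocast():
        return model.generate(input_ids=input_ids, **generate_kwargs)
```

The `wrap_fn` parameter exists only so the wrapping step can be exercised without a GPU or the real libraries; in normal use it is left as `None`.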
