[FEATURE] Compatible with 🤗 peft #103
Hopefully it doesn't break https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/monkeypatch/peft_tuners_lora_monkey_patch.py, so both can be used together, depending on which loading method is chosen.
It won't; I built a stand-alone util module for this.
So to use this, I just have to pass `trainable` when loading the model?
Yes.
`trainable` is only required when using a quantized model to train adapters.
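A minimal sketch of what that load call might look like on this branch (this assumes `trainable` is a keyword argument of `AutoGPTQForCausalLM.from_quantized`; the model path is a placeholder):

```python
from auto_gptq import AutoGPTQForCausalLM

# Plain inference: no extra flag is needed.
model = AutoGPTQForCausalLM.from_quantized("path/to/quantized-model", device="cuda:0")

# Training adapters on top of the quantized model: pass trainable=True.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model",  # placeholder path
    device="cuda:0",
    trainable=True,
)
```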
I did manage a training run that was ridiculously fast: 2 hours for 1 epoch on 7B with alpaca-clean, when I'm used to 5-9 hours with other projects. And training played nicely with the model loaded across two 3090 cards. But I'm getting the below error when running inference on the model + LoRA. This is on the current commit of the peft_integration branch.
For now, add this line:
I'm assuming that has to be done via model.generate() and pipeline() can't use it:

    inputs = self.tokenizer.encode(prompt_text, return_tensors='pt').to('cuda')
    with torch.cuda.amp.autocast():
        outputs = self.model.generate(
            input_ids=inputs,
            ...

Still seeing the same error.
Hi, did you use …?
I open this issue for #102, a PR that aims to make `auto_gptq` usable with 🤗 peft for both inference and fine-tuning. I will continually update development progress in this main post. Anyone who is interested in this new feature, has other suggestions relevant to it, or has tried this PR's branch and encountered problems is welcome to comment below in this issue. 🥂
Currently, for training, the quantized model must be loaded with triton enabled; fused_attention and fused_mlp injection have to be disabled for both inference and training.
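Concretely, a load call that respects those constraints might look like the sketch below (keyword names assume the current `from_quantized` API; the model path is a placeholder):

```python
from auto_gptq import AutoGPTQForCausalLM

# Under the current limitations of this branch: triton must be enabled for
# training, and fused attention / fused MLP injection must stay disabled for
# both inference and training.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model",      # placeholder
    device="cuda:0",
    use_triton=True,                # required when training adapters
    inject_fused_attention=False,   # must be disabled
    inject_fused_mlp=False,         # must be disabled
    trainable=True,                 # only needed for adapter training
)
```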
Dev progress

- support `ADAPTION_PROMPT` peft type (the peft implementation of llama-adapter), `llama` only; you can also use the `ADAPTION_PROMPT_V2` peft type that supports multi-modal fine-tuning, a relevant example project is [MMAdaptionPromptV2](https://github.com/PanQiWei/MMAdaptionPromptV2). `auto_gptq` with `peft` will be added as soon as possible!
- support `LORA` peft type
- support `ADALORA` peft type
- add examples in `examples/peft` that demonstrate how to use a gptq quantized model to train Lora, AdaLora and AdaptionPrompt/AdaptionPromptV2 adapters (a rough LoRA training sketch follows below)
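As a rough companion to the checklist above, here is a sketch of setting up LoRA training on a GPTQ-quantized model; it assumes the util module added by this PR exposes `GPTQLoraConfig` and `get_gptq_peft_model` under `auto_gptq.utils.peft_utils`, and the hyperparameters are illustrative only (the scripts in `examples/peft` are the authoritative reference):

```python
from auto_gptq import AutoGPTQForCausalLM
from auto_gptq.utils.peft_utils import GPTQLoraConfig, get_gptq_peft_model  # assumed module from this PR
from peft import TaskType

# Load the quantized base model with training enabled (triton backend,
# fused injection disabled, as noted above).
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-llama-7b",  # placeholder path
    use_triton=True,
    inject_fused_attention=False,
    inject_fused_mlp=False,
    trainable=True,
)

# Wrap the model with a LoRA adapter; hyperparameters here are illustrative only.
peft_config = GPTQLoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)
model = get_gptq_peft_model(model, peft_config=peft_config, train_mode=True)
model.print_trainable_parameters()
```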