support for quantization and qlora models? #906
Comments
Yes, this is quite needed. That would be amazing.
I am not a maintainer. This is a duplicate of #744.
I am using vLLM with QLoRA. I merge the adapters into the base weights (see the sketch below) and point vLLM to the merged directory. Given that the LoRA A & B matrices are added onto the original weight matrices, there is no change in dimension or architecture, so vLLM can consume it. I would still like a way to pass the model to vLLM directly after loading, without saving it to disk first.
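A minimal sketch of the typical merge flow, assuming the adapter was trained with PEFT; this is not the commenter's original snippet, and the model and adapter names are placeholders:

```python
# Sketch: fold a QLoRA/LoRA adapter into the base weights so vLLM can load
# the result as an ordinary Hugging Face checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype="auto")  # placeholder name
model = PeftModel.from_pretrained(base, "qlora-adapter")  # placeholder adapter path

# Merge the LoRA A/B matrices into the base weights; shapes and architecture are unchanged.
merged = model.merge_and_unload()

# Save the merged checkpoint; vLLM can then be pointed at this directory.
merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained("base-model").save_pretrained("merged-model")
```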
Closing in favour of #3225 because quantization and LoRA are both supported. We just need QLoRA support.
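For reference, a minimal sketch of serving a quantized base model together with a LoRA adapter in vLLM, as the closing comment describes; the model name, adapter path, and quantization method ("awq") are placeholders and depend on your checkpoint and vLLM version:

```python
# Sketch: quantized base model plus a LoRA adapter at inference time in vLLM.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="base-model-awq", quantization="awq", enable_lora=True)  # placeholder model
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Explain QLoRA in one sentence."],
    params,
    lora_request=LoRARequest("my-adapter", 1, "path/to/lora-adapter"),  # placeholder adapter
)
print(outputs[0].outputs[0].text)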
Would love to use this with quantization and adapters.