Allow Unsloth Dynamic 4bit BnB quants to work#12974

Merged
simon-mo merged 4 commits intovllm-project:mainfrom
unslothai:main
Feb 13, 2025

Conversation

@danielhanchen
Contributor

@danielhanchen danielhanchen commented Feb 9, 2025

This PR allows vLLM to skip applying bitsandbytes quantization to certain layers, leaving them in 16-bit precision. For now, this only works for modules listed in llm_int8_skip_modules.

For example https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit/blob/main/config.json has

"llm_int8_skip_modules": [
      "lm_head",
      "multi_modal_projector",
      "merger",
      "modality_projection",
      "model.layers.1.mlp"
],

which will skip quantizing model.layers.1.mlp. Tagging @mgoin ! :)
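A minimal sketch of the matching logic this PR relies on, assuming a layer is left unquantized when its full name equals, is a submodule of, or contains a bare component from llm_int8_skip_modules. The function name should_skip_quant is illustrative, not vLLM's actual API:

```python
def should_skip_quant(layer_name: str, skip_modules: list[str]) -> bool:
    """Return True if `layer_name` should stay in 16-bit (hypothetical helper).

    A skip entry can be a bare module name ("lm_head") or a fully
    qualified path ("model.layers.1.mlp"); submodules of a skipped
    module (e.g. its gate_proj) are skipped too.
    """
    return any(
        layer_name == m                       # exact match
        or layer_name.startswith(m + ".")     # submodule of a skipped path
        or m in layer_name.split(".")         # bare name appears as a component
        for m in skip_modules
    )


skip = ["lm_head", "model.layers.1.mlp"]
print(should_skip_quant("model.layers.1.mlp.gate_proj", skip))  # True
print(should_skip_quant("model.layers.2.mlp.gate_proj", skip))  # False
```

With the config above, model.layers.1.mlp and all of its projections would load in 16-bit while the rest of the model is quantized to 4-bit.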


Labels

ready


5 participants