Allow Unsloth Dynamic 4bit BnB quants to work#12974

Merged
simon-mo merged 4 commits intovllm-project:mainfrom
unslothai:main
Feb 13, 2025

Conversation

@danielhanchen
Contributor

@danielhanchen danielhanchen commented Feb 9, 2025

This PR allows vLLM to skip applying bitsandbytes quantization to certain layers, leaving them in 16-bit precision. For now, this only works for modules listed in llm_int8_skip_modules.

For example https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit/blob/main/config.json has

"llm_int8_skip_modules": [
      "lm_head",
      "multi_modal_projector",
      "merger",
      "modality_projection",
      "model.layers.1.mlp"
],

which will skip quantizing model.layers.1.mlp. Tagging @mgoin ! :)
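A minimal sketch of the matching logic this PR relies on, assuming a layer is left unquantized when its full name equals, is a submodule of, or contains a bare component from llm_int8_skip_modules. The function name should_skip_quant is illustrative, not vLLM's actual API:

```python
def should_skip_quant(layer_name: str, skip_modules: list[str]) -> bool:
    """Return True if `layer_name` should stay in 16-bit (hypothetical helper).

    A skip entry can be a bare module name ("lm_head") or a fully
    qualified path ("model.layers.1.mlp"); submodules of a skipped
    module (e.g. its gate_proj) are skipped too.
    """
    return any(
        layer_name == m                       # exact match
        or layer_name.startswith(m + ".")     # submodule of a skipped path
        or m in layer_name.split(".")         # bare name appears as a component
        for m in skip_modules
    )


skip = ["lm_head", "model.layers.1.mlp"]
print(should_skip_quant("model.layers.1.mlp.gate_proj", skip))  # True
print(should_skip_quant("model.layers.2.mlp.gate_proj", skip))  # False
```

With the config above, model.layers.1.mlp and all of its projections would load in 16-bit while the rest of the model is quantized to 4-bit.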


Labels

ready


5 participants