[3/N] rs/vllm quantization - Adding models to non-uniform refactor by robertgshaw2-redhat · Pull Request #189 · neuralmagic/nm-vllm

robertgshaw2-redhat · 2024-04-14T02:31:12Z

Since we changed the LinearMethod interface to require layer_name, we need to update each model.py to plumb this information through the models. We need to pass layer_name to LinearMethodBase.create_weights so that we can have non-uniform quantization / compression: we consult the quantization config to determine what the weights / format should look like, and we use the layer name to decide this.
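To illustrate why create_weights needs the layer name, here is a minimal sketch of a per-layer lookup. Everything in it (`LayerScheme`, `NonUniformQuantConfig`, `scheme_for`, `SketchLinearMethod`, and the argument list of `create_weights`) is a hypothetical stand-in written for this example, not the actual classes or signatures in this PR.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LayerScheme:
    """Hypothetical description of how one layer's weights are stored."""
    weight_bits: int
    fmt: str


@dataclass
class NonUniformQuantConfig:
    """Hypothetical config: a default scheme plus per-layer overrides."""
    default: LayerScheme
    overrides: Dict[str, LayerScheme] = field(default_factory=dict)

    def scheme_for(self, layer_name: str) -> LayerScheme:
        # Fall back to the default when the layer has no override.
        return self.overrides.get(layer_name, self.default)


class SketchLinearMethod:
    """Stand-in for a LinearMethodBase subclass (assumed shape, not the real API)."""

    def __init__(self, config: NonUniformQuantConfig) -> None:
        self.config = config

    def create_weights(self, layer_name: str, input_size: int,
                       output_size: int) -> dict:
        # The layer name selects the scheme, so two layers of the same
        # shape can end up with different weight formats.
        scheme = self.config.scheme_for(layer_name)
        return {
            "layer_name": layer_name,
            "shape": (output_size, input_size),
            "weight_bits": scheme.weight_bits,
            "format": scheme.fmt,
        }


config = NonUniformQuantConfig(
    default=LayerScheme(weight_bits=16, fmt="dense"),
    overrides={
        "model.layers.0.mlp.down_proj": LayerScheme(weight_bits=4, fmt="packed"),
    },
)
method = SketchLinearMethod(config)
quantized = method.create_weights("model.layers.0.mlp.down_proj", 64, 32)
unquantized = method.create_weights("model.layers.0.self_attn.qkv_proj", 64, 32)
```

Without layer_name, the method would have no key to look up in the config and every layer would have to share one scheme.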

So far, have updated:

llama
gemma
phi-2
gpt2
starcoder2
qwen2
deepseek and deepseekMoE
baichuan

To test:

python3 examples/simple_test.py --help

To Update:

Pass layer_name to QKVParallelLinear, MergedColumnParallelLinear, ColumnParallelLinear, and RowParallelLinear by plumbing parent_name through from Model --> DecoderLayer --> MLP / SelfAttention --> Layer
Update weight_loader to use linear_method.maybe_update_name
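The parent_name plumbing described above can be sketched roughly as follows. The `Sketch*` classes, the attribute names, and the dotted naming scheme are assumptions made for this example. They stand in for the real model and parallel linear classes, and the `maybe_update_name` step is not modelled here.

```python
from typing import List


class SketchLinear:
    """Stand-in for a parallel linear layer that receives its full layer_name."""

    def __init__(self, layer_name: str) -> None:
        self.layer_name = layer_name


class SketchAttention:
    def __init__(self, parent_name: str) -> None:
        # Each child appends its own attribute name to the parent's name.
        self.qkv_proj = SketchLinear(f"{parent_name}.qkv_proj")
        self.o_proj = SketchLinear(f"{parent_name}.o_proj")


class SketchMLP:
    def __init__(self, parent_name: str) -> None:
        self.gate_up_proj = SketchLinear(f"{parent_name}.gate_up_proj")
        self.down_proj = SketchLinear(f"{parent_name}.down_proj")


class SketchDecoderLayer:
    def __init__(self, parent_name: str) -> None:
        self.self_attn = SketchAttention(f"{parent_name}.self_attn")
        self.mlp = SketchMLP(f"{parent_name}.mlp")


class SketchModel:
    def __init__(self, num_layers: int) -> None:
        # The model is the root of the naming chain.
        self.layers = [
            SketchDecoderLayer(f"model.layers.{i}") for i in range(num_layers)
        ]


def collect_layer_names(model: SketchModel) -> List[str]:
    """Gather the layer_name of every linear layer, in construction order."""
    names = []
    for layer in model.layers:
        names.append(layer.self_attn.qkv_proj.layer_name)
        names.append(layer.self_attn.o_proj.layer_name)
        names.append(layer.mlp.gate_up_proj.layer_name)
        names.append(layer.mlp.down_proj.layer_name)
    return names


layer_names = collect_layer_names(SketchModel(num_layers=2))
```

Each level only needs to know its own attribute name and the prefix it was given, which is why every model.py has to be touched: any module that constructs a linear layer must accept and forward the prefix.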

robertgshaw2-redhat marked this pull request as ready for review April 14, 2024 02:38

robertgshaw2-redhat changed the title ~~Slowly Updating Models For Refactor~~ [3/N] rs/vllm-quantization - Adding Models For Refactor Apr 14, 2024

robertgshaw2-redhat changed the title ~~[3/N] rs/vllm-quantization - Adding Models For Refactor~~ [3/N] rs/vllm quantization - Adding Models For Refactor Apr 14, 2024

robertgshaw2-redhat changed the title ~~[3/N] rs/vllm quantization - Adding Models For Refactor~~ [3/N] rs/vllm quantization - Adding models to non-uniform refactor Apr 14, 2024

Base automatically changed from rs/vllm-quantization-config-refactor to rs/vllm-quantization April 16, 2024 17:32

Base automatically changed from rs/vllm-quantization to vllm-quantization April 16, 2024 17:43

Robert Shaw added 6 commits April 22, 2024 21:18

added gemma, llama, phi

63e9e5b

deepseek

6e4e1c9

added baichuan

577ea97

added qwen

cdc523c

gpt2 work

1a5ef5f

starcoder2

bed3d83

varun-sundar-rabindranath force-pushed the rs/vllm-quantization-config-refactor-other-models branch from ab88cd5 to bed3d83 Compare April 22, 2024 21:23

varun-sundar-rabindranath merged commit cb7b2e1 into vllm-quantization Apr 22, 2024

varun-sundar-rabindranath deleted the rs/vllm-quantization-config-refactor-other-models branch April 22, 2024 21:25

varun-sundar-rabindranath merged 6 commits into vllm-quantization from rs/vllm-quantization-config-refactor-other-models
