This repository was archived by the owner on Oct 11, 2024. It is now read-only.

[3/N] rs/vllm quantization - Adding models to non-uniform refactor#189

Merged
varun-sundar-rabindranath merged 6 commits into vllm-quantization from rs/vllm-quantization-config-refactor-other-models
Apr 22, 2024

Conversation


@robertgshaw2-redhat robertgshaw2-redhat commented Apr 14, 2024

Since we changed the LinearMethod interface to require layer_name, we need to update each model.py to plumb this information through the models. This is needed because layer_name must be passed to LinearMethodBase.create_weights to support non-uniform quantization / compression: we consult the quantization config to determine what the weights / format should look like, and we use the layer name to decide this.
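To illustrate the idea, here is a minimal sketch of a layer_name-aware create_weights, assuming a simple prefix-matching quant config. The class and method names mirror the ones mentioned above, but the signatures and the config format are illustrative, not the actual vLLM code.

```python
# Sketch of why create_weights needs layer_name: the method consults a
# quantization config, keyed by layer name, to pick a per-layer format.
from abc import ABC, abstractmethod


class LinearMethodBase(ABC):
    @abstractmethod
    def create_weights(self, layer_name: str, input_size: int, output_size: int):
        ...


class NonUniformQuantLinearMethod(LinearMethodBase):
    """Chooses a per-layer weight format from a (hypothetical) quant config."""

    def __init__(self, quant_config):
        # Maps layer-name prefixes to schemes, e.g.
        # {"model.layers.0.mlp": "int8"}; unlisted layers stay fp16.
        self.quant_config = quant_config

    def _scheme_for(self, layer_name: str) -> str:
        for prefix, scheme in self.quant_config.items():
            if layer_name.startswith(prefix):
                return scheme
        return "fp16"

    def create_weights(self, layer_name, input_size, output_size):
        scheme = self._scheme_for(layer_name)
        weights = {"weight_shape": (output_size, input_size), "scheme": scheme}
        if scheme == "int8":
            # int8 weights additionally carry a per-output-channel scale.
            weights["scale_shape"] = (output_size,)
        return weights


method = NonUniformQuantLinearMethod({"model.layers.0.mlp": "int8"})
print(method.create_weights("model.layers.0.mlp.gate_proj", 4096, 11008)["scheme"])       # int8
print(method.create_weights("model.layers.0.self_attn.qkv_proj", 4096, 12288)["scheme"])  # fp16
```

Without layer_name, both calls above would have to return the same format, which is exactly what rules out non-uniform quantization.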

So far, the following models have been updated:

  • llama
  • gemma
  • phi-2
  • gpt2
  • starcoder2
  • qwen2
  • deepseek and deepseekMoE
  • baichuan

To test:

python3 examples/simple_test.py --help

To Update:

  • Pass layer_name to QKVParallelLinear, MergedColumnParallelLinear, ColumnParallelLinear, and RowParallelLinear by plumbing parent_name through from Model --> DecoderLayer --> MLP / SelfAttention --> Layer
  • Update weight_loader with linear_method.maybe_update_name
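The plumbing step above can be sketched as follows. Each module prepends its own name to the parent_name it receives, so the leaf linear layer ends up with the full dotted name to hand to the linear method. The class names echo the list above, but the constructors are simplified stand-ins for the real vLLM modules.

```python
# Illustrative parent_name plumbing: Model --> DecoderLayer --> MLP --> Layer.
class RowParallelLinear:
    def __init__(self, layer_name: str):
        # The leaf layer would pass this full dotted name on to
        # linear_method.create_weights(layer_name, ...).
        self.layer_name = layer_name


class MLP:
    def __init__(self, parent_name: str):
        self.down_proj = RowParallelLinear(f"{parent_name}.down_proj")


class DecoderLayer:
    def __init__(self, parent_name: str):
        self.mlp = MLP(f"{parent_name}.mlp")


class Model:
    def __init__(self):
        self.layers = [DecoderLayer(f"model.layers.{i}") for i in range(2)]


m = Model()
print(m.layers[1].mlp.down_proj.layer_name)  # model.layers.1.mlp.down_proj
```

The resulting names match the keys in the checkpoint's state dict, which is what lets weight_loader (via a hook like maybe_update_name) reconcile checkpoint tensor names with the names the quant config uses.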

@robertgshaw2-redhat robertgshaw2-redhat marked this pull request as ready for review April 14, 2024 02:38
@robertgshaw2-redhat robertgshaw2-redhat changed the title Slowly Updating Models For Refactor [3/N] rs/vllm-quantization - Adding Models For Refactor Apr 14, 2024
@robertgshaw2-redhat robertgshaw2-redhat changed the title [3/N] rs/vllm-quantization - Adding Models For Refactor [3/N] rs/vllm quantization - Adding Models For Refactor Apr 14, 2024
@robertgshaw2-redhat robertgshaw2-redhat changed the title [3/N] rs/vllm quantization - Adding Models For Refactor [3/N] rs/vllm quantization - Adding models to non-uniform refactor Apr 14, 2024
Base automatically changed from rs/vllm-quantization-config-refactor to rs/vllm-quantization April 16, 2024 17:32
Base automatically changed from rs/vllm-quantization to vllm-quantization April 16, 2024 17:43
@varun-sundar-rabindranath varun-sundar-rabindranath force-pushed the rs/vllm-quantization-config-refactor-other-models branch from ab88cd5 to bed3d83 Compare April 22, 2024 21:23
@varun-sundar-rabindranath varun-sundar-rabindranath merged commit cb7b2e1 into vllm-quantization Apr 22, 2024
@varun-sundar-rabindranath varun-sundar-rabindranath deleted the rs/vllm-quantization-config-refactor-other-models branch April 22, 2024 21:25