[Feature] Add quant description file for new quant model generated by modelslim#719
Conversation
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
@Yikun @wangxiyuan This PR plans to support the new quant model format generated by modelslim, please review. BTW, please do not merge this PR yet; we haven't tested the functionality and still need MindStudio's feedback on this.
wangxiyuan
left a comment
I'm fine with this change, and I think it is backward compatible. The only problem is making sure the JSON file is named as designed.
OK, I'll cherry-pick those changes into v0.7.3 once the verification is done by modelslim.
Verified on a w8a8 model with the new config generated by modelslim; the results look good.
One thing to note here: to keep compatibility with different versions of quantization models generated by modelslim, we should pass `quantization="ascend"` when constructing the LLM, for example:

```python
from vllm import LLM, SamplingParams

if __name__ == "__main__":
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
    # Create an LLM. The `quantization="ascend"` keyword argument tells
    # the engine to run the quantization path.
    llm = LLM(model="qwen2.5_72b_w8a8",
              tensor_parallel_size=4,
              enforce_eager=True,
              trust_remote_code=True,
              max_model_len=1024,
              quantization="ascend")
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
LGTM. Would you mind also updating https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_quantization.html once modelslim has a tag?
### What this PR does / why we need it?
To support quantization models generated by the new modelslim version, we need to add `quant_description` to `AscendQuantConfig`. Cherry-picked from #719.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested locally.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
… modelslim (vllm-project#719)

### What this PR does / why we need it?
After discussing the quantization model format with MindStudio, we decided to support another quant format that may be used by the new modelslim tool, in which case `quantization_config` may be removed from the `config.json` file and `quant_model_description.json` will be used for the quantization configuration.

### Does this PR introduce _any_ user-facing change?
Yes, using the latest quantization format.

### How was this patch tested?
Tested locally.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
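The fallback described above can be sketched as follows. This is an illustrative sketch only, not the actual vllm-ascend implementation: the function name `load_quant_description` and the returned dict shape are assumptions; only the two file names (`config.json` with an embedded `quantization_config`, versus a standalone `quant_model_description.json`) come from the PR description.

```python
import json
import os


def load_quant_description(model_dir: str) -> dict:
    """Hypothetical sketch: prefer the old embedded quantization_config,
    fall back to the new standalone description file from modelslim."""
    config_path = os.path.join(model_dir, "config.json")
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
        # Old modelslim format: quantization settings embedded in config.json.
        if "quantization_config" in config:
            return config["quantization_config"]
    # New modelslim format: standalone quant_model_description.json.
    desc_path = os.path.join(model_dir, "quant_model_description.json")
    with open(desc_path) as f:
        return json.load(f)
```

Keeping the `quantization_config` branch first is what makes the change backward compatible: models quantized by older modelslim versions load exactly as before, and only when that key is absent does the loader look for the new description file.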