[Feature] Add quant description file for new quant model generated by modelslim#719
Conversation
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
@Yikun @wangxiyuan This PR plans to support the new quant model format generated by modelslim, please review. BTW, please do not merge this PR yet; we haven't tested the functionality and still need MindStudio's feedback on this.
wangxiyuan
left a comment
I'm fine with this change, and I think it is backward compatible. The only problem is making sure the JSON file is named as designed.
OK, I'll cherry-pick those changes into v0.7.3 once the verification is done by modelslim.
Verified on a w8a8 model with the new config generated by modelslim; the results look good.
One thing to note here: to keep compatibility with different versions of quantization models generated by modelslim, we should pass `quantization="ascend"` when constructing the LLM, for example:

```python
from vllm import LLM, SamplingParams

if __name__ == "__main__":
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
    # Create an LLM. The `quantization="ascend"` keyword argument tells
    # the engine to run the quantization path.
    llm = LLM(model="qwen2.5_72b_w8a8",
              tensor_parallel_size=4,
              enforce_eager=True,
              trust_remote_code=True,
              max_model_len=1024,
              quantization="ascend")
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
LGTM. Would you mind also updating https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_quantization.html once modelslim has a tag?
### What this PR does / why we need it?
To support quantization models generated by the new modelslim version, we need to add `quant_description` to `AscendQuantConfig`. Cherry-picked from #719.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested locally.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
… modelslim (vllm-project#719)

### What this PR does / why we need it?
After discussing the quantization model format with MindStudio, we decided to support another quant format that may be used by the new modelslim tool, in which case `quantization_config` may be removed from the `config.json` file and `quant_model_description.json` will be used for the quantization configuration.

### Does this PR introduce _any_ user-facing change?
Yes, using the latest quantization format.

### How was this patch tested?
Tested locally.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
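The fallback described above can be sketched as follows. This is an illustrative sketch only, not the actual vllm-ascend implementation: the function name `load_quant_description` and the returned dict shape are assumptions; only the two file names (`config.json` with an embedded `quantization_config`, versus a standalone `quant_model_description.json`) come from the PR description.

```python
import json
import os


def load_quant_description(model_dir: str) -> dict:
    """Hypothetical sketch: prefer the old embedded quantization_config,
    fall back to the new standalone description file from modelslim."""
    config_path = os.path.join(model_dir, "config.json")
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
        # Old modelslim format: quantization settings embedded in config.json.
        if "quantization_config" in config:
            return config["quantization_config"]
    # New modelslim format: standalone quant_model_description.json.
    desc_path = os.path.join(model_dir, "quant_model_description.json")
    with open(desc_path) as f:
        return json.load(f)
```

Keeping the `quantization_config` branch first is what makes the change backward compatible: models quantized by older modelslim versions load exactly as before, and only when that key is absent does the loader look for the new description file.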