
[Feature] Add quant description file for new quant model generated by modelslim #719

Merged
Yikun merged 1 commit into vllm-project:main from ganyi1996ppo:ganyi/quant_support on Apr 30, 2025

Conversation

@ganyi1996ppo
Collaborator

What this PR does / why we need it?

After discussing the quantization model format with MindStudio, we decided to support another quant format that may be used by the new modelslim tool. In that case, `quantization_config` may be removed from the `config.json` file and `quant_model_description.json` will be used for the quantization configuration.
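
For illustration, a minimal sketch of the detection logic this format change implies. The file names come from the description above; the helper name, its return value, and the exact fallback order are assumptions for this sketch, not the actual vllm-ascend implementation:

import json
import os
from typing import Optional


def load_quant_settings(model_dir: str) -> Optional[dict]:
    """Hypothetical helper: locate quantization settings for a modelslim checkpoint.

    Older modelslim outputs embed a `quantization_config` section in config.json;
    newer outputs may drop it and ship a standalone quant_model_description.json.
    """
    config_path = os.path.join(model_dir, "config.json")
    if os.path.isfile(config_path):
        with open(config_path) as f:
            config = json.load(f)
        if "quantization_config" in config:
            # Old layout: settings embedded in config.json.
            return config["quantization_config"]

    description_path = os.path.join(model_dir, "quant_model_description.json")
    if os.path.isfile(description_path):
        # New layout: settings live in a separate description file.
        with open(description_path) as f:
            return json.load(f)

    # Neither location present: treat as an unquantized (or unknown) checkpoint.
    return None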

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
ganyi1996ppo changed the title from "add quant description file for new quant model generated by modelslim" to "[Feature] Add quant description file for new quant model generated by modelslim" on Apr 29, 2025
@ganyi1996ppo
Collaborator Author

@Yikun @wangxiyuan This PR is intended to support the new quant model format generated by modelslim; please review. BTW, please do not merge this PR yet: we haven't tested the functionality and still need MindStudio's feedback on this.

@wangxiyuan (Collaborator) left a comment

I'm fine with this change, and I think it is backward compatible. The only thing to confirm is that the JSON file name matches the design.

@Yikun (Member) left a comment

Let's wait for modelslim's feedback before merging. It seems v0.7.3 also needs this (via mindie-turbo), right?

@ganyi1996ppo
Collaborator Author

OK, I'll cherry-pick those changes into v0.7.3 once the verification is done by modelslim.

@ganyi1996ppo
Collaborator Author

Verified on a w8a8 model with the new config generated by modelslim; the result looks good.

@ganyi1996ppo
Collaborator Author

One thing to note here: to keep compatibility across the different versions of quantization models generated by modelslim, we should pass `quantization="ascend"` to the LLM by default to enable the quantization feature supported in vllm-ascend; otherwise, a model load error is expected during the vLLM engine initialization phase. The code below is a simple example:

import os

from vllm import LLM, SamplingParams

if __name__ == "__main__":
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
    # Create an LLM.
    llm = LLM(model="qwen2.5_72b_w8a8",
              tensor_parallel_size=4,
              enforce_eager=True,
              trust_remote_code=True,
              max_model_len=1024,
              # this keyword argument tells the engine to run the quantization path
              quantization="ascend")

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

@Yikun
Member

Yikun commented Apr 30, 2025

LGTM. Would you mind also updating https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_quantization.html once modelslim has a tag?

Yikun merged commit 3a62889 into vllm-project:main on Apr 30, 2025
14 checks passed
Yikun pushed a commit that referenced this pull request Apr 30, 2025
### What this PR does / why we need it?

In order to support quantization models generated by the new modelslim
version, we need to add `quant_description` to `AscendQuantConfig`.
Cherry-picked from
#719

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?
Tested locally.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
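
For context, a rough sketch of what carrying that description through the quant config object could look like. The class name comes from the commit message above; the constructor signature and attribute names here are assumptions for illustration only, not the actual vllm-ascend code:

from typing import Optional


class AscendQuantConfig:
    """Illustrative stand-in for the config class named in the commit message;
    the real signature in vllm-ascend may differ."""

    def __init__(self, quant_config: dict,
                 quant_description: Optional[dict] = None) -> None:
        # Settings parsed from config.json's quantization_config (old layout).
        self.quant_config = quant_config
        # Settings parsed from quant_model_description.json (new layout).
        self.quant_description = quant_description or {}
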
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
… modelslim (vllm-project#719)

### What this PR does / why we need it?
After discussing the quantization model format with MindStudio, we
decided to support another quant format that may be used by the new
modelslim tool. In that case, `quantization_config` may be removed from
the `config.json` file and `quant_model_description.json` will be used
for the quantization configuration.
### Does this PR introduce _any_ user-facing change?
Yes, it uses the latest quantization format.

### How was this patch tested?
Tested locally.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
… modelslim (vllm-project#719)

