[Core]Add Ascend Quantize#7
Conversation
Signed-off-by: angazenn <zengyanjia@huawei.com>
vllm_ascend/ops/layernorm.py
Outdated
    residual: Optional[torch.Tensor] = None,
) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
    if hasattr(self, "module"):
        return self.module.forward_anti_outlier(x, residual)
Is self.module only used here? If yes, how about something like:

try:
    from mindie_turbo import RMSNormWithAntiOutlier
except ImportError:
    RMSNormWithAntiOutlier = None

def forward_oot(self, x, residual=None):
    if RMSNormWithAntiOutlier is not None:
        return RMSNormWithAntiOutlier(self.hidden_size).forward_anti_outlier(x, residual)
    ...

I'm not sure enable_rmsnorm_with_antioutlier is needed; it seems to only add a new self.module there.
Details of RMSNormWithAntiOutlier have been moved out of vllm_ascend. There's no need to change the implementation of rmsnorm in vllm_ascend now.
    return MindIETurboQuantizer.get_quantizer(quant_config)
except Exception:
    raise NotImplementedError("There is no available ascend quantizer.")
Please use importlib to check whether mindie_turbo is available or not. The try/except here covers too much code.
Yes, this should be fixed.
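A minimal sketch of the importlib-based check suggested above (the helper name is illustrative, and mindie_turbo is assumed to expose MindIETurboQuantizer as in the diff):

```python
import importlib.util


def is_mindie_turbo_available() -> bool:
    # find_spec probes whether the package can be imported, without
    # actually importing it, so no broad try/except is needed.
    return importlib.util.find_spec("mindie_turbo") is not None


def get_quantizer(quant_config):
    if not is_mindie_turbo_available():
        raise NotImplementedError("There is no available ascend quantizer.")
    # Import only once we know the package exists.
    from mindie_turbo import MindIETurboQuantizer
    return MindIETurboQuantizer.get_quantizer(quant_config)
```

This narrows the failure mode: an exception raised inside MindIETurboQuantizer.get_quantizer is no longer silently converted into "no available ascend quantizer".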
@pytest.mark.skipif(not is_mindie_turbo_supported(),
Please add a TODO here. Once more methods are available in vllm-ascend, the skip can be removed.
This test case is designed for quantization methods based on mindie-turbo. For other possible quantization methods in the future, we can add new test cases.
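The skip condition under discussion could look like the sketch below. This is hedged: is_mindie_turbo_supported is implemented here with a simple find_spec probe, which may differ from the real helper in tests/quantization/utils.

```python
import importlib.util

import pytest


def is_mindie_turbo_supported() -> bool:
    # True only when the optional mindie_turbo package is installed.
    return importlib.util.find_spec("mindie_turbo") is not None


# TODO: remove the skip once more quantization methods land in vllm-ascend.
@pytest.mark.skipif(not is_mindie_turbo_supported(),
                    reason="mindie_turbo is required for this quantization test")
def test_quantization_smoke():
    # Placeholder body; the real test would load a quantized model.
    assert is_mindie_turbo_supported()
```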
import vllm_ascend  # noqa: F401
from vllm_ascend.quantization.quant_config import AscendLinearMethod
Why import inside the test?
This is because mindie_turbo had to be imported before vllm in early versions of mindie_turbo. This conflict may have been resolved by now, so these imports could be moved outside the test.
# When not using anti-outlier algorithms, "anti_method" refers to an empty string.
if len(quant_config["anti_method"]) > 0:
    enable_rmsnorm_with_antioutlier()
In my view, this looks a bit strange; the interface seems very detailed and specific. Is it possible for you to hide more detail under the hood? I believe this part can be written in a more general way.
Details of RMSNormWithAntiOutlier have been moved out of vllm_ascend. The related code will be hidden inside mindie_turbo.
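The config-driven toggle in the diff can be sketched as follows (the function name maybe_enable_anti_outlier is hypothetical; the real enable_rmsnorm_with_antioutlier lives in the PR's code):

```python
def maybe_enable_anti_outlier(quant_config: dict) -> bool:
    # "anti_method" is an empty string when no anti-outlier algorithm is used.
    anti_method = quant_config.get("anti_method", "")
    if anti_method:
        # The real code would call enable_rmsnorm_with_antioutlier() here;
        # this sketch only reports whether the toggle would fire.
        return True
    return False
```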
…quantizer Signed-off-by: angazenn <zengyanjia@huawei.com>
from tests.quantization.utils import is_mindie_turbo_supported, example_quantization
MODELS = [
    "/home/zyj/data/Qwen2.5-0.5B-Instruct/",
This was a mistake; it has been changed to Qwen/Qwen2.5-0.5B-Instruct now.
Update qwen3 moe
* epd shm

Signed-off-by: wuhang <wuhang6@huawei.com>
Co-authored-by: wuhang <wuhang6@huawei.com>
…_0_rc1_1227 Support A5 Qwen3 Dense w8a8 Matmul and ReduceScatter Fusion
### What this PR does / why we need it?
**Scope of Changes**:

| File Path |
| :--- |
| `vllm_ascend/quantization/compressed_tensors/compressed_tensors.py` |
| `vllm_ascend/quantization/quant_config.py` |
| `vllm_ascend/quantization/utils.py` |
| `vllm_ascend/quantization/w4a16.py` |
| `vllm_ascend/quantization/w4a4_flatquant_dynamic.py` |
| `vllm_ascend/quantization/w4a8_dynamic.py` |
| `vllm_ascend/quantization/w8a16.py` |
| `vllm_ascend/quantization/w8a8.py` |
| `vllm_ascend/quantization/w8a8_dynamic.py` |
| `vllm_ascend/quantization/w8a8_pdmix.py` |
| `vllm_ascend/quantization/w8a8mxfp8.py` |
| `vllm_ascend/sample/rejection_sampler.py` |
| `vllm_ascend/sample/sampler.py` |
| `vllm_ascend/worker/block_table.py` |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6

Signed-off-by: MrZ20 <2609716663@qq.com>
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [Patch] Remove the patch of MiniCPM (vllm-project#5975)
  [P/D] layerwise connector support recompute scheduler (vllm-project#5900)
  [CI] Add workflow support for lint image build (vllm-project#6489)
  [Bugfix] Fix problematic dummy_run & improper input_batch_size in eagle (vllm-project#6517)
  [Refactor]310p_e2e test case update (vllm-project#6539)
  [Refactor]refactor p2p connector (vllm-project#6551)
  [Refactor]refactor 310p attention impl and add ut (vllm-project#6579)
  [Refactor]refactor 310p ops and add ut (vllm-project#6591)
  [Ops][Refactor] Remove custom rotary_embedding operator (vllm-project#6523)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(new Batch vllm-project#8) (vllm-project#6604)
  [Test] Add initial multi modal cases of Qwen2.5-VL-7B-Instruct for disaggregated encoder (vllm-project#5301)
  [CI] Fix broken CI (vllm-project#6599)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#10) (vllm-project#6173)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#11) (vllm-project#6176)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#8) (vllm-project#6129)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#7) (vllm-project#6023)
  [CI][Misc] Some improvement for github action (vllm-project#6587)
  [Image] Bump mooncake version to v0.3.8.post1 (vllm-project#6428)
Kvcomp function
Revert "fix rope_triton"
This PR adds an Ascend quantization interface to vllm-ascend, including the AscendQuantConfig class, which inherits from vLLM's QuantizationConfig class; the AscendLinearMethod class, which inherits from vLLM's LinearMethodBase class; and the AscendQuantizer class, which dispatches the corresponding quantization methods.
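The dispatch described above can be sketched roughly as follows. This is a registry-based sketch with illustrative names and keys (e.g. "quant_type", W8A8Quantizer), not the actual vllm-ascend API:

```python
class AscendQuantizer:
    """Dispatches a quantization description to a concrete quantizer class."""

    _registry: dict = {}

    @classmethod
    def register(cls, quant_type: str):
        # Decorator that records a quantizer class under a quant_type key.
        def wrap(quantizer_cls):
            cls._registry[quant_type] = quantizer_cls
            return quantizer_cls
        return wrap

    @classmethod
    def get_quantizer(cls, quant_config: dict):
        quant_type = quant_config.get("quant_type", "")
        if quant_type not in cls._registry:
            raise NotImplementedError("There is no available ascend quantizer.")
        return cls._registry[quant_type]()


@AscendQuantizer.register("w8a8")
class W8A8Quantizer:
    """Placeholder for a concrete w8a8 quantization method."""
```

A concrete AscendLinearMethod would then ask the dispatcher for the quantizer matching the model's quantization config before creating weights and running the quantized matmul.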