[Core] Add Ascend Quantize #7

Merged
wangxiyuan merged 22 commits into vllm-project:develop from Angazenn:main
Feb 11, 2025

Conversation

@Angazenn (Collaborator) commented Feb 5, 2025:

This PR adds an Ascend quantization interface to vllm-ascend, including the AscendQuantConfig class, which inherits from vLLM's QuantizationConfig class; the AscendLinearMethod class, which inherits from vLLM's LinearMethodBase class; and the AscendQuantizer class, which dispatches the corresponding quantization methods.
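For context, the overall shape described above can be sketched as follows. The base classes here are minimal stand-ins for vLLM's QuantizationConfig and LinearMethodBase (the real interfaces carry more abstract methods), and the method bodies are illustrative only:

```python
# Shape sketch of the classes this PR introduces. `QuantizationConfig` and
# `LinearMethodBase` are stand-ins for the vLLM base classes, which have
# more abstract methods than shown here.
from abc import ABC, abstractmethod
from typing import Any


class QuantizationConfig(ABC):  # stand-in for vLLM's base class
    @abstractmethod
    def get_name(self) -> str: ...


class LinearMethodBase(ABC):  # stand-in for vLLM's base class
    @abstractmethod
    def apply(self, layer: Any, x: Any) -> Any: ...


class AscendQuantizer:
    """Dispatches to a concrete quantization method by name."""

    @staticmethod
    def get_quantizer(quant_config: dict) -> str:
        # Illustrative dispatch keyed on the method recorded in the config.
        return quant_config.get("quant_method", "unknown")


class AscendQuantConfig(QuantizationConfig):
    def __init__(self, quant_config: dict):
        self.quant_config = quant_config

    def get_name(self) -> str:
        return "ascend"


class AscendLinearMethod(LinearMethodBase):
    def __init__(self, quant_config: AscendQuantConfig):
        self.quantizer = AscendQuantizer.get_quantizer(quant_config.quant_config)

    def apply(self, layer: Any, x: Any) -> Any:
        return x  # the real implementation calls the dispatched quant method
```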

@Angazenn Angazenn changed the base branch from main to develop February 7, 2025 01:55
@Angazenn Angazenn changed the title Add Ascend Quantize [Core]Add Ascend Quantize Feb 7, 2025
angazenn added 7 commits February 7, 2025 10:08
Signed-off-by: angazenn <zengyanjia@huawei.com>
```python
        residual: Optional[torch.Tensor] = None,
    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        if hasattr(self, "module"):
            return self.module.forward_anti_outlier(x, residual)
```
@wangxiyuan (Collaborator) commented Feb 7, 2025:

Is self.module only used here? If yes, how about something like:

```python
try:
    from mindie_turbo import RMSNormWithAntiOutlier
except ImportError:
    RMSNormWithAntiOutlier = None


def forward_oot(self, x, residual=None):
    if RMSNormWithAntiOutlier is not None:
        return RMSNormWithAntiOutlier(self.hidden_size).forward_anti_outlier(x, residual)
    ...
```

Not sure enable_rmsnorm_with_antioutlier is needed; it seems to only add a new self.module there.

@Angazenn (Collaborator, Author) commented Feb 8, 2025:

Details of RMSNormWithAntiOutlier have been moved out of vllm_ascend. There's no need to change the implementation of rmsnorm in vllm_ascend now.

angazenn added 5 commits February 7, 2025 17:18
Signed-off-by: angazenn <zengyanjia@huawei.com>

```python
        return MindIETurboQuantizer.get_quantizer(quant_config)
    except Exception:
        raise NotImplementedError("There is no available ascend quantizer.")
```
Collaborator:

Please use importlib to check whether mindie_turbo is available or not. The try/except here is too broad.

Collaborator (Author):

Yes, this should be fixed.
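A minimal sketch of the importlib-based check the reviewer asks for; the helper name and the `MindIETurboQuantizer` import path are taken from the snippet above, and the narrowed guard is illustrative rather than the final implementation:

```python
import importlib.util


def is_package_available(name: str) -> bool:
    # find_spec returns None when the package cannot be located; this checks
    # availability without importing the package and without a broad try/except.
    return importlib.util.find_spec(name) is not None


def get_quantizer(quant_config):
    # Hypothetical narrowed version of the dispatch above: only the
    # availability check is guarded, so real errors raised inside the
    # quantizer itself are no longer swallowed.
    if not is_package_available("mindie_turbo"):
        raise NotImplementedError("There is no available ascend quantizer.")
    from mindie_turbo import MindIETurboQuantizer
    return MindIETurboQuantizer.get_quantizer(quant_config)
```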

```python
]


@pytest.mark.skipif(not is_mindie_turbo_supported(),
```
Collaborator:

Please add a TODO here. Once more methods are available in vllm-ascend, the skip can be removed.

Collaborator (Author):

This test case is designed for quantization methods based on mindie-turbo. For other possible quantization methods in the future, we can add new test cases.


```python
    import vllm_ascend  # noqa: F401
    from vllm_ascend.quantization.quant_config import AscendLinearMethod
```
Collaborator:

Why import inside the test?

Collaborator (Author):

This is because mindie_turbo had to be imported before vllm in early versions of mindie_turbo. Perhaps this conflict has been resolved by now, and these imports can be moved outside the test.
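The deferred-import pattern described above can be sketched generically: when one package must be imported before another (mindie_turbo before vllm in early versions), moving both imports into the test body pins the order per test. Stdlib modules stand in for the real packages so the sketch runs anywhere:

```python
import sys


def run_quant_test() -> bool:
    # Deferred imports: executed only when the test body runs, so the first
    # import is guaranteed to happen before the second. json and csv are
    # stand-ins for mindie_turbo and vllm here.
    import json  # stand-in for mindie_turbo (must come first)
    import csv   # stand-in for vllm / vllm_ascend
    return "json" in sys.modules and "csv" in sys.modules
```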


```python
    # When not using anti-outlier algorithms, "anti_method" is an empty string.
    if len(quant_config["anti_method"]) > 0:
        enable_rmsnorm_with_antioutlier()
```
Collaborator:

From my perspective, this looks kind of strange; the interface seems very detailed and specific. Is it possible for you to hide more detail under the hood? I believe this part can be written in a more general way.

Collaborator (Author):

Details of RMSNormWithAntiOutlier have been moved out of vllm_ascend. The related code will be hidden inside mindie_turbo.
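One hypothetical shape for the "more general" interface the reviewer asks for is a config-driven patch table, so callers never invoke feature-specific hooks like enable_rmsnorm_with_antioutlier directly. All names below are illustrative, not the PR's actual API:

```python
# Hypothetical sketch: config-driven patches registered in one table instead
# of one-off enable_* calls. Names are illustrative only.
_PATCHES = {
    # An empty string for "anti_method" means the feature is disabled.
    "anti_method": lambda: None,  # would install the anti-outlier rmsnorm
}


def apply_quant_patches(quant_config: dict) -> list:
    """Apply every registered patch whose config entry is enabled."""
    applied = []
    for key, patch in _PATCHES.items():
        if quant_config.get(key):  # skips missing keys and empty strings
            patch()
            applied.append(key)
    return applied
```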

angazenn added 3 commits February 8, 2025 10:43
…quantizer

Signed-off-by: angazenn <zengyanjia@huawei.com>
angazenn added 4 commits February 8, 2025 11:41
Signed-off-by: angazenn <zengyanjia@huawei.com>
```python
from tests.quantization.utils import is_mindie_turbo_supported, example_quantization

MODELS = [
    "/home/zyj/data/Qwen2.5-0.5B-Instruct/",
```
Contributor:

what's this path?

Collaborator (Author):

This was a mistake; it's changed to Qwen/Qwen2.5-0.5B-Instruct now.

Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Sep 12, 2025
zhangsicheng5 pushed a commit to zhangsicheng5/vllm-ascend that referenced this pull request Sep 19, 2025
wuhang2014 added a commit to wuhang2014/vllm-ascend that referenced this pull request Sep 28, 2025
* epd shm

Signed-off-by: wuhang <wuhang6@huawei.com>


---------

Signed-off-by: wuhang <wuhang6@huawei.com>
Co-authored-by: wuhang <wuhang6@huawei.com>
ksiyuan pushed a commit to ksiyuan/vllm-ascend that referenced this pull request Jan 17, 2026
…_0_rc1_1227

Support A5 Qwen3 Dense w8a8 Matmul and ReduceScatter Fusion
wangxiyuan pushed a commit that referenced this pull request Feb 6, 2026
### What this PR does / why we need it?
**Scope of Changes**:
| File Path |
| :--- |
| `vllm_ascend/quantization/compressed_tensors/compressed_tensors.py` |
| `vllm_ascend/quantization/quant_config.py` |
| `vllm_ascend/quantization/utils.py` |
| `vllm_ascend/quantization/w4a16.py` |
| `vllm_ascend/quantization/w4a4_flatquant_dynamic.py` |
| `vllm_ascend/quantization/w4a8_dynamic.py` |
| `vllm_ascend/quantization/w8a16.py` |
| `vllm_ascend/quantization/w8a8.py` |
| `vllm_ascend/quantization/w8a8_dynamic.py` |
| `vllm_ascend/quantization/w8a8_pdmix.py` |
| `vllm_ascend/quantization/w8a8mxfp8.py` |
| `vllm_ascend/sample/rejection_sampler.py` |
| `vllm_ascend/sample/sampler.py` |
| `vllm_ascend/worker/block_table.py` |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@2c24bc6

Signed-off-by: MrZ20 <2609716663@qq.com>
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Feb 9, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [Patch] Remove the patch of MiniCPM (vllm-project#5975)
  [P/D] layerwise connector support recompute scheduler (vllm-project#5900)
  [CI] Add workflow support for lint image build (vllm-project#6489)
  [Bugfix] Fix problematic dummy_run & improper input_batch_size in eagle (vllm-project#6517)
  [Refactor]310p_e2e test case update (vllm-project#6539)
  [Refactor]refactor p2p connector (vllm-project#6551)
  [Refactor]refactor 310p attention impl and add ut (vllm-project#6579)
  [Refactor]refactor 310p ops and add ut (vllm-project#6591)
  [Ops][Refactor] Remove custom rotary_embedding operator (vllm-project#6523)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(new Batch vllm-project#8) (vllm-project#6604)
  [Test] Add initial multi modal cases of Qwen2.5-VL-7B-Instruct for disaggregated encoder  (vllm-project#5301)
  [CI] Fix broken CI (vllm-project#6599)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#10) (vllm-project#6173)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#11) (vllm-project#6176)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#8) (vllm-project#6129)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#7) (vllm-project#6023)
  [CI][Misc] Some improvement for github action (vllm-project#6587)
  [Image] Bump mooncake version to v0.3.8.post1 (vllm-project#6428)
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
@wangxiyuan wangxiyuan mentioned this pull request Feb 24, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
leideng pushed a commit to leideng/vllm-ascend that referenced this pull request Mar 2, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
liuchenbing2026 pushed a commit to liuchenbing2026/vllm-ascend that referenced this pull request Mar 10, 2026


4 participants