add dispath_ffn_combine_bf16 #5866
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a new operator dispatch_ffn_combine_bf16 for Mixture-of-Experts models on the CANN platform. The changes are extensive, covering operator definition, host and device-side implementations, PyTorch bindings, and tests. However, I've identified several critical issues related to correctness and potential runtime failures, including incorrect template instantiations, wrong PyTorch bindings, potential buffer overflows due to fixed-size arrays, and incomplete operator prototype implementations. These issues must be addressed to ensure the operator functions correctly and safely.
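For the binding and buffer-safety points above, here is a minimal sketch of what a defensive host-side registration can look like. Everything in it is illustrative: the `npu_ops` namespace, `kMaxExperts`, and the parameter list are assumptions for the sketch, not the PR's actual code (Ascend backends typically bind through the `PrivateUse1` dispatch key).

```cpp
#include <torch/library.h>
#include <torch/torch.h>

// Hypothetical upper bound for a statically sized expert table; the real
// limit would live in the operator's host/tiling code.
constexpr int64_t kMaxExperts = 256;

// Hypothetical host-side entry point. The name matches the new operator,
// but the parameter list here is an assumption for the sketch.
at::Tensor dispatch_ffn_combine_bf16(const at::Tensor& hidden_states,
                                     const at::Tensor& topk_ids,
                                     const at::Tensor& topk_weights,
                                     int64_t num_experts) {
  TORCH_CHECK(hidden_states.scalar_type() == at::kBFloat16,
              "dispatch_ffn_combine_bf16 expects bf16 hidden_states");
  // Validate dynamic sizes before writing into any fixed-size buffer, so an
  // oversized expert count fails loudly instead of overflowing.
  TORCH_CHECK(num_experts > 0 && num_experts <= kMaxExperts,
              "num_experts out of range: ", num_experts);
  TORCH_CHECK(topk_ids.sizes() == topk_weights.sizes(),
              "topk_ids and topk_weights must have matching shapes");
  // ... launch the device kernel here; placeholder output for the sketch.
  return at::empty_like(hidden_states);
}

// Define the op with an explicit schema so the Python-visible signature is
// pinned down, then bind the implementation to the NPU dispatch key.
TORCH_LIBRARY(npu_ops, m) {
  m.def("dispatch_ffn_combine_bf16(Tensor hidden_states, Tensor topk_ids, "
        "Tensor topk_weights, int num_experts) -> Tensor");
}

TORCH_LIBRARY_IMPL(npu_ops, PrivateUse1, m) {
  m.impl("dispatch_ffn_combine_bf16", dispatch_ffn_combine_bf16);
}
```

With an explicit schema, a mismatch between the declared signature and the C++ implementation is reported when the library loads, rather than surfacing as a wrong-argument failure mid-inference.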
Force-pushed from e729497 to 0cb311a
Signed-off-by: guanguan0308 <1546542263@qq.com>
Force-pushed from 175ce0b to d561470
Signed-off-by: guanguan0308 <1546542263@qq.com>
Force-pushed from 12d5c9d to e4e76cd
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (24 commits)
  add dispath_ffn_combine_bf16 (vllm-project#5866)
  [BugFix] Fix input parameter bug of dispatch_gmm_combine_decode[RFC: issue 5476] (vllm-project#5932)
  [1/N][Feat] Xlite Qwen3 MoE Support (vllm-project#5951)
  [Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5945)
  [bugfix][mm] change get_num_encoder_tokens to get_num_encoder_embeds in recompute_schedule.py (vllm-project#5132)
  [Bugfix] fix pcp qwen full graph FIA bug (vllm-project#6037)
  [Bugfix] Fixed precision issues caused by pooled request pooling (vllm-project#6049)
  【main】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6045)
  [main][Bugfix] Fixed an problem related to embeddings sharing (vllm-project#5967)
  [Feature] refactor the npugraph_ex config, support online-infer with static kernel (vllm-project#5775)
  [CI][Lint] Show lint diff on failure (vllm-project#5956)
  [CI] Add wait logic for each individual case (vllm-project#6036)
  [CI] Add DeepSeek-V3.2-W8A8 nightly ci test (vllm-project#4633)
  model runner v2 support triton of penalty (vllm-project#5854)
  [Docs][Model] Support Qwen3-VL-Embedding & Qwen3-VL-Reranker (vllm-project#6034)
  [Tests] move qwen3 performance test from nightly to e2e (vllm-project#5980)
  [Bugfix] fix bug of pcp+mtp+async scheduler (vllm-project#5994)
  [Main2Main] Upgrade vllm commit to releases/v0.14.0 (vllm-project#5988)
  [Ops] Add layernorm for qwen3Next (vllm-project#5765)
  [Doc] Add layer_sharding additional config for DeepSeek-V3.2-W8A8 (vllm-project#5921)
  ...
### What this PR does / why we need it?
add dispath_ffn_combine_bf16
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@bde38c1
---------
Signed-off-by: guanguan0308 <1546542263@qq.com>
Signed-off-by: huangning1995 <huangning12@huawei.com>
This reverts commit 2073197.
### What this PR does / why we need it?
add dispath_ffn_combine_bf16
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@bde38c1
---------
Signed-off-by: guanguan0308 <1546542263@qq.com>
### What this PR does / why we need it?
add dispath_ffn_combine_bf16
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@bde38c1
---------
Signed-off-by: guanguan0308 <1546542263@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
### What this PR does / why we need it?
add dispath_ffn_combine_bf16
### Does this PR introduce any user-facing change?
### How was this patch tested?