[Model] Add LongCat-Flash #3833
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds support for the LongCat-Flash model. The changes span across attention mechanisms, Mixture-of-Experts (MoE) layers, quantization configurations, and the model runner to accommodate the specific architecture of this new model. My review focuses on ensuring the changes are robust and maintainable. I've identified a critical issue related to a side effect on a shared configuration object, which could lead to unpredictable behavior, and a high-severity issue regarding brittle string parsing for layer indexing.
```python
if hf_config.model_type == "longcat_flash":
    self.debug_layer_idx = int(self.prefix.split(".")[2])
    hf_config.first_k_dense_replace = 0
else:
    self.debug_layer_idx = int(self.prefix.split(".")[-2])
self.first_k_dense_replace = hf_config.first_k_dense_replace
```
Modifying the shared `hf_config` object directly is a dangerous side effect that can lead to unpredictable behavior in other parts of the application. Configuration objects should be treated as immutable within model layers.
To fix this, set the `self.first_k_dense_replace` attribute based on the condition, without altering `hf_config`.
Suggested change:

```python
# Before
if hf_config.model_type == "longcat_flash":
    self.debug_layer_idx = int(self.prefix.split(".")[2])
    hf_config.first_k_dense_replace = 0
else:
    self.debug_layer_idx = int(self.prefix.split(".")[-2])
self.first_k_dense_replace = hf_config.first_k_dense_replace
```

```python
# After
if hf_config.model_type == "longcat_flash":
    self.debug_layer_idx = int(self.prefix.split(".")[2])
    self.first_k_dense_replace = 0
else:
    self.debug_layer_idx = int(self.prefix.split(".")[-2])
    self.first_k_dense_replace = hf_config.first_k_dense_replace
```
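To see why the in-place mutation is risky, here is a minimal, self-contained sketch. The class and function names are hypothetical, not the actual vLLM Ascend classes; it only demonstrates the shared-object aliasing problem.

```python
# Minimal sketch (hypothetical names) showing why mutating a shared
# config object inside one layer is a side effect visible to every
# other holder of that object.

class HfConfig:
    def __init__(self):
        self.model_type = "longcat_flash"
        self.first_k_dense_replace = 2  # illustrative original value


def build_layer_with_side_effect(hf_config):
    # Dangerous: rewrites the shared object; layers constructed
    # afterwards silently observe the new value.
    hf_config.first_k_dense_replace = 0
    return hf_config.first_k_dense_replace


def build_layer_safely(hf_config):
    # Safe: the override stays local to this layer.
    return 0


shared = HfConfig()
assert build_layer_safely(shared) == 0
assert shared.first_k_dense_replace == 2   # shared config untouched

assert build_layer_with_side_effect(shared) == 0
assert shared.first_k_dense_replace == 0   # original value 2 is lost
```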
```python
).enable_shared_expert_dp
self.debug_layer_idx = int(self.prefix.split(".")[-2])
if hf_config.model_type == "longcat_flash":
    self.debug_layer_idx = int(self.prefix.split(".")[2])
```
Using a hardcoded index [2] to parse the layer index from the prefix string is brittle and assumes a fixed prefix structure (e.g., model.layers.{idx}.<...>). This can easily break if the model's naming convention or the prefix structure changes. A more robust approach would be to use regular expressions or a more structured method to extract the layer index, which would make the code more resilient to future changes.
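A regex-based extraction could look like the sketch below. The helper name and exact prefix layout are assumptions for illustration, not vLLM Ascend API; the pattern would need to match the model's actual module naming.

```python
# Illustrative sketch: extract the layer index from a dotted module
# prefix without hardcoding its position in the path.
import re


def layer_index_from_prefix(prefix: str) -> int:
    """Find the layer index in prefixes like 'model.layers.12.self_attn',
    regardless of how many components precede 'layers'."""
    match = re.search(r"layers\.(\d+)", prefix)
    if match is None:
        raise ValueError(f"no layer index found in prefix: {prefix!r}")
    return int(match.group(1))


assert layer_index_from_prefix("model.layers.12.self_attn") == 12
assert layer_index_from_prefix("language_model.model.layers.3.mlp") == 3
```

Unlike `prefix.split(".")[2]`, this keeps working if extra components are prepended to the prefix, and it fails loudly instead of returning a wrong index when no layer number is present.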
Signed-off-by: chuyuelin <923822139@qq.com>
Force-pushed from 3b20170 to d3e5e6a
This pull request has conflicts, please resolve those before we can evaluate the pull request.
…flash

```
# Conflicts:
#	vllm_ascend/worker/model_runner_v1.py
```
Signed-off-by: chuyuelin <923822139@qq.com>
Signed-off-by: chuyuelin <923822139@qq.com>
Force-pushed from 1e267ed to 1dc8edf
Force-pushed from 98bc551 to fae5ee5
```
# Conflicts:
#	vllm_ascend/ops/fused_moe/experts_selector.py
#	vllm_ascend/ops/fused_moe/fused_moe.py
#	vllm_ascend/ops/rotary_embedding.py
#	vllm_ascend/quantization/w8a8.py
#	vllm_ascend/quantization/w8a8_dynamic.py
#	vllm_ascend/worker/model_runner_v1.py
```
Force-pushed from 83db2dd to 291dd0a
Signed-off-by: chuyuelin <923822139@qq.com>
Force-pushed from 291dd0a to 3429aca
Force-pushed from a9e4f8b to 6fa7984
Force-pushed from 309571e to c7d7174
```
# Conflicts:
#	vllm_ascend/attention/mla_v1.py
#	vllm_ascend/ops/fused_moe/fused_moe.py
```

Signed-off-by: chuyuelin <923822139@qq.com>
Force-pushed from c7d7174 to 3b87b48
### What this PR does / why we need it?
Add LongCat-Flash support.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: chuyuelin <923822139@qq.com>
Co-authored-by: chuyuelin <chuyuelin1@huawei.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>