
[Model] Add LongCat-Flash #3833

Merged
wangxiyuan merged 13 commits into vllm-project:main from chuyuelin:longcat_flash
Dec 31, 2025

Conversation

@chuyuelin
Contributor

@chuyuelin chuyuelin commented Oct 28, 2025

What this PR does / why we need it?

Add LongCat-Flash support.

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

CI passed

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message that fulfills the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for the LongCat-Flash model. The changes span across attention mechanisms, Mixture-of-Experts (MoE) layers, quantization configurations, and the model runner to accommodate the specific architecture of this new model. My review focuses on ensuring the changes are robust and maintainable. I've identified a critical issue related to a side effect on a shared configuration object, which could lead to unpredictable behavior, and a high-severity issue regarding brittle string parsing for layer indexing.

Comment thread vllm_ascend/models/layers/mla.py (outdated)
Comment on lines 85 to 90

```python
if hf_config.model_type == "longcat_flash":
    self.debug_layer_idx = int(self.prefix.split(".")[2])
    hf_config.first_k_dense_replace = 0
else:
    self.debug_layer_idx = int(self.prefix.split(".")[-2])
self.first_k_dense_replace = hf_config.first_k_dense_replace
```
Contributor

critical

Modifying the shared hf_config object directly is a dangerous side effect that can lead to unpredictable behavior in other parts of the application. Configuration objects should be treated as immutable within model layers.

To fix this, you should set the self.first_k_dense_replace attribute based on the condition, without altering hf_config.

Suggested change

Before:

```python
if hf_config.model_type == "longcat_flash":
    self.debug_layer_idx = int(self.prefix.split(".")[2])
    hf_config.first_k_dense_replace = 0
else:
    self.debug_layer_idx = int(self.prefix.split(".")[-2])
self.first_k_dense_replace = hf_config.first_k_dense_replace
```

After:

```python
if hf_config.model_type == "longcat_flash":
    self.debug_layer_idx = int(self.prefix.split(".")[2])
    self.first_k_dense_replace = 0
else:
    self.debug_layer_idx = int(self.prefix.split(".")[-2])
    self.first_k_dense_replace = hf_config.first_k_dense_replace
```
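To illustrate why mutating a shared config object is risky, here is a minimal standalone sketch (class and attribute names mirror the reviewed code, but `SimpleNamespace` stands in for the real HuggingFace config object):

```python
from types import SimpleNamespace

# Hypothetical stand-in for a HuggingFace config shared by many layers.
hf_config = SimpleNamespace(model_type="longcat_flash", first_k_dense_replace=1)

class MutatingLayer:
    def __init__(self, cfg):
        # BAD: writes through to the shared object...
        if cfg.model_type == "longcat_flash":
            cfg.first_k_dense_replace = 0
        self.first_k_dense_replace = cfg.first_k_dense_replace

class ReadingLayer:
    def __init__(self, cfg):
        # ...so any layer constructed afterwards silently sees the new value.
        self.first_k_dense_replace = cfg.first_k_dense_replace

a = MutatingLayer(hf_config)
b = ReadingLayer(hf_config)
print(b.first_k_dense_replace)  # 0 — the original value 1 is gone for everyone
```

Setting only `self.first_k_dense_replace`, as in the suggested change, keeps the override local to the layer that needs it.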

Comment thread vllm_ascend/models/layers/mla.py Outdated
).enable_shared_expert_dp
self.debug_layer_idx = int(self.prefix.split(".")[-2])
if hf_config.model_type == "longcat_flash":
self.debug_layer_idx = int(self.prefix.split(".")[2])
Contributor

high

Using a hardcoded index [2] to parse the layer index from the prefix string is brittle and assumes a fixed prefix structure (e.g., model.layers.{idx}.<...>). This can easily break if the model's naming convention or the prefix structure changes. A more robust approach would be to use regular expressions or a more structured method to extract the layer index, which would make the code more resilient to future changes.
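A minimal sketch of the regex-based approach suggested above (the helper name `layer_index_from_prefix` is hypothetical, not part of the PR):

```python
import re

def layer_index_from_prefix(prefix: str) -> int:
    """Extract the layer index from a module prefix such as
    'model.layers.3.self_attn', regardless of how deeply the
    prefix is nested before or after the 'layers.N' segment."""
    match = re.search(r"\blayers\.(\d+)\b", prefix)
    if match is None:
        raise ValueError(f"no layer index found in prefix: {prefix!r}")
    return int(match.group(1))

print(layer_index_from_prefix("model.layers.3.self_attn"))      # 3
print(layer_index_from_prefix("longcat.model.layers.12.mla"))   # 12
```

Unlike `prefix.split(".")[2]`, this keeps working if components are added or removed around the `layers.N` segment, and it fails loudly with a clear error instead of an `IndexError` or a silently wrong index.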

Signed-off-by: chuyuelin <923822139@qq.com>
@Angazenn Angazenn added the ready (read for review) and ready-for-test (start test by label for PR) labels Oct 29, 2025
Comment thread vllm_ascend/ops/fused_moe/experts_selector.py Outdated
Comment thread vllm_ascend/models/layers/mla.py Outdated
@github-actions
Contributor

github-actions bot commented Nov 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

…flash

# Conflicts:
#	vllm_ascend/worker/model_runner_v1.py
Signed-off-by: chuyuelin <923822139@qq.com>
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

# Conflicts:
#	vllm_ascend/ops/fused_moe/experts_selector.py
#	vllm_ascend/ops/fused_moe/fused_moe.py
#	vllm_ascend/ops/rotary_embedding.py
#	vllm_ascend/quantization/w8a8.py
#	vllm_ascend/quantization/w8a8_dynamic.py
#	vllm_ascend/worker/model_runner_v1.py
@chuyuelin chuyuelin force-pushed the longcat_flash branch 2 times, most recently from 83db2dd to 291dd0a on December 25, 2025 03:17
Signed-off-by: chuyuelin <923822139@qq.com>
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@chuyuelin chuyuelin force-pushed the longcat_flash branch 2 times, most recently from 309571e to c7d7174 on December 30, 2025 03:21
# Conflicts:
#	vllm_ascend/attention/mla_v1.py
#	vllm_ascend/ops/fused_moe/fused_moe.py

Signed-off-by: chuyuelin <923822139@qq.com>
Signed-off-by: chuyuelin <923822139@qq.com>
@wangxiyuan wangxiyuan merged commit d07d8a4 into vllm-project:main Dec 31, 2025
19 checks passed
@chuyuelin chuyuelin deleted the longcat_flash branch January 4, 2026 01:25
wjunLu pushed a commit to wjunLu/vllm-ascend that referenced this pull request Jan 4, 2026
### What this PR does / why we need it?
Add LongCat-Flash support.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: chuyuelin <923822139@qq.com>
Co-authored-by: chuyuelin <chuyuelin1@huawei.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request Jan 8, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels

module:ops, module:quantization, ready (read for review), ready-for-test (start test by label for PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants