[perf][refactor] Refactor and optimize sfa_v1.py for dsv3.2/glm5 (#6874) by ZYang6263 · Pull Request #5 · yydyzr/vllm-ascend

ZYang6263 · 2026-03-05T06:55:52Z

What this PR does / why we need it?

This PR refactors sfa_v1.py for readability and usability, fixes a bug that wrote k_cache redundantly, and improves performance by replacing the scatter operators with reshape_and_cache.

changes

improve code readability: Restructures parts of sfa_v1.py, adds comments to key code blocks, removes unused variables, and renames some functions and variables for clarity.
resolve a duplicated write to k_cache: Fixes redundant double writes of k_cache in the indexer_select module (in both the forward function and indexer_select_post_process), which removes the cost of the extra write.
replace scatter ops with reshape_and_cache: Replaces the two separate cache writes for k_nope and k_pe with a single call to the reshape_and_cache operator. The original scatter operator reorders slot_mapping for generality, which introduces significant scalar computation. The reshape_and_cache operator skips this reordering step and so avoids that overhead.
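
To make the shape of the last change concrete, here is a minimal pure-Python sketch of the call structure only: two per-tensor cache writes versus one combined write that takes both k_nope and k_pe. The helpers `scatter_write`, `fused_cache_write` and `empty_cache` are hypothetical toy stand-ins, not the operators used in sfa_v1.py, and the sketch does not model the slot_mapping reordering or any device behavior. It only checks that both call patterns leave the caches in the same state.

```python
from typing import List, Sequence

Row = List[float]


def empty_cache(num_slots: int, width: int) -> List[Row]:
    """Hypothetical helper: a zero-filled cache with one row per slot."""
    return [[0.0] * width for _ in range(num_slots)]


def scatter_write(cache: List[Row], slot_mapping: Sequence[int],
                  rows: Sequence[Row]) -> None:
    """Toy stand-in for one scatter-style update: one tensor per call."""
    for slot, row in zip(slot_mapping, rows):
        cache[slot] = list(row)


def fused_cache_write(nope_cache: List[Row], pe_cache: List[Row],
                      slot_mapping: Sequence[int],
                      k_nope: Sequence[Row], k_pe: Sequence[Row]) -> None:
    """Toy stand-in for a combined update: both tensors in one call."""
    for slot, nope_row, pe_row in zip(slot_mapping, k_nope, k_pe):
        nope_cache[slot] = list(nope_row)
        pe_cache[slot] = list(pe_row)


# Three new tokens mapped to arbitrary cache slots (illustrative values).
slot_mapping = [5, 2, 7]
k_nope = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]
k_pe = [[0.1], [0.2], [0.3]]

# Before: two separate cache writes, one per tensor.
nope_a, pe_a = empty_cache(8, 2), empty_cache(8, 1)
scatter_write(nope_a, slot_mapping, k_nope)
scatter_write(pe_a, slot_mapping, k_pe)

# After: a single call that stores both tensors.
nope_b, pe_b = empty_cache(8, 2), empty_cache(8, 1)
fused_cache_write(nope_b, pe_b, slot_mapping, k_nope, k_pe)

# Both call patterns must leave identical cache contents.
caches_match = nope_a == nope_b and pe_a == pe_b
print(caches_match)
```

An equivalence check of this kind is one way to guard such a replacement, since the change is meant to alter cost, not results.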

performance comparison

4*A3, 1P1D, P dp2tp16, D dp8tp4, input/output: 64K/3K

origin:
TTFT: 28s, TPOT: 26ms, TPS: 820 token/s

fixed redundant double writes of k_cache:
TTFT: 24s, TPOT: 26ms, TPS: 840 token/s

replace scatter ops with reshape_and_cache:
TTFT: 24s, TPOT: 26ms, TPS: 850 token/s
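
The relative changes implied by the figures above can be worked out with a few lines of Python. This is only arithmetic on the three reported data points; the dictionary keys and the `percent_change` helper are illustrative names, not part of the PR.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures copied from the performance comparison above.
runs = {
    "origin": {"ttft_s": 28.0, "tpot_ms": 26.0, "tps": 820.0},
    "fixed double write": {"ttft_s": 24.0, "tpot_ms": 26.0, "tps": 840.0},
    "reshape_and_cache": {"ttft_s": 24.0, "tpot_ms": 26.0, "tps": 850.0},
}

baseline = runs["origin"]

# Percent change of every metric relative to the origin run.
changes = {
    name: {
        metric: round(percent_change(baseline[metric], value), 1)
        for metric, value in metrics.items()
    }
    for name, metrics in runs.items()
    if name != "origin"
}

for name, delta in changes.items():
    print(name, delta)
```

On these numbers, TTFT falls by about 14.3% once the double write is removed, TPOT is unchanged, and TPS rises by about 2.4% and then 3.7% relative to origin. Each configuration is reported as a single figure, so the smaller TPS differences should be read with that in mind.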

Does this PR introduce any user-facing change? No.

How was this patch tested?

CI passed with newly added and existing tests.

vLLM version: v0.16.0
vLLM main: vllm-project/vllm@15d76f7

…m-project#6874)

Signed-off-by: rjg-lyh <1318825571@qq.com>

ZYang6263 merged commit 0f1acce into br_glm Mar 5, 2026

yydyzr mentioned this pull request Mar 9, 2026

Revert "[perf][refactor] Refactor and optimize sfa_v1.py for dsv3.2/glm5 (#6874)" #8

Merged

ZYang6263 merged 1 commit into br_glm from pr_mlapo
