[Bugfix] Fix the input constraints checks for the mlapo and bmm_transpose operators #5764
Code Review
This pull request correctly fixes the input constraint checks for the mlapo and bmm_transpose operators by replacing the has_prefill flag with a direct check on the number of input tokens. This change addresses a bug and improves the correctness of operator selection. The associated code simplification by removing the has_prefill logic is also a good improvement. However, this refactoring appears to break existing unit tests, which must be updated to reflect the changes.
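The review summary describes replacing a `has_prefill` flag with a direct check on the number of input tokens. A minimal sketch of that kind of change follows. It assumes a hypothetical token limit and hypothetical helper names; none of these identifiers or values are taken from the PR diff, and the real operator constraints are not stated in this conversation.

```python
from dataclasses import dataclass

# Hypothetical limit, for illustration only. The actual constraint of the
# mlapo and bmm_transpose operators is not given in this conversation.
MAX_FUSED_OP_TOKENS = 1024


@dataclass
class BatchInfo:
    """Minimal stand-in for per-step batch metadata (hypothetical)."""
    num_input_tokens: int
    has_prefill: bool


def use_fused_op_flag_based(batch: BatchInfo) -> bool:
    # Flag-based gate: infers eligibility from the kind of batch.
    return not batch.has_prefill


def use_fused_op_token_based(batch: BatchInfo) -> bool:
    # Token-based gate: checks the input size directly.
    return batch.num_input_tokens <= MAX_FUSED_OP_TOKENS


# Two batches where the gates disagree.
large_decode = BatchInfo(num_input_tokens=4096, has_prefill=False)
short_prefill = BatchInfo(num_input_tokens=16, has_prefill=True)

for name, batch in [("large_decode", large_decode), ("short_prefill", short_prefill)]:
    print(name, use_fused_op_flag_based(batch), use_fused_op_token_based(batch))
```

The two gates disagree whenever the batch type and the token count diverge, which is the case a direct token-count check is meant to cover.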
…bmm_transpose operators (vllm-project#5764)
### What this PR does / why we need it?
This PR fixes the input constraint checks for the mlapo and bmm_transpose operators.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654
### Perf
64K/3K, 1P1D, bs=32
before this PR: TPOT 29 ms, TTFT 47 s, TPS 606 token/s
after this PR: TPOT 29 ms, TTFT 48 s, TPS 636 token/s
Signed-off-by: rjg-lyh <1318825571@qq.com>
…bmm_transpose operators (#5764) (#6088)
### What this PR does / why we need it?
This PR cherry-picks #5764, which fixes the input constraint checks for the mlapo and bmm_transpose operators.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with newly added and existing tests.
Signed-off-by: rjg-lyh <1318825571@qq.com>
…lm-ascend into FIA_v0.13.0
* 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend:
  [EPLB] Config Rename wrapper (vllm-project#6111)
  [v0.13.0][Bugfix] Fix the input constraints checks for the mlapo and bmm_transpose operators (vllm-project#5764) (vllm-project#6088)
…pose operators (vllm-project#5764)
### What this PR does / why we need it?
This PR fixes the input constraint checks for the mlapo and bmm_transpose operators.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654
### Perf
64K/3K, 1P1D, bs=32
before this PR: TPOT 29 ms, TTFT 47 s, TPS 606 token/s
after this PR: TPOT 29 ms, TTFT 48 s, TPS 636 token/s
Signed-off-by: rjg-lyh <1318825571@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
What this PR does / why we need it?
This PR fixes the input constraint checks for the mlapo and bmm_transpose operators.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CI passed with newly added and existing tests.
Perf
64K/3K, 1P1D, bs=32
before this PR:
TPOT 29 ms, TTFT 47 s, TPS 606 token/s
after this PR:
TPOT 29 ms, TTFT 48 s, TPS 636 token/s
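To make the before/after figures easier to compare, the relative changes can be computed directly. This is a small sketch using only the numbers reported above; the dictionary keys and the `pct_change` helper are illustrative names, not part of the PR.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures copied from the Perf section above.
before = {"TPOT_ms": 29, "TTFT_s": 47, "TPS": 606}
after = {"TPOT_ms": 29, "TTFT_s": 48, "TPS": 636}

changes = {name: pct_change(before[name], after[name]) for name in before}
for name, delta in changes.items():
    print(f"{name}: {delta:+.2f}%")
```

On these numbers TPOT is unchanged, TTFT rises by about 2.13%, and TPS rises by about 4.95%. These are single reported measurements, so small differences may fall within run-to-run variation.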