[Misc] Refactor aclgraph accuracy test to use logprob-based comparison #7455
wangxiyuan merged 6 commits into vllm-project:main
Code Review
This pull request refactors the aclgraph accuracy test to use logprob-based comparison instead of text-match assertions. It removes the golden answers from the test cases. It also replaces the `gen_and_valid` helper with calls to a new `compare_logprobs` function, which verifies accuracy by comparing log probabilities against an eager-mode baseline. In addition, each test is now decorated with `@wait_until_npu_memory_free(0.7)` so that it starts only once sufficient NPU memory is available.
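The PR text does not show how `@wait_until_npu_memory_free(0.7)` is implemented or what `0.7` means. The decorator below is a minimal sketch under assumptions, not the vLLM Ascend code. It assumes `0.7` is the minimum fraction of device memory that must be free before a test starts. The name `wait_until_memory_free` and the `get_free_fraction` probe, `timeout_s` and `poll_interval_s` parameters are all hypothetical. On real hardware the probe would have to query the NPU runtime.

```python
import functools
import time
from typing import Callable


def wait_until_memory_free(min_free_fraction: float,
                           get_free_fraction: Callable[[], float],
                           timeout_s: float = 600.0,
                           poll_interval_s: float = 5.0):
    """Hypothetical stand-in for wait_until_npu_memory_free (not the real code).

    Before each call of the wrapped function, block until
    get_free_fraction() >= min_free_fraction, or raise TimeoutError
    once timeout_s seconds have elapsed.
    """
    def decorator(fn):
        @functools.wraps(fn)  # keep the test's name so pytest still collects it
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout_s
            # Poll until enough memory is free, e.g. after a previous test's
            # engine has finished releasing the device.
            while get_free_fraction() < min_free_fraction:
                if time.monotonic() >= deadline:
                    raise TimeoutError(
                        f"free memory stayed below {min_free_fraction:.0%} "
                        f"for {timeout_s}s")
                time.sleep(poll_interval_s)
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

The check runs immediately before each test body rather than once per module. This fits the "wait logic for each individual case" idea, because memory left behind by one case is re-checked before the next case starts.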
Signed-off-by: wangli <wangli858794774@gmail.com>
vllm-project#7455)

### What this PR does / why we need it?

Replace text-match assertions with a two-tier logprob accuracy check:

- Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`.
- Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol).

This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.17.0
- vLLM main: vllm-project/vllm@8a68046

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
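The two-tier check in the commit message above can be illustrated with a minimal pure-Python sketch. It is not the PR's `compare_logprobs`. The `Step` container and the name `two_tier_logprob_check` are hypothetical. Each run is modeled as a list of positions, and each position holds the chosen token ID and a `token_id -> logprob` map of the top candidates (top-20 in the PR). The PR text states `decode_atol` only for the cross-lookup case, so the sketch also assumes `decode_atol` applies when the decode tokens match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    """One generated position (hypothetical shape, not a vLLM type)."""
    token_id: int                   # token chosen at this position (greedy argmax)
    top_logprobs: Dict[int, float]  # candidate token_id -> logprob, e.g. top-20


def two_tier_logprob_check(baseline: List[Step], compiled: List[Step],
                           atol: float,
                           decode_atol: Optional[float] = None) -> None:
    """Raise AssertionError if the compiled run diverges from the eager baseline."""
    if decode_atol is None:
        decode_atol = 2 * atol  # "defaults to 2x atol"
    assert baseline and len(baseline) == len(compiled), "runs differ in length"

    # Tier 1, prefill (position 0): token must match exactly, logprob within atol.
    b0, c0 = baseline[0], compiled[0]
    assert b0.token_id == c0.token_id, (
        f"prefill token mismatch: {b0.token_id} != {c0.token_id}")
    diff = abs(b0.top_logprobs[b0.token_id] - c0.top_logprobs[c0.token_id])
    assert diff <= atol, f"prefill logprob diff {diff:.4f} > atol {atol}"

    # Tier 2, decode (positions 1..n-1): tolerate argmax drift.
    for pos in range(1, len(baseline)):
        b, c = baseline[pos], compiled[pos]
        b_lp = b.top_logprobs[b.token_id]
        if b.token_id == c.token_id:
            # Same token chosen: compare the logprobs directly.
            c_lp = c.top_logprobs[c.token_id]
        else:
            # Argmax drifted: find the baseline token in the compiled top-k
            # and compare the logprob the compiled model assigned to it.
            assert b.token_id in c.top_logprobs, (
                f"pos {pos}: baseline token {b.token_id} not in compiled top-k")
            c_lp = c.top_logprobs[b.token_id]
        # Assumption: decode_atol is used for every decode position.
        diff = abs(b_lp - c_lp)
        assert diff <= decode_atol, (
            f"pos {pos}: logprob diff {diff:.4f} > decode_atol {decode_atol}")
```

The prefill tier is kept strict and separate. A wrong first token cannot be explained by accumulated drift, so it should never be absorbed by the looser decode tolerance.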
…ode/moe_combine_normal/moe_dispatch_normal

Signed-off-by: Wangyibo1005 <2633333316@qq.com>

[Misc] Refactor aclgraph accuracy test to use logprob-based comparison (vllm-project#7455)

Replace text-match assertions with a two-tier logprob accuracy check:

- Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`.
- Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol).

This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence.

- vLLM version: v0.17.0
- vLLM main: vllm-project/vllm@8a68046

---------

Signed-off-by: wangli <wangli858794774@gmail.com>

fix 123

delete

[CI] Add wait logic for each individual case (vllm-project#6036)

Wait until the NPU memory is clean

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>

123123

123

pre-commit

pick use

123
What this PR does / why we need it?
Replace text-match assertions with a two-tier logprob accuracy check:
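- Prefill (token 0): assert that the token ID is identical between the eager baseline and compiled mode, then verify that the logprob matches within `atol`.
- Decode (tokens 1-2): if the chosen tokens match, compare the logprobs directly. If they differ, look up the baseline token in the compiled model's top-20 distribution and assert that the logprob assigned to it is within `decode_atol` (defaults to 2x `atol`).

This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence.

Does this PR introduce any user-facing change?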
How was this patch tested?