[Bugfix] Fix DeepSeek V2-Lite Accuracy drop #40673
robertgshaw2-redhat merged 4 commits into vllm-project:main
Code Review
This pull request replaces the _fused_output_is_reduced property with an instance attribute initialized in __init__ and _replace_quant_method. A critical issue was identified regarding the timing of this initialization: since the moe_kernel is typically lazily initialized during the first forward pass, caching the value in __init__ will likely result in a stale False value. This could lead to correctness issues, such as the double reduction of fused outputs, because the attribute will not reflect the kernel's state once it is actually loaded.
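The staleness concern raised in this review can be sketched in a few lines. This is a minimal illustration, not vLLM's code: only `_fused_output_is_reduced`, `moe_kernel`, and `__init__` come from the review text, while `LazyKernel`, `CachedLayer`, `PropertyLayer`, `load_kernel`, and `output_is_reduced` are hypothetical names chosen for the example.

```python
class LazyKernel:
    """Hypothetical stand-in for a kernel that is only created after __init__."""

    def __init__(self, output_is_reduced: bool):
        self.output_is_reduced = output_is_reduced


class CachedLayer:
    """Caches the flag in __init__, before any kernel has been loaded."""

    def __init__(self):
        self.moe_kernel = None
        # Evaluated once, while moe_kernel is still None, so this is False.
        self._fused_output_is_reduced = (
            self.moe_kernel is not None and self.moe_kernel.output_is_reduced
        )

    def load_kernel(self, kernel):
        # Hypothetical lazy initialization step; the cached flag is not refreshed.
        self.moe_kernel = kernel


class PropertyLayer:
    """Recomputes the flag on every access, so it tracks the kernel's state."""

    def __init__(self):
        self.moe_kernel = None

    @property
    def _fused_output_is_reduced(self):
        return self.moe_kernel is not None and self.moe_kernel.output_is_reduced

    def load_kernel(self, kernel):
        self.moe_kernel = kernel


cached = CachedLayer()
prop = PropertyLayer()
for layer in (cached, prop):
    layer.load_kernel(LazyKernel(output_is_reduced=True))

print(cached._fused_output_is_reduced)  # False: stale value from __init__
print(prop._fused_output_is_reduced)    # True: reflects the loaded kernel
```

In this sketch the cached attribute only stays correct if every code path that installs a kernel also refreshes it, which is presumably why the PR also sets the attribute in `_replace_quant_method`.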
Signed-off-by: Bill Nell <bnell@redhat.com>
This is the "early" all-reduce path. When the combine kernel produces
already-reduced fused output, shared output must be reduced separately
to match.
If sequence parallelism is not enabled and the combine kernel
We should move the SP reduction into the runner soon.
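The reasoning behind the "early" all-reduce path in the quoted comment can be checked with a small simulation. This is a sketch under stated assumptions, not the vLLM implementation: ranks are modeled as plain Python lists, `all_reduce` is a hypothetical sum-reduction helper, and the numbers are made up for illustration.

```python
def all_reduce(per_rank):
    """Simulated sum all-reduce: every rank receives the elementwise sum."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def add(a, b):
    """Elementwise addition of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


# Two simulated ranks, each holding partial results (illustrative values).
fused_partial = [[1.0, 2.0], [3.0, 4.0]]
shared_partial = [[0.5, 0.5], [1.5, 1.5]]

# Reference ("late" path): add the partials on each rank, then reduce once.
reference = all_reduce(
    [add(f, s) for f, s in zip(fused_partial, shared_partial)]
)

# Assume the combine kernel has already reduced the fused output.
fused_reduced = all_reduce(fused_partial)

# Incorrect: reducing again counts the fused output once per rank.
double_reduced = all_reduce(
    [add(f, s) for f, s in zip(fused_reduced, shared_partial)]
)

# "Early" path: reduce the shared output separately, then add.
shared_reduced = all_reduce(shared_partial)
early = [add(f, s) for f, s in zip(fused_reduced, shared_reduced)]

print(reference[0])       # [6.0, 8.0]
print(early[0])           # [6.0, 8.0]
print(double_reduced[0])  # [10.0, 14.0]
```

With two ranks, the double-reduced result overcounts the fused contribution by a factor equal to the number of ranks, while the separately reduced shared output matches the reference.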
Hi @bnellnm, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Merged commit 4a6dd1c into vllm-project:main
Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Adrian <info@zzit.ch>
Purpose
Fix DeepSeek V2-Lite Accuracy drop introduced by #40560
Test Plan
Run
bash .buildkite/scripts/scheduled_integration_test/deepseek_v2_lite_ep_eplb.sh 0.25 200 8010
Ran + verified all models from #39956
CI MoE Refactor tests
Test Result
Accuracy went from 0.02 -> 0.35
cc @robertgshaw2-redhat