Commit 2cd036e
authored
[Bugfix] fix accuracy problem for quantized deepseek models (#768)
### What this PR does / why we need it?
The root cause of the bug is that numerical computations involving NaNs
cannot eliminate them. We addressed it by using `masked_fill_` to
eliminate NaNs while avoiding memory-wasting `torch.where` approach.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
This patch was tested with vllm v0.8.5 and vllm-ascend master. I run
deepseek_v3 model with offline inference scripts
(examples/dp_offline/run_dp.sh & data_parallel.py).
Signed-off-by: linfeng-yuan <[email protected]>1 parent d6e9417 commit 2cd036e
1 file changed
+2
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
285 | 285 | | |
286 | 286 | | |
287 | 287 | | |
288 | | - | |
| 288 | + | |
| 289 | + | |
289 | 290 | | |
290 | 291 | | |
291 | 292 | | |
| |||
0 commit comments