Commit e3c7f71
authored
[Perf] Refactor tensor disposal logic to reduce memory usage (#966)
### What this PR does / why we need it?
1. In previous PRs #580
#784, I saved GPU memory
by promptly deleting unnecessary tensors. For tensors passed from
upper-layer functions, I used a list container to transfer the parameter
and then popped the tensor from the list within the inner function to
achieve deletion. Recently, I discovered a better implementation in
sglang—the `dispose_tensor` function and I recommend adopting this
approach.
2. Dispose `hidden_states` and `residual` from the previous layer once
they're no longer used.
3. Avoid to generate `self.inputs_embeds` in `ModelRunnerV1` in
non-multimodal scenarios.
With the aforementioned optimizations, using the DeepSeek-R1-W8A8 model
under the conditions of `TP=16` and `max-model-len=32768`, we can save
1.3GB of npu memory.
**Reference**: sgl-project/sglang#6147
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
---------
Signed-off-by: ApsarasX <[email protected]>1 parent 6eddbd2 commit e3c7f71
File tree
4 files changed
+30
-23
lines changed- vllm_ascend
- models
- quantization
- worker
4 files changed
+30
-23
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
68 | 68 | | |
69 | 69 | | |
70 | 70 | | |
| 71 | + | |
71 | 72 | | |
72 | 73 | | |
73 | 74 | | |
| |||
518 | 519 | | |
519 | 520 | | |
520 | 521 | | |
| 522 | + | |
521 | 523 | | |
522 | 524 | | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
523 | 530 | | |
524 | 531 | | |
525 | 532 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
18 | | - | |
| 18 | + | |
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| 28 | + | |
28 | 29 | | |
29 | 30 | | |
30 | 31 | | |
31 | 32 | | |
32 | | - | |
| 33 | + | |
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
| |||
41 | 42 | | |
42 | 43 | | |
43 | 44 | | |
44 | | - | |
| 45 | + | |
45 | 46 | | |
46 | 47 | | |
47 | 48 | | |
| |||
60 | 61 | | |
61 | 62 | | |
62 | 63 | | |
63 | | - | |
64 | | - | |
65 | 64 | | |
| 65 | + | |
66 | 66 | | |
67 | 67 | | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
68 | 71 | | |
69 | 72 | | |
70 | 73 | | |
| |||
155 | 158 | | |
156 | 159 | | |
157 | 160 | | |
158 | | - | |
159 | | - | |
160 | | - | |
161 | | - | |
162 | | - | |
| 161 | + | |
| 162 | + | |
163 | 163 | | |
164 | 164 | | |
165 | 165 | | |
| |||
281 | 281 | | |
282 | 282 | | |
283 | 283 | | |
284 | | - | |
285 | | - | |
286 | | - | |
287 | | - | |
| 284 | + | |
| 285 | + | |
288 | 286 | | |
289 | 287 | | |
290 | 288 | | |
| |||
399 | 397 | | |
400 | 398 | | |
401 | 399 | | |
402 | | - | |
403 | | - | |
404 | | - | |
405 | | - | |
406 | | - | |
| 400 | + | |
| 401 | + | |
407 | 402 | | |
408 | 403 | | |
409 | 404 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
169 | 169 | | |
170 | 170 | | |
171 | 171 | | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
240 | 240 | | |
241 | 241 | | |
242 | 242 | | |
243 | | - | |
244 | | - | |
245 | | - | |
246 | | - | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
247 | 248 | | |
248 | 249 | | |
249 | 250 | | |
| |||
0 commit comments