Commit cf975e2
committed
[V1][TPU] Speed up top-k on TPU by using torch.topk.
Also:
* Added an env VLLM_TPU_DISABLE_TOPK_TOPP_OPTIMIZATION to guard the
optimization logic. This is to be conservative, so that we can turn
off these changes in case they cause troubles.
* Edited tpu/test_sampler.py to include top-k.
Signed-off-by: Hyesoo Yang <[email protected]>1 parent d8e82bc commit cf975e2
File tree
3 files changed
+29
-4
lines changed- tests/v1/tpu
- vllm
- v1/sample/ops
3 files changed
+29
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
42 | | - | |
| 42 | + | |
43 | 43 | | |
44 | 44 | | |
45 | 45 | | |
| |||
49 | 49 | | |
50 | 50 | | |
51 | 51 | | |
| 52 | + | |
52 | 53 | | |
53 | 54 | | |
54 | 55 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
95 | 95 | | |
96 | 96 | | |
97 | 97 | | |
| 98 | + | |
98 | 99 | | |
99 | 100 | | |
100 | 101 | | |
| |||
623 | 624 | | |
624 | 625 | | |
625 | 626 | | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
626 | 632 | | |
627 | 633 | | |
628 | 634 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
66 | 66 | | |
67 | 67 | | |
68 | 68 | | |
69 | | - | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
70 | 77 | | |
71 | 78 | | |
72 | 79 | | |
| |||
105 | 112 | | |
106 | 113 | | |
107 | 114 | | |
108 | | - | |
109 | | - | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
110 | 128 | | |
111 | 129 | | |
112 | 130 | | |
| |||
0 commit comments