Commit 135fa49
authored
Small speed improvements to --async-offload (#10593)
* ops: dont take an offload stream if you dont need one
* ops: prioritize mem transfer
The async offload streams reason for existence is to transfer from
RAM to GPU. The post processing compute steps are a bonus on the side
stream, but if the compute stream is running a long kernel, it can
stall the side stream, as it wait to type-cast the bias before
transferring the weight. So do a pure xfer of the weight straight up,
then do everything bias, then go back to fix the weight type and do
weight patches.1 parent 44869ff commit 135fa49
1 file changed
+13
-8
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
84 | 84 | | |
85 | 85 | | |
86 | 86 | | |
87 | | - | |
| 87 | + | |
| 88 | + | |
88 | 89 | | |
89 | 90 | | |
90 | 91 | | |
| |||
94 | 95 | | |
95 | 96 | | |
96 | 97 | | |
97 | | - | |
98 | 98 | | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
99 | 106 | | |
100 | | - | |
101 | | - | |
| 107 | + | |
102 | 108 | | |
103 | | - | |
| 109 | + | |
104 | 110 | | |
105 | 111 | | |
106 | 112 | | |
107 | 113 | | |
108 | | - | |
109 | | - | |
110 | | - | |
| 114 | + | |
| 115 | + | |
111 | 116 | | |
112 | 117 | | |
113 | 118 | | |
| |||
0 commit comments