Commit 7e997a4
committed
Refactor attention v2
Pull Request resolved: #10623
Pull attention creation out of Transformer/TransformerBlock. Instead, pass the layers into Transformer.
The motivation is to customize linear layers in attention for LoRA (eg. make wq into a LoraLinear instead of a regular linear). In the next diff (D73517350), we pull wq,wk,wv,wo out of the attention and pass those in as well.
This allows us to customize attention parameters without passing in ModelArgs and doing the customization deep inside attention.py.
I think this modularizes our attention/transformer components, though also means that users have to do some more work to construct the attention layers and pass it to transformer.
It follows the torchtune structure more closely, eg. https://github.com/pytorch/torchtune/blob/main/torchtune/models/llama3_2/_component_builders.py#L221
Previously here: D73474110
ghstack-source-id: 282118266
@exported-using-ghexport
Differential Revision: [D73538697](https://our.internmc.facebook.com/intern/diff/D73538697/)1 parent 4129ebe commit 7e997a4
File tree
5 files changed
+73
-22
lines changed- examples/models
- llama
- tests
- llava
5 files changed
+73
-22
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
13 | 13 | | |
14 | 14 | | |
15 | 15 | | |
| 16 | + | |
16 | 17 | | |
17 | 18 | | |
18 | 19 | | |
| |||
83 | 84 | | |
84 | 85 | | |
85 | 86 | | |
86 | | - | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
87 | 96 | | |
88 | 97 | | |
89 | 98 | | |
90 | 99 | | |
91 | 100 | | |
92 | | - | |
93 | | - | |
94 | | - | |
95 | | - | |
96 | | - | |
97 | | - | |
98 | | - | |
| 101 | + | |
99 | 102 | | |
100 | 103 | | |
101 | 104 | | |
102 | 105 | | |
103 | 106 | | |
104 | 107 | | |
105 | 108 | | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
106 | 127 | | |
107 | 128 | | |
108 | 129 | | |
| |||
117 | 138 | | |
118 | 139 | | |
119 | 140 | | |
120 | | - | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
121 | 150 | | |
122 | 151 | | |
123 | 152 | | |
| |||
130 | 159 | | |
131 | 160 | | |
132 | 161 | | |
133 | | - | |
134 | | - | |
135 | | - | |
136 | | - | |
| 162 | + | |
| 163 | + | |
137 | 164 | | |
138 | 165 | | |
139 | 166 | | |
| |||
212 | 239 | | |
213 | 240 | | |
214 | 241 | | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
18 | | - | |
19 | 18 | | |
| 19 | + | |
20 | 20 | | |
| 21 | + | |
21 | 22 | | |
22 | 23 | | |
23 | 24 | | |
| |||
174 | 175 | | |
175 | 176 | | |
176 | 177 | | |
177 | | - | |
| 178 | + | |
178 | 179 | | |
179 | 180 | | |
180 | 181 | | |
| |||
Lines changed: 5 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
10 | | - | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
11 | 14 | | |
12 | 15 | | |
13 | 16 | | |
| |||
39 | 42 | | |
40 | 43 | | |
41 | 44 | | |
42 | | - | |
| 45 | + | |
43 | 46 | | |
44 | 47 | | |
45 | 48 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
| 5 | + | |
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
| |||
160 | 160 | | |
161 | 161 | | |
162 | 162 | | |
163 | | - | |
| 163 | + | |
164 | 164 | | |
165 | 165 | | |
166 | | - | |
| 166 | + | |
167 | 167 | | |
168 | 168 | | |
169 | 169 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
15 | | - | |
| 15 | + | |
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
| |||
66 | 66 | | |
67 | 67 | | |
68 | 68 | | |
69 | | - | |
| 69 | + | |
70 | 70 | | |
71 | 71 | | |
72 | 72 | | |
| |||
0 commit comments