Description
Hi, thank you for your great work. I've read through every line of your MetNet-3 implementation. Reviewing your code has been a joy, and I have learned so much.
However, while checking the MaxViT module and reading the whole original paper, I came up with a question about the Block/Grid Attention design.
In the appendix, formulas (4) and (5), the final shapes of block attention and grid attention are the same: (HW/(P*P), P*P, C), since P = G; the grid variant only performs an extra swapaxes, but I don't think that's a big deal.
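To make my point concrete, here is a minimal sketch of the two partitions as I understand them from the paper (my own einops-style toy code, not the repo's exact code); the output shapes come out identical:

```python
import torch
from einops import rearrange

B, H, W, C = 1, 16, 16, 3
P = G = 4  # window size == grid size, as in the paper

x = torch.randn(B, H, W, C)

# block attention: the inner factors (p1, p2) index tokens *inside*
# a contiguous P x P window
block = rearrange(x, 'b (h p1) (w p2) c -> b (h w) (p1 p2) c', p1=P, p2=P)

# grid attention: the outer factors (g1, g2) index the tokens, so each
# group of G x G tokens is strided H/G apart across the feature map
grid = rearrange(x, 'b (g1 h) (g2 w) c -> b (h w) (g1 g2) c', g1=G, g2=G)

print(block.shape, grid.shape)  # both torch.Size([1, 16, 16, 3]), i.e. (HW/(P*P), P*P, C) per batch
```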
And in your MaxViT implementation, from line 259 to line 267, these two attentions are implemented with the same code.
I've also checked the official implementation, and found nothing special there either.
So, from the paper's shape description and from the implementations, I see no sign of a local-attention/global-attention distinction; I'd rather describe this as a stack of two local attentions.
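For what it's worth, here is a quick check of the swapaxes relation I mentioned above (again my own toy code, assuming H = W = P*P so the window and grid partitions have matching group/token sizes):

```python
import torch
from einops import rearrange

B, H, W, C = 1, 16, 16, 3
P = G = 4  # H = W = P*P, so both partitions give 16 groups of 16 tokens

x = torch.randn(B, H, W, C)

block = rearrange(x, 'b (h p1) (w p2) c -> b (h w) (p1 p2) c', p1=P, p2=P)
grid = rearrange(x, 'b (g1 h) (g2 w) c -> b (h w) (g1 g2) c', g1=G, g2=G)

# the grid partition is exactly the block partition with the group
# axis and the token axis swapped
assert torch.equal(grid, block.swapaxes(1, 2))
```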
I'm a new learner in DL, and the above are just my humble thoughts. Please leave comments if you have any ideas. Thank you for your time.