Thanks for your great work.
In the figure, a global pooling layer follows the transformer encoder in the ViT backbone. In the original DINO implementation, only the CLS token is passed on to subsequent processing. Do you use global pooling over all tokens instead of the CLS token at this step?
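For concreteness, the two alternatives in question can be sketched as below. This is only an illustration in plain Python, not code from either repository: it assumes the encoder output is a sequence of token vectors with the CLS token at index 0, and the function names `cls_feature` and `mean_pool_feature` are hypothetical.

```python
def cls_feature(tokens):
    """Option A: keep only the first token (assumed to be the CLS token)."""
    return list(tokens[0])


def mean_pool_feature(tokens, include_cls=False):
    """Option B: average the token vectors, dimension by dimension.

    By default the CLS token at index 0 is skipped, so only patch
    tokens are pooled; whether it should be included is an assumption.
    """
    pooled = tokens if include_cls else tokens[1:]
    dim = len(pooled[0])
    return [sum(tok[d] for tok in pooled) / len(pooled) for d in range(dim)]


# Toy encoder output: one CLS token followed by two patch tokens, dim = 2.
tokens = [[1.0, 0.0], [2.0, 4.0], [4.0, 8.0]]

cls_out = cls_feature(tokens)         # feature taken from the CLS token only
pool_out = mean_pool_feature(tokens)  # feature averaged over patch tokens
```

In a real ViT the same choice would typically be an index into, or a mean over, the token axis of the encoder output tensor; which of the two the figure depicts is exactly what is being asked.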