
VideoMAE ViT-H pre-train does not contain the decoder weights #89

Open
sandstorm12 opened this issue Apr 13, 2023 · 2 comments

Comments


sandstorm12 commented Apr 13, 2023

Problem

The pre-trained Kinetics weights for VideoMAE ViT-H and VideoMAE ViT-S seem to have a problem. When loading the weights of other pre-trained models such as ViT-L or ViT-B, the state_dict contains the weights for the decoder layers, but this is not the case for ViT-H and ViT-S. As a result, these checkpoints cannot be loaded into an encoder/decoder setup.

How to reproduce

To reproduce, download the weights and load the state_dict; comparing it with the other pre-trained weights shows that the decoder weights are missing.

import gdown
import torch

URL = "https://drive.google.com/file/d/1AJQR1Rsi2N1pDn9tLyJ8DQrUREiBA1bO/view?usp=sharing"
output_name = "checkpoint.pth"
gdown.cached_download(URL, output_name)

# Load on CPU so no GPU is needed just to inspect the checkpoint.
state_dict = torch.load(output_name, map_location="cpu")
print(state_dict["module"])

The state_dict is very large, so I don't include the output here.
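Instead of printing the whole state_dict, a quick way to confirm what a checkpoint contains is to group its keys by prefix. A minimal sketch (the "decoder." prefix and the toy key names below are assumptions based on the ViT-B / ViT-L checkpoints, not taken from the ViT-H file):

```python
def split_keys(state_dict):
    """Split checkpoint keys into decoder keys and everything else.

    Assumes decoder parameters are stored under a "decoder." prefix,
    as in the ViT-B / ViT-L pre-training checkpoints.
    """
    decoder = sorted(k for k in state_dict if k.startswith("decoder."))
    other = sorted(k for k in state_dict if not k.startswith("decoder."))
    return decoder, other

# Toy stand-in for state_dict["module"]; a real checkpoint maps these
# names to tensors, but only the key names matter here.
ckpt = {
    "encoder.patch_embed.proj.weight": 0,
    "encoder.blocks.0.attn.qkv.weight": 0,
}
decoder_keys, other_keys = split_keys(ckpt)
print("decoder tensors:", len(decoder_keys))  # 0 for an encoder-only checkpoint
```

If decoder_keys comes back empty for the ViT-H / ViT-S checkpoints but not for ViT-B / ViT-L, that confirms the decoder weights were stripped before release.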


innat commented Aug 26, 2023

The pre-trained VideoMAE ViT-H link appears to be wrong: the checkpoint contains only the encoder part.


innat commented Aug 27, 2023

cc. @yztongzhan
