
Training and Evaluation Code for ViClip #131

Open
fmthoker opened this issue May 30, 2024 · 11 comments

Comments

@fmthoker

Dear authors,
Great work, and thanks for releasing the code for ViCLIP pretraining on InternVid-10M-FLT. First, it would be really helpful if the pretraining instructions were more detailed, e.g. which CLIP model to start from, paths for configs, etc.
Second, could you please also release the evaluation code and scripts for zero-shot evaluation of the pretrained ViCLIP models on Kinetics-400, SSv2, UCF, etc.? I want to reproduce the zero-shot numbers in my local setup.

Thanks and Regards

@fmthoker fmthoker changed the title Evaluation Code for ViClip Training and Evaluation Code for ViClip May 30, 2024
@Andy1621
Collaborator

Hi! For the zero-shot evaluation, you can refer to the VideoCLIP in InternVideo2.

@fmthoker
Author

fmthoker commented Jun 5, 2024

@Andy1621 Thanks for the quick response. Are you referring to the scripts in InternVideo/InternVideo2/multi_modality/scripts/evaluation/clip/zero_shot? If so, they seem to be for evaluating the InternVideo2 CLIP. Would the scripts and code work off the shelf for the ViCLIP models that you have shared, or do we need to make any changes? It would also be great if you could share the eval code for ViCLIP directly.
Thanks in advance.

@Andy1621
Collaborator

Andy1621 commented Jun 5, 2024

Hi~ You can find the evaluation scripts here.

@fmthoker
Author

fmthoker commented Jun 5, 2024

@Andy1621 Thanks for your quick response; I will try that to reproduce the results.

@fmthoker
Author

fmthoker commented Jun 5, 2024

@Andy1621 I tried to do zero-shot eval on MSRVTT-1k with the scripts from here.
However, I am getting the following error:

Traceback (most recent call last):
  File "tasks/retrieval.py", line 15, in <module>
    from models.vindlu import VindLU
ModuleNotFoundError: No module named 'models.vindlu'

@Andy1621
Collaborator

Andy1621 commented Jun 6, 2024

I think it's a bug introduced when cleaning the code. You can fix it in tasks/retrieval.py by

# from models.vindlu import VindLU
# from models.vindlu_vit import VindLU_VIT
# from models.vindlu_videoclip import VindLU_VideoCLIP
# from models.vindlu_blip_qformer import VindLU_BLIP_QFormer
from models.viclip import ViCLIP

And also change the model class in config.py from VindLU_VideoCLIP to ViCLIP.
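A minimal sketch of that config edit, assuming a VindLU-style dict config (the surrounding keys and exact file layout are assumptions, not the actual contents of config.py):

```python
# Illustrative only: the real config.py has more keys; the point is
# swapping the model class name so the ViCLIP code path is selected.
model = dict(
    model_cls="ViCLIP",  # previously "VindLU_VideoCLIP"
)

print(model["model_cls"])
```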

@fmthoker
Author

fmthoker commented Jun 6, 2024

@Andy1621 Thanks, that solves the problem; however, I think the code is still not complete, as I get the following error:

Traceback (most recent call last):
  File "tasks/retrieval.py", line 292, in <module>
    main(cfg)
  File "tasks/retrieval.py", line 208, in main
    res = evaluation_wrapper(
  File "/ibex/ai/home/thokerfm/anaconda3/envs/viclip/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/thokerfm/InternVideo/InternVideo1/Pretrain/ViCLIP/tasks/retrieval_utils.py", line 85, in evaluation_wrapper
    i2t_x, t2i_x, i2t_emb, t2i_emb = evaluation(
  File "/ibex/ai/home/thokerfm/anaconda3/envs/viclip/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/thokerfm/InternVideo/InternVideo1/Pretrain/ViCLIP/tasks/retrieval_utils.py", line 132, in evaluation
    image_feats, pooled_image_feats = extract_vision_feats(
  File "/home/thokerfm/InternVideo/InternVideo1/Pretrain/ViCLIP/tasks/retrieval_utils.py", line 54, in extract_vision_feats
    image_feat, pooled_image_feat = model.encode_vision(image, test=True)
ValueError: too many values to unpack (expected 2)
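For context, that ValueError is the generic Python error raised when a function returns more values than the call site unpacks. A self-contained sketch of the mismatch (the real return signature of encode_vision is not shown in this thread, so the three-value return here is an assumption for illustration):

```python
def encode_vision(image, test=True):
    # Hypothetical stand-in: suppose ViCLIP's encoder returns three
    # values while the VindLU eval path unpacks only two.
    return "image_feat", "pooled_image_feat", "extra_output"

try:
    image_feat, pooled_image_feat = encode_vision(None)
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```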

@Code-kunkun

> @Andy1621 Thanks, it solves the problem, however i think the code is still not complete as i get following error:
> ValueError: too many values to unpack (expected 2)

Did you solve this problem? I got the same error.

@fmthoker
Author

@Code-kunkun Yes, you need to change line 79 in tasks/retrieval_utils.py from

if config.model.model_cls == "VindLU_VideoCLIP":

to

if config.model.model_cls == "VindLU_VideoCLIP" or config.model.model_cls == "ViCLIP":

Let me know if that works.
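As a quick sanity check, the widened condition behaves like this (using a hypothetical SimpleNamespace stand-in for the parsed config object; the real one comes from the repo's config loader):

```python
from types import SimpleNamespace

# Hypothetical stand-in for the parsed config.
config = SimpleNamespace(model=SimpleNamespace(model_cls="ViCLIP"))

# The widened check: ViCLIP now takes the same evaluation path
# that VindLU_VideoCLIP does.
takes_clip_path = (
    config.model.model_cls == "VindLU_VideoCLIP"
    or config.model.model_cls == "ViCLIP"
)
print(takes_clip_path)  # True
```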

@Code-kunkun

> @Code-kunkun Yes, you need to change line 79 in tasks/retrieval_utils.py from
> if config.model.model_cls == "VindLU_VideoCLIP":
> to
> if config.model.model_cls == "VindLU_VideoCLIP" or config.model.model_cls == "ViCLIP":
> Let me know if that works

Thanks for your quick reply! It works🥳.

@fmthoker
Author

fmthoker commented Jun 30, 2024

@Andy1621 Thanks for your help so far with the zero-shot evaluation. Could you please point me to the scripts/code to use for full fine-tuning of the ViCLIP models?
Also, how do we run full fine-tuning on action classification datasets like SSv2 and Kinetics with the current codebase?
