
Training and Evaluation Code for ViClip #131

Open
fmthoker opened this issue May 30, 2024 · 11 comments

Comments

@fmthoker

Dear authors,
Great work, and thanks for releasing the code for ViCLIP pretraining on InternVid-10M-FLT. First, it would be really helpful if the pretraining instructions were more detailed, e.g. which CLIP model to start from, paths for configs, etc.
Second, could you please also release the evaluation code and scripts for zero-shot evaluation of the pretrained ViCLIP models on Kinetics-400, SSv2, UCF, etc.? I want to reproduce the zero-shot numbers in my local setup.

Thanks and Regards

@fmthoker fmthoker changed the title Evaluation Code for ViClip Training and Evaluation Code for ViClip May 30, 2024
@Andy1621
Collaborator

Hi! For the zero-shot evaluation, you can refer to the VideoCLIP in InternVideo2.

@fmthoker
Author

fmthoker commented Jun 5, 2024

@Andy1621 Thanks for the quick response. Are you referring to the scripts in InternVideo/InternVideo2/multi_modality/scripts/evaluation/clip/zero_shot? If so, they seem to be for evaluating the InternVideo2 CLIP. Would the scripts and code work off the shelf for the ViCLIP models that you have shared, or do we need to make any changes? It would also be great if you could share the eval code for ViCLIP directly.
Thanks in advance.

@Andy1621
Collaborator

Andy1621 commented Jun 5, 2024

Hi~ You can find the evaluation scripts here.

@fmthoker
Author

fmthoker commented Jun 5, 2024

@Andy1621 Thanks for your quick response; I will try that to reproduce the results.

@fmthoker
Author

fmthoker commented Jun 5, 2024

@Andy1621 I tried to do zero-shot eval on MSRVTT-1k with the scripts from here.
However, I am getting the following error:

Traceback (most recent call last):
  File "tasks/retrieval.py", line 15, in <module>
    from models.vindlu import VindLU
ModuleNotFoundError: No module named 'models.vindlu'

@Andy1621
Collaborator

Andy1621 commented Jun 6, 2024

I think it's a bug introduced when cleaning the code. You can fix it in tasks/retrieval.py by

# from models.vindlu import VindLU
# from models.vindlu_vit import VindLU_VIT
# from models.vindlu_videoclip import VindLU_VideoCLIP
# from models.vindlu_blip_qformer import VindLU_BLIP_QFormer
from models.viclip import ViCLIP

And also change the model class in config.py from VindLU_VideoCLIP to ViCLIP.
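A minimal sketch of that config edit, assuming a VindLU-style dict config (the surrounding keys and exact file layout are assumptions, not the actual contents of config.py):

```python
# Illustrative only: the real config.py has more keys; the point is
# swapping the model class name so the ViCLIP code path is selected.
model = dict(
    model_cls="ViCLIP",  # previously "VindLU_VideoCLIP"
)

print(model["model_cls"])
```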

@fmthoker
Author

fmthoker commented Jun 6, 2024

@Andy1621 Thanks, that solves the problem; however, I think the code is still not complete, as I get the following error:

Traceback (most recent call last):
  File "tasks/retrieval.py", line 292, in <module>
    main(cfg)
  File "tasks/retrieval.py", line 208, in main
    res = evaluation_wrapper(
  File "/ibex/ai/home/thokerfm/anaconda3/envs/viclip/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/thokerfm/InternVideo/InternVideo1/Pretrain/ViCLIP/tasks/retrieval_utils.py", line 85, in evaluation_wrapper
    i2t_x, t2i_x, i2t_emb, t2i_emb = evaluation(
  File "/ibex/ai/home/thokerfm/anaconda3/envs/viclip/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/thokerfm/InternVideo/InternVideo1/Pretrain/ViCLIP/tasks/retrieval_utils.py", line 132, in evaluation
    image_feats, pooled_image_feats = extract_vision_feats(
  File "/home/thokerfm/InternVideo/InternVideo1/Pretrain/ViCLIP/tasks/retrieval_utils.py", line 54, in extract_vision_feats
    image_feat, pooled_image_feat = model.encode_vision(image, test=True)
ValueError: too many values to unpack (expected 2)
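For context, that ValueError is the generic Python error raised when a function returns more values than the call site unpacks. A self-contained sketch of the mismatch (the real return signature of encode_vision is not shown in this thread, so the three-value return here is an assumption for illustration):

```python
def encode_vision(image, test=True):
    # Hypothetical stand-in: suppose ViCLIP's encoder returns three
    # values while the VindLU eval path unpacks only two.
    return "image_feat", "pooled_image_feat", "extra_output"

try:
    image_feat, pooled_image_feat = encode_vision(None)
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```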

@Code-kunkun

> @Andy1621 Thanks, it solves the problem, however i think the code is still not complete as i get following error:
> ValueError: too many values to unpack (expected 2)

Did you solve this problem? I got the same error.

@fmthoker
Author

@Code-kunkun Yes, you need to change line 79 in tasks/retrieval_utils.py from

if config.model.model_cls == "VindLU_VideoCLIP":

to

if config.model.model_cls == "VindLU_VideoCLIP" or config.model.model_cls == "ViCLIP":

Let me know if that works.
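As a quick sanity check, the widened condition behaves like this (using a hypothetical SimpleNamespace stand-in for the parsed config object; the real one comes from the repo's config loader):

```python
from types import SimpleNamespace

# Hypothetical stand-in for the parsed config.
config = SimpleNamespace(model=SimpleNamespace(model_cls="ViCLIP"))

# The widened check: ViCLIP now takes the same evaluation path
# that VindLU_VideoCLIP does.
takes_clip_path = (
    config.model.model_cls == "VindLU_VideoCLIP"
    or config.model.model_cls == "ViCLIP"
)
print(takes_clip_path)  # True
```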

@Code-kunkun

> @Code-kunkun Yes, you need to change line 79 in tasks/retrieval_utils.py from
> if config.model.model_cls == "VindLU_VideoCLIP":
> to
> if config.model.model_cls == "VindLU_VideoCLIP" or config.model.model_cls == "ViCLIP":
> Let me know if that works

Thanks for your quick reply! It works🥳.

@fmthoker
Author

fmthoker commented Jun 30, 2024

@Andy1621 Thanks for your help so far with the zero-shot evaluation. Could you please point me to the scripts/code to use for full fine-tuning of the ViCLIP models?
Also, how do we run full fine-tuning on action classification datasets like SSv2 and Kinetics with the current codebase?
