About activitynet captions dataset in CLIP-ViP #41

musicman217 · 2024-06-30T04:23:39Z

Hello, thank you for sharing your excellent work!
I have reproduced the MSR-VTT results and even obtained a higher score than the one reported in the paper.

But when I tried to reproduce the ActivityNet Captions results, I found that in actnet_retrieval_vip_base_32.json the vision format is set to frame instead of video. I tried reproducing with the video vision format, sampling 32 frames, and it ends up reaching almost R@1=20.
Then I used the OpenCV library to extract frames, but it still can't reach the result in the paper.
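
For reference, a minimal sketch of uniform 32-frame extraction with OpenCV. It assumes frames are taken at the midpoint of evenly sized segments, which is not necessarily the sampling CLIP-ViP uses. The names `uniform_frame_indices` and `extract_frames` are hypothetical and do not come from the repository.

```python
def uniform_frame_indices(total_frames, num_samples=32):
    """Return num_samples frame indices spread evenly over [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Take the midpoint of each of num_samples equal-length segments.
    # Videos shorter than num_samples frames yield repeated indices,
    # so the output always has num_samples entries.
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_samples))
        for i in range(num_samples)
    ]


def extract_frames(video_path, num_samples=32):
    """Decode the sampled frames of one video as RGB arrays (hypothetical helper)."""
    import cv2  # imported lazily so the index helper works without OpenCV

    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in uniform_frame_indices(total, num_samples):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
        ok, frame = cap.read()
        if not ok:
            continue  # skip frames that fail to decode
        # OpenCV decodes to BGR; convert to RGB before feeding a CLIP-style model.
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames
```

Two details in a sketch like this can plausibly shift retrieval scores: OpenCV returns BGR rather than RGB, and the frame count reported by `CAP_PROP_FRAME_COUNT` may not match the number of decodable frames for some files.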
