Do we need to crop the HiREST videos? #10
Hi, thanks for your interest. Yes, for |
Many thanks for your reply! For the VATEX videos from Valley, are the videos cropped according to the filenames as well? |
Also, using the provided checkpoint, the evaluation results on Charades-STA are different from those reported in Table 2 of the paper. Below are my reproduced results.
Is it because the released model differs from the one reported in the paper (it seems that the paper version is trained on TimeIT only, but the released one is also trained on Valley), or the results in the paper were obtained with ASR captions on Charades-STA? If so, would it be possible to share the code for obtaining ASR captions (the file in |
We follow the instructions from Valley (https://github.com/RupertLuo/Valley/blob/main/Crawler/README.md#vatex) to download the VATEX videos and do not conduct cropping. The |
Thanks for your detailed reply! But it seems that Valley cropped the VATEX videos according to the filenames (see here)... |
I see. Many thanks for your explanation! Does it mean that TimeChat was trained with ASR, but does not use it during evaluation, for a fair comparison with existing methods? |
Also, some QA pairs are missing in
|
Yes, you are right, sorry about that... We cropped the VATEX videos before training (this was done by my teammate), so there is no problem with the released ckpt (the video filenames in https://huggingface.co/datasets/ShuhuaiRen/TimeIT/blob/main/data/valley/instruct_valley_72k.json also refer to the cropped versions). We have updated the code for processing the Valley dataset, please refer to https://github.com/RenShuhuai-Andy/TimeChat/blob/master/docs/DATA.md#process-valley. We notice that the Valley dataset has been updated (from 73K to 65K); you can reprocess the instruction json if you want to use the new dataset :) |
Yes |
Yes, we use half of the QA pairs to accelerate training. To use all of the QA pairs, you can reprocess the Valley instruction json using https://github.com/RenShuhuai-Andy/TimeChat/blob/master/utils/process_valley.py |
I see... Data preprocessing is always tricky 🤣 Thank you so much! I have a final question regarding the batch size during instruction tuning and fine-tuning (sorry for asking so much...I'm trying my best to understand your method). According to the training config Also, I have tried to find the config (number of GPUs, per device batch size, accumulate iters, and how to set
I was wondering whether you could kindly clarify the settings for fine-tuning. Thank you! |
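For readers following the batch-size question, the quantities it names (number of GPUs, per-device batch size, accumulate iters) combine in the usual way for data-parallel training with gradient accumulation. The sketch below is generic arithmetic, not TimeChat's training code; the function names and the example numbers are hypothetical and are not taken from the repository's configs.

```python
import math


def effective_batch_size(num_gpus: int, per_device_batch_size: int, accum_iters: int) -> int:
    """Samples contributing to one optimizer update under data parallelism
    with gradient accumulation."""
    return num_gpus * per_device_batch_size * accum_iters


def updates_for_one_pass(num_samples: int, batch_size: int) -> int:
    """Optimizer updates needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the arithmetic.
    bs = effective_batch_size(num_gpus=8, per_device_batch_size=4, accum_iters=2)
    print("effective batch size:", bs)
    print("updates for 1000 samples:", updates_for_one_pass(1000, bs))
```

Keeping the product of the three factors fixed is the usual way to preserve the effective batch size when the number of GPUs changes.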
No. At each epoch, we conduct
Accordingly, the
You can also increase the training epoch for better performance. |
Thank you so much for your detailed reply! |
Hi @RenShuhuai-Andy, thanks for sharing this great work! For some videos in the HiREST dataset, the filenames are "xxxx_35_79.mp4". Do we need to crop the original videos according to the timestamps in the filename (e.g., cropping the 35s to 79s in this case)?
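As an illustration of the cropping described in the question, here is a minimal sketch that reads the start and end times from a filename and builds an `ffmpeg` command for the clip. It assumes the naming pattern `<video_id>_<start>_<end>.mp4` with times in seconds, as in the "xxxx_35_79.mp4" example. The helper names are hypothetical, and this is not the preprocessing script used by the TimeChat authors.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed pattern: "<video_id>_<start>_<end>", with start/end in seconds.
_SPAN_RE = re.compile(r"^(?P<vid>.+)_(?P<start>\d+(?:\.\d+)?)_(?P<end>\d+(?:\.\d+)?)$")


def parse_clip_span(filename: str) -> Optional[Tuple[str, float, float]]:
    """Return (video_id, start_sec, end_sec), or None if the name carries no valid span."""
    match = _SPAN_RE.match(Path(filename).stem)
    if match is None:
        return None
    start, end = float(match["start"]), float(match["end"])
    if end <= start:
        return None
    return match["vid"], start, end


def ffmpeg_crop_command(src: str, dst: str, start: float, end: float) -> List[str]:
    """Build an ffmpeg call that seeks to `start` and keeps `end - start` seconds.

    The clip is re-encoded (no stream copy) so the cut does not snap to keyframes.
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(start),       # seek in the input
        "-i", src,
        "-t", str(end - start),  # clip duration
        "-c:v", "libx264", "-c:a", "aac",
        dst,
    ]


if __name__ == "__main__":
    span = parse_clip_span("xxxx_35_79.mp4")
    if span is not None:
        video_id, start, end = span
        # Pass the list to subprocess.run(..., check=True) to actually crop.
        print(ffmpeg_crop_command(f"{video_id}.mp4", "xxxx_35_79.mp4", start, end))
```

Filenames without a trailing `_<start>_<end>` pair return `None`, so uncropped videos can be passed through unchanged.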