Questions about video length and frame sampling #24

Open
fkcptlst opened this issue Nov 13, 2023 · 11 comments

Comments

@fkcptlst
Contributor

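The code samples a fixed 8 frames from every video (fixed_frame_number=8).

I have two questions:

  1. What was the reasoning behind this design? video-llama does not seem to limit video length, while video-chatgpt caps videos at 100 frames.
  2. To use more frames, can the extra video tokens simply be appended at the end, or does the model need to be fine-tuned again so that it can handle different frame counts?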
@RupertLuo
Owner

You can change frame_mode to fps and then adjust the fps value; for example, 0.1 extracts one frame every 10 seconds. For now, this is the only way to use more frames.
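To make the difference between the two modes concrete, here is a minimal sketch of how frame indices could be chosen in each case. It is not the repository's code: the helper name `sample_frame_indices`, its signature, and the `"fixed"` mode label are assumptions made for illustration; only `frame_mode`, `fps`, and `fixed_frame_number` come from the thread.

```python
def sample_frame_indices(total_frames, native_fps, frame_mode="fixed",
                         fixed_frame_number=8, fps=0.1):
    """Return the indices of the frames to keep.

    Hypothetical helper for illustration only; the real loader may differ.
    """
    if total_frames <= 0:
        return []
    if frame_mode == "fixed":
        # Always keep the same number of frames, spread evenly over the clip.
        n = min(fixed_frame_number, total_frames)
        return [int(i * total_frames / n) for i in range(n)]
    if frame_mode == "fps":
        # Keep one frame every (native_fps / fps) source frames, so the
        # number of sampled frames grows with the video length.
        step = max(1, round(native_fps / fps))
        return list(range(0, total_frames, step))
    raise ValueError(f"unknown frame_mode: {frame_mode!r}")


# A 60-second clip recorded at 30 frames per second (1800 frames).
print(sample_frame_indices(1800, 30, "fixed"))         # 8 evenly spaced indices
print(sample_frame_indices(1800, 30, "fps", fps=0.1))  # one index per 10 seconds
```

With fps=0.1 the 60-second clip yields 6 frames, while a 10-minute clip at the same rate would yield 60, so the frame count is no longer fixed at 8.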

@fkcptlst
Contributor Author

May I ask why the original design was limited to 8 frames? Does performance drop when there are too many frames?

@RupertLuo
Owner

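Yes. If the video is too long and there are too many frames, the features get muddled after pooling. In the latest version of the paper I use a transformer to solve this problem, but the code has not been updated yet.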
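One possible reading of the pooling problem can be sketched as follows, assuming mean pooling over the frame axis (the thread does not say which pooling the model uses, and `mean_pool` is a hypothetical helper). Averaging discards frame order, and each frame's share of the pooled vector shrinks as the frame count grows.

```python
def mean_pool(frames):
    """Average per-frame feature vectors into a single clip-level vector.

    Illustrative assumption: the model's actual pooling may differ.
    """
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


# Frame order is lost: a clip and its reverse pool to the same vector.
clip = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
assert mean_pool(clip) == mean_pool(list(reversed(clip)))

# One distinctive frame is diluted as more frames are pooled together.
short_clip = [[1.0]] + [[0.0]] * 7    # 8 frames
long_clip = [[1.0]] + [[0.0]] * 99    # 100 frames
print(mean_pool(short_clip), mean_pool(long_clip))
```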

@fkcptlst
Contributor Author

Thanks for the explanation. When do you plan to release the updated code?

@RupertLuo
Owner

Hopefully this week. I have been quite busy lately.

@fkcptlst
Contributor Author

OK, thanks!

@RupertLuo
Owner

Updated.

@fkcptlst
Contributor Author

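I had a quick look at the updated code. As far as I can tell, it still only supports 8-frame input. Is that right? Also, when do you plan to release the new model weights? Thanks!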

@fkcptlst fkcptlst reopened this Nov 15, 2023
@RupertLuo
Owner

Change frame mode to fps, and the video that gets passed in will no longer be 8 frames.

@RupertLuo
Owner

I see the problem now: train does not expose an interface for this.

@RupertLuo
Owner

Pass the parameter frame mode = 'fps' to the load video function in dataset.py, and the video frames will be extracted according to the frame rate.
