Preprint: VideoLLM: Modeling Video Sequence with Large Language Models
With the exponential growth of video data, there is an urgent need for automated technology to analyze and comprehend video content. However, existing video understanding models are often task-specific and lack a comprehensive capability for handling diverse tasks.
The success of large language models (LLMs) such as GPT has demonstrated their impressive abilities in sequence causal reasoning. Building on this insight, we propose a novel framework called VideoLLM.
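To make the core idea concrete — treating a video's per-frame features as a token sequence to be modeled causally, in the spirit of an LLM — here is a minimal sketch. Everything in it (module names, dimensions, the simple linear projector, and the small Transformer standing in for a pre-trained LLM) is an assumption for illustration, not the paper's released implementation:

```python
# A minimal, illustrative sketch (NOT the authors' implementation): per-frame
# visual features are projected into a causal sequence model's embedding
# space, which then reasons over the video as a token sequence. All names
# and dimensions here are assumptions.
import torch
import torch.nn as nn


class VideoAsSequenceLM(nn.Module):
    def __init__(self, visual_dim=768, lm_dim=2048, vocab_size=32000,
                 n_layers=4, n_heads=8):
        super().__init__()
        # Projector: maps frame features into the sequence model's space.
        self.projector = nn.Linear(visual_dim, lm_dim)
        # Stand-in for a pre-trained causal LM decoder (GPT-style).
        layer = nn.TransformerEncoderLayer(
            d_model=lm_dim, nhead=n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Task head, e.g. per-frame action labels.
        self.head = nn.Linear(lm_dim, vocab_size)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, visual_dim) from a frozen
        # visual encoder.
        x = self.projector(frame_feats)
        t = x.size(1)
        # Causal mask: each frame attends only to past frames.
        mask = nn.Transformer.generate_square_subsequent_mask(t)
        x = self.decoder(x, mask=mask)
        return self.head(x)  # (batch, num_frames, vocab_size)


if __name__ == "__main__":
    feats = torch.randn(2, 16, 768)  # 2 clips, 16 frames each
    logits = VideoAsSequenceLM()(feats)
    print(logits.shape)  # torch.Size([2, 16, 32000])
```

The key design point this sketch captures is that only a lightweight projector needs to bridge the visual encoder and the sequence model, so the sequence-reasoning capability can come from a frozen pre-trained LLM rather than a video model trained per task.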
Computational resources are currently constrained by other work in progress, and we're quite busy 😭, so open-sourcing will take some more time.
- [ ] Release code and models
This project is released under the Apache 2.0 license.
If you find this project useful in your research, please consider citing:
@misc{2023videollm,
      title={VideoLLM: Modeling Video Sequence with Large Language Models},
      author={Guo Chen and Yin-Dong Zheng and Jiahao Wang and Jilan Xu and Yifei Huang and Junting Pan and Yi Wang and Yali Wang and Yu Qiao and Tong Lu and Limin Wang},
      howpublished={\url{https://arxiv.org/abs/2305.13292}},
      year={2023}
}