This is the official repository of VAST. The code, model checkpoints, and dataset will be released after the paper is accepted.
If you find this code useful for your research, please consider citing:
@article{chen2023vast,
  title   = {VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset},
  author  = {Chen, Sihan and Li, Handong and Wang, Qunbo and Zhao, Zijia and Sun, Mingzhen and Zhu, Xinxin and Liu, Jing},
  journal = {arXiv preprint arXiv:2305.18500},
  year    = {2023}
}