VLEP dataset for video and language future event prediction.
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
- VideoLanguageFuturePrediction
- Dataset
- Evaluation and CodaLab Submission
- Related work
- Citation
- Contact
The dataset is released under data/; see data/README.md for details.
We release ground-truth answers only for the train and dev splits. To get results on the test split, please submit your predictions to our CodaLab evaluation server, following the instructions in standalone_eval/README.md.
Related work:
- TVC (Video+Dialogue Captioning)
- TVR (Video+Dialogue Retrieval)
- TVQA (Localized Video QA)
- TVQA+ (Spatio-Temporal Video QA)
- recurrent-transformer (coherent video paragraph captioning)
If you find this code useful for your research, please cite our paper:
@inproceedings{lei2020vlep,
title={What is More Likely to Happen Next? Video-and-Language Future Event Prediction},
author={Lei, Jie and Yu, Licheng and Berg, Tamara L and Bansal, Mohit},
booktitle={EMNLP},
year={2020}
}
Jie Lei, [email protected]