To run the code and reproduce the results, install the following packages first. I used Python 3.6 throughout this project.
- PyTorch
- Caffe
- NumPy
- cv2
- imageio
- scikit-image
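
Before running anything, it can help to confirm that each package is importable. The sketch below is a hypothetical helper, not part of this repository. It assumes the usual import names for the packages listed above (`torch` for PyTorch and `skimage` for scikit-image, for instance) and only checks that the modules can be located, not that their versions are compatible.

```python
import importlib.util

# Assumed import names for the packages listed above.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "Caffe": "caffe",
    "NumPy": "numpy",
    "cv2": "cv2",
    "imageio": "imageio",
    "scikit-image": "skimage",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose module cannot be located."""
    missing = []
    for package, module in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```
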
1. Install all the packages listed in the 'Requirements' section.
2. Download the MSVD dataset to Data/YouTubeClips.
3. Change all the paths in these Python files to point to directories in your workspace.
4. Run the feature extraction script to extract the RGB features of the videos.
5. Run the training script to train the model.
6. Run the captioning script to generate captions for the test videos.

Alternatively, you can extract features directly from a video and generate captions using
You can download the MSVD dataset here
You can download the extracted video features from Features_VGG and
unzip them to "Data/Features_VGG".
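
The data layout described above can be prepared and checked with a short script. This is a minimal sketch, not code from this repository: the directory names come from this README, while the archive name `Features_VGG.zip` in the usage note and the helper names `unpack_features` and `check_layout` are hypothetical.

```python
import zipfile
from pathlib import Path

# Directory names taken from this README; adjust DATA_ROOT to your workspace.
DATA_ROOT = Path("Data")
VIDEO_DIR = DATA_ROOT / "YouTubeClips"
FEATURE_DIR = DATA_ROOT / "Features_VGG"


def unpack_features(archive_path, target_dir=FEATURE_DIR):
    """Extract a feature archive into target_dir; return the number of archive members."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(str(archive_path)) as archive:
        archive.extractall(str(target_dir))
        return len(archive.namelist())


def check_layout(directories=(VIDEO_DIR, FEATURE_DIR)):
    """Map each expected directory to whether it exists."""
    return {str(d): Path(d).is_dir() for d in directories}
```

For example, `unpack_features("Features_VGG.zip")` followed by `print(check_layout())` shows whether both directories are in place. If the archive already contains a top-level folder, extracting into `Data/Features_VGG` will nest it one level deeper, so check the result before training.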
val.json contains the ground truth for the test dataset, and result.json contains the generated captions. We use the metrics of WangLei(
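
The internal layout of val.json and result.json is not described here, so it may be worth inspecting both files before passing them to an evaluation script. The sketch below assumes only that they are valid JSON. `summarize_json` is a hypothetical helper, not part of this repository.

```python
import json


def summarize_json(path):
    """Load a JSON file and return its top-level type name and entry count."""
    with open(path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    # Containers report their length; scalar values count as a single entry.
    size = len(data) if isinstance(data, (dict, list)) else 1
    return type(data).__name__, size
```

Calling `summarize_json("val.json")` and `summarize_json("result.json")` gives a quick check that both files load and have the same kind of top-level structure.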
Some code is copied from vijayvee(