This is Torch (Lua) code for video (action) classification using a 3D ResNet trained by this code.
The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes.
This code takes videos as input and outputs class names and predicted class scores for every 16 frames.
A PyTorch (Python) version of this code is available here.
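To illustrate what "every 16 frames" means for a given video length, here is a minimal Python sketch that lists the frame ranges of consecutive 16-frame clips. The function name `clip_segments` is hypothetical, and the sketch assumes that clips do not overlap and that a trailing remainder shorter than 16 frames is dropped; the actual code may handle the tail differently.

```python
def clip_segments(n_frames, clip_len=16):
    """Return 1-based inclusive (start, end) frame ranges for consecutive clips.

    Assumption: clips do not overlap, and a trailing remainder shorter than
    clip_len is dropped. The real code may treat the tail differently.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [(start, start + clip_len - 1)
            for start in range(1, n_frames - clip_len + 2, clip_len)]


if __name__ == "__main__":
    # A 50-frame video yields three full 16-frame clips; frames 49 and 50 are left over.
    print(clip_segments(50))
```

Under these assumptions, a 50-frame video would produce three sets of class scores, one per clip.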
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
luarocks install json
- FFmpeg, FFprobe
wget http://johnvansickle.com/ffmpeg/releases/ffmpeg-release-64bit-static.tar.xz
tar xvf ffmpeg-release-64bit-static.tar.xz
cd ./ffmpeg-3.3.3-64bit-static/; sudo cp ffmpeg ffprobe /usr/local/bin;
- Python 3
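Before running the classifier, it may help to confirm that the command-line tools listed above are reachable. The following is a small sketch using only the Python standard library; the helper name `missing_tools` is hypothetical, and it only checks that `th`, `ffmpeg`, and `ffprobe` are on `PATH`, not that their versions are suitable.

```python
import shutil


def missing_tools(tools=("th", "ffmpeg", "ffprobe")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```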
- Download this code.
- Download the pretrained model.
- We recommend ResNet-34.
th main.lua --input ./input --output ./output.json --model ./resnet-34-kinetics.t7
To visualize the classification results, use generate_result_video/generate_result_video.py.
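For a quick text summary instead of a video, the per-clip predictions can be reduced to one label per video. The sketch below is only an illustration: it assumes the output JSON is a list of entries with a `video` name and a `clips` list whose items carry a `label` field. These field names are assumptions that this README does not confirm, so inspect your own `output.json` and adjust them as needed. The helper `majority_labels` is hypothetical.

```python
from collections import Counter


def majority_labels(results):
    """Map each video name to the label predicted for most of its clips.

    Assumed (unverified) schema: a list of
    {"video": str, "clips": [{"label": str, ...}, ...]} entries.
    """
    summary = {}
    for entry in results:
        labels = [clip["label"] for clip in entry["clips"]]
        if labels:  # skip videos that produced no clips
            summary[entry["video"]] = Counter(labels).most_common(1)[0][0]
    return summary


if __name__ == "__main__":
    # Hand-made sample in the assumed schema; real results would come from
    # json.load() on the file passed to --output.
    sample = [
        {"video": "a.mp4", "clips": [{"label": "surfing water"},
                                     {"label": "surfing water"},
                                     {"label": "water skiing"}]},
    ]
    print(majority_labels(sample))
```

A majority vote is used here because it needs only the clip labels; averaging the per-clip scores would be an alternative if the class order of the score vectors is known.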
If you use this code, please cite the following:
@article{hara3dresnets,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition},
  journal={arXiv preprint},
  volume={arXiv:1708.07632},
  year={2017}
}