DeepNAG: Deep Non-Adversarial Gesture Generation

Official PyTorch implementation of DeepNAG.

DeepNAG is a novel RNN-based sequence generator that can reliably create synthetic 2D/3D gestures. Instead of relying on generative adversarial networks (GAN) to train a sequence generator, DeepNAG uses a standalone loss function based on soft dynamic time warping (sDTW) and Hausdorff distance. Our novel loss function is intuitive, runs fast and yields great results. Please see our publication for more details.
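As a rough illustration of the two distance measures named above, the following pure-Python sketch computes a soft-DTW value and a Hausdorff distance between two short point sequences. It is not the loss used in this repository: the function names are hypothetical, the squared Euclidean ground cost and the `gamma` value are assumptions, and how DeepNAG actually combines the two terms is described in the publication. The real implementation relies on the differentiable, GPU-based pytorch-softdtw-cuda submodule.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between sequences x and y (forward pass only)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # r[i][j] holds the soft alignment cost of x[:i] against y[:j].
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (r[i - 1][j - 1], r[i - 1][j], r[i][j - 1])
            r[i][j] = sq_dist(x[i - 1], y[j - 1]) + soft_min(prev, gamma)
    return r[n][m]


def hausdorff(x, y):
    """Symmetric Hausdorff distance between two point sets."""
    def directed(a, b):
        # Largest distance from a point in a to its nearest point in b.
        return max(min(math.sqrt(sq_dist(p, q)) for q in b) for p in a)
    return max(directed(x, y), directed(y, x))


# Two toy 2D "gestures": a horizontal stroke and the same stroke shifted up.
real = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
fake = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(soft_dtw(real, fake, gamma=0.01), hausdorff(real, fake))
```

Smaller values of `gamma` bring the soft minimum closer to the hard minimum used by classic DTW, while larger values smooth the alignment cost.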

DeepNAG's architecture is fairly simple and consists only of gated recurrent units (GRUs) and fully-connected layers.
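To make the "GRUs plus fully-connected layers" description concrete, here is a minimal PyTorch sketch of a generator with that general shape. It is not the model defined in this repository: the class name, the layer sizes, the number of layers, the activation, and the use of per-timestep Gaussian noise as input are all assumptions made for illustration.

```python
import torch
from torch import nn


class GruGeneratorSketch(nn.Module):
    """Hypothetical generator: stacked GRUs followed by fully-connected layers."""

    def __init__(self, latent_dim=32, hidden_dim=128, num_layers=2, out_dim=2):
        super().__init__()
        # batch_first=True means tensors are shaped (batch, time, features).
        self.gru = nn.GRU(latent_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),  # out_dim=2 for 2D pen points
        )

    def forward(self, z):
        h, _ = self.gru(z)   # (batch, time, hidden_dim)
        return self.fc(h)    # (batch, time, out_dim)


def sample(generator, batch_size, seq_len, latent_dim):
    """Draw noise sequences and map them to synthetic gesture sequences."""
    z = torch.randn(batch_size, seq_len, latent_dim)
    with torch.no_grad():
        return generator(z)


generator = GruGeneratorSketch()
gestures = sample(generator, batch_size=4, seq_len=64, latent_dim=32)
print(gestures.shape)
```

For skeleton data such as Kinect gestures, `out_dim` would be the number of joints times three rather than two.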

What Does this Repository Contain?

This repository contains:

  1. DeepNAG's implementation
  2. DeepGAN's implementation (GAN-based gesture generation)
  3. Pretrained DeepNAG and DeepGAN models (found under pretrained_models)

Sample Generation Results

Kinect Gestures

The following are sample synthetic gestures from the JK2017 (Kinect) dataset. In both animations, the black skeleton is an actual person, while the remaining skeletons are synthetic results produced by DeepNAG!

Pen Gestures

The following are sample synthetic gestures from the $1-GDS dataset (produced by DeepNAG). The red samples are human drawn, while the black samples are synthetic. The last two rows are overlaid renderings of some randomly selected real and synthetic samples, which demonstrate the diversity of the generated samples compared to the real ones.

Getting Started

Prerequisites

The following is the list of requirements for this project:

  • Python v3.6+
  • PyTorch v1.4+
  • NumPy (will be installed along with PyTorch)
  • Matplotlib (for visualization)
  • pytorch-softdtw-cuda (included as a git submodule)
  • JitGRU (included as a git submodule) (needed for DeepGAN only)
  • Numba (preferably install via your OS's package manager)
  • (Optional) TensorBoard to monitor the training process
  • (Optional) CUDA toolkit with an NVIDIA GPU (for faster training; CPU-only training works but is very slow)

All the requirements are included in requirements.txt.

Training a Model From Scratch:

Training a model from scratch involves 3 easy steps:

  1. Obtain the sources and initialize git submodules
git clone https://github.com/Maghoumi/DeepNAG.git
cd DeepNAG
git submodule update --init --recursive
  2. Install the dependencies (make sure the correct pip is used)
pip install -r requirements.txt
  3. Run the code (make sure the correct python v3.6+ is used)
python main.py --model=DeepNAG --dataset=dollar-gds --use-tensorboard=1

The above command downloads the $1-GDS dataset and trains a DeepNAG generator on the entire dataset. The dataset is downloaded to DeepNAG/data, and the run results are written to DeepNAG/logs/unique-session-name. Training progress is printed to standard output and is additionally written to TensorBoard event files under DeepNAG/logs/unique-session-name/tensorboard. By default, training runs for 25000 epochs, which is enough to get good results on the $1-GDS dataset.

The JK2017 (Kinect) dataset is included in this repository. To train a model on this dataset, run:

python main.py --model=DeepNAG --dataset=jk2017-kinect

Training needs to run for at least 5000 epochs to produce good results. If you do not pass the --epoch parameter, the code uses its default values.

Training a GAN-based Model:

To train a GAN-based model, pass --model=DeepGAN instead of --model=DeepNAG in the commands above.

Some Notes on Training DeepGAN

  • Training should run for at least 100000 steps to produce good results. The number of training steps can be set via --epoch.
  • The default learning rate of 1e-4 may not yield great results. Feel free to experiment with this value to get a well-functioning generator.
  • The training logic for DeepGAN follows that of caogang's repository. This was done to make the code easier to understand.

Evaluating a Trained Model:

Once training concludes, the trained model is saved under DeepNAG/logs/unique-session-name/checkpoints. A trained model can be visualized by passing the --evaluate argument to main.py. Some pretrained models are included under DeepNAG/pretrained_models. Run any of the following commands to visualize a trained model's output (requires Matplotlib):

#
# DeepNAG models
#

# Visualize the pretrained DeepNAG model trained on $1-GDS
python main.py --model=DeepNAG --dataset=dollar-gds --evaluate=pretrained_models/DeepNAG/dollar-gds/checkpoint-best.tar

# Visualize the pretrained DeepNAG model trained on JK2017 (Kinect)
python main.py --model=DeepNAG --dataset=jk2017-kinect --evaluate=pretrained_models/DeepNAG/jk2017-kinect/checkpoint-best.tar

#
# DeepGAN models
#

# Visualize the pretrained DeepGAN model trained on $1-GDS
python main.py --model=DeepGAN --dataset=dollar-gds --evaluate=pretrained_models/DeepGAN/dollar-gds/checkpoint-best.tar

# Visualize the pretrained DeepGAN model trained on JK2017 (Kinect)
python main.py --model=DeepGAN --dataset=jk2017-kinect --evaluate=pretrained_models/DeepGAN/jk2017-kinect/checkpoint-best.tar
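
To evaluate a model you trained yourself, you need the path to one of its checkpoints. The sketch below lists checkpoint files under the logs directory, newest first, and builds the matching evaluation command. The helper names are hypothetical, and the `*.tar` pattern is an assumption based on the pretrained checkpoint names above; adjust it if your checkpoints are named differently.

```python
from pathlib import Path


def find_checkpoints(logs_dir="logs", pattern="*.tar"):
    """Return checkpoint files under <logs_dir>/<session>/checkpoints, newest first."""
    root = Path(logs_dir)
    files = [p for p in root.glob(f"*/checkpoints/{pattern}") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


def evaluate_command(checkpoint, model="DeepNAG", dataset="dollar-gds"):
    """Build the main.py command line that visualizes the given checkpoint."""
    return f"python main.py --model={model} --dataset={dataset} --evaluate={checkpoint}"


# Print one evaluation command per checkpoint found (nothing if there are none).
for checkpoint in find_checkpoints():
    print(evaluate_command(checkpoint))
```

Run it from the DeepNAG directory so that the relative logs path resolves, and make sure the model and dataset arguments match the ones the checkpoint was trained with.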

Additional Open Source Goodies

The needs of this work produced several other standalone projects, which we have also made public.

Our soft DTW for PyTorch in CUDA project is a fast implementation of the sDTW algorithm on which DeepNAG's loss function relies. We additionally implemented second-order differentiable GRUs for PyTorch using TorchScript (JitGRU).

Support/Citing

If you find our work useful, please consider starring this repository and citing our work:

@phdthesis{maghoumi2020dissertation,
  title={{Deep Recurrent Networks for Gesture Recognition and Synthesis}},
  author={Mehran Maghoumi},
  year={2020},
  school={University of Central Florida Orlando, Florida}
}

@inproceedings{maghoumi2021deepnag,
  title={DeepNAG: Deep Non-Adversarial Gesture Generation},
  author={Maghoumi, Mehran and Taranta, Eugene Matthew and LaViola, Joseph},
  booktitle={26th International Conference on Intelligent User Interfaces},
  pages={213--223},
  year={2021}
}

License

This project is licensed under the MIT License; see the LICENSE file for details. Note that the JK2017 dataset is NOT a part of this project and has a different license.