Solution for the Lunar Lander environment (LunarLander-v2) of OpenAI Gym. The algorithm used is actor-critic (vanilla policy gradient with a baseline).
More info: http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_5_actor_critic_pdf.pdf
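A minimal sketch of how a policy gradient with a learned baseline scores one episode. It assumes Monte Carlo returns (not bootstrapped TD targets) and an assumed discount of gamma = 0.99. The function names `discounted_returns` and `actor_critic_losses` are hypothetical and do not come from train.py. It is written in plain Python so it runs without dependencies. In a PyTorch version the log-probabilities and values would be tensors, the value would be detached inside the advantage, and the summed losses would be backpropagated.

```python
from typing import List, Tuple


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, iterating backwards over one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def actor_critic_losses(
    log_probs: List[float],  # log pi(a_t | s_t) from the actor (hypothetical inputs)
    values: List[float],     # V(s_t) from the critic, used as the baseline
    rewards: List[float],    # r_t collected during the episode
    gamma: float = 0.99,     # assumed discount factor, not taken from the repo
) -> Tuple[float, float]:
    """Return (actor_loss, critic_loss) for one episode."""
    returns = discounted_returns(rewards, gamma)
    actor_loss = 0.0
    critic_loss = 0.0
    for log_p, v, g in zip(log_probs, values, returns):
        # Advantage A_t = G_t - V(s_t). With autograd, V would be detached here
        # so the actor loss does not push gradients into the critic.
        advantage = g - v
        actor_loss += -log_p * advantage
        # The critic is regressed onto the observed return with squared error.
        critic_loss += (g - v) ** 2
    return actor_loss, critic_loss


if __name__ == "__main__":
    print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))
    print(actor_critic_losses([-1.0] * 3, [1.0] * 3, [1.0, 0.0, 2.0], gamma=0.5))
```

Subtracting V(s_t) leaves the expected gradient unchanged but reduces its variance. Actions that do better than the critic expected are reinforced, and actions that do worse are discouraged.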
-> Dependencies:
OpenAI Gym
PyTorch 0.4.1
PIL
-> Hyperparameters can be changed by editing them in their respective files
-> To train: run train.py
-> Training converges within 1500 episodes
-> To test a pretrained model: run test.py
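The convergence claim does not say how convergence is measured. A common convention for LunarLander-v2 treats the task as solved once the average episode reward over 100 consecutive episodes reaches 200. The sketch below checks a list of per-episode rewards against that assumed criterion. The function name `first_solved_episode` and its defaults are hypothetical and are not part of train.py.

```python
from collections import deque
from typing import Deque, Iterable, Optional


def first_solved_episode(
    episode_rewards: Iterable[float],
    window: int = 100,        # assumed: number of consecutive episodes averaged
    threshold: float = 200.0, # assumed: the commonly used "solved" score
) -> Optional[int]:
    """Return the 1-based episode at which the mean of the last `window`
    episode rewards first reaches `threshold`, or None if it never does."""
    recent: Deque[float] = deque(maxlen=window)  # oldest reward drops off automatically
    for episode, reward in enumerate(episode_rewards, start=1):
        recent.append(reward)
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None


if __name__ == "__main__":
    # 50 poor episodes followed by consistently good ones.
    print(first_solved_episode([0.0] * 50 + [250.0] * 100))
```

Logging this value during training would make a statement like "converges within 1500 episodes" checkable against a specific criterion.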