PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".

ctallec/pytorch-a3c

 
 

pytorch-a3c

This is a PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".

This implementation is inspired by the Universe Starter Agent. Unlike the starter agent, it uses an optimizer with shared statistics, as in the original paper.
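To make "shared statistics" concrete, here is a minimal sketch of an Adam-style optimizer whose moment estimates are placed in shared memory, so that every worker process updates the same running averages. The class name `SharedAdam`, its `share_memory` method, and the hyperparameter defaults are assumptions chosen for illustration, and the choice of Adam is also an assumption; this is not necessarily how the repository implements its optimizer.

```python
import torch
import torch.nn as nn


class SharedAdam(torch.optim.Optimizer):
    """Hypothetical Adam variant whose statistics can live in shared memory.

    State is allocated eagerly in __init__ (rather than lazily on the first
    step) so it can be moved to shared memory before workers are started.
    """

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))
        for group in self.param_groups:
            for p in group["params"]:
                state = self.state[p]
                state["step"] = torch.zeros(1)
                state["exp_avg"] = torch.zeros_like(p.data)
                state["exp_avg_sq"] = torch.zeros_like(p.data)

    def share_memory(self):
        # Move every statistic tensor into shared memory, in place.
        for group in self.param_groups:
            for p in group["params"]:
                for tensor in self.state[p].values():
                    tensor.share_memory_()

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                state["step"] += 1  # in place, so the tensor stays shared
                t = state["step"].item()
                # Update the running first and second moments in place.
                state["exp_avg"].mul_(beta1).add_(p.grad, alpha=1 - beta1)
                state["exp_avg_sq"].mul_(beta2).addcmul_(
                    p.grad, p.grad, value=1 - beta2
                )
                bias1 = 1 - beta1 ** t
                bias2 = 1 - beta2 ** t
                denom = (state["exp_avg_sq"] / bias2).sqrt().add_(group["eps"])
                p.addcdiv_(state["exp_avg"], denom, value=-group["lr"] / bias1)


# Usage: share both the model and the optimizer state before forking workers.
model = nn.Linear(4, 2)
model.share_memory()
optimizer = SharedAdam(model.parameters(), lr=1e-2)
optimizer.share_memory()
```

Allocating the state up front is the key design choice in this sketch: a stock optimizer typically creates its statistics lazily on the first `step()`, which would leave each process with its own private copy.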

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

Usage

OMP_NUM_THREADS=1 python main.py --env-name "PongDeterministic-v3" --num-processes 16

In addition to the 16 training processes, this command runs evaluation in a separate thread.
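The layout described above (several training processes plus one evaluator running alongside them) can be sketched with the standard library alone. Everything here is a stand-in: `train_worker`, `evaluate`, and `run` are hypothetical names, the "training" just increments a shared counter, and the `fork` start method assumes a POSIX host. The repository's actual `main.py` may be organized differently.

```python
import multiprocessing as mp
import threading
import time


def train_worker(counter, steps):
    """Stand-in for one training process: bump a shared step counter."""
    for _ in range(steps):
        with counter.get_lock():
            counter.value += 1


def evaluate(counter, stop, log, interval=0.01):
    """Stand-in for the evaluator: periodically sample the shared counter."""
    while not stop.is_set():
        log.append(counter.value)
        time.sleep(interval)


def run(num_processes=4, steps=1000):
    ctx = mp.get_context("fork")  # assumption: POSIX host
    counter = ctx.Value("i", 0)   # shared between all workers

    # Start the training processes first, then the evaluator thread.
    workers = [
        ctx.Process(target=train_worker, args=(counter, steps))
        for _ in range(num_processes)
    ]
    for w in workers:
        w.start()

    stop, log = threading.Event(), []
    evaluator = threading.Thread(target=evaluate, args=(counter, stop, log))
    evaluator.start()

    for w in workers:
        w.join()
    stop.set()
    evaluator.join()

    log.append(counter.value)  # final reading after all workers finish
    return counter.value, log
```

Calling `run(num_processes=16)` mirrors the `--num-processes 16` flag in the command above; the evaluator only reads shared state and never blocks the workers.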

Note: To prevent memory leaks, install the most recent nightly build of PyTorch (version '0.1.10+2fd4d08' or later) with this command: pip install git+https://github.com/pytorch/pytorch

Results

With 16 processes, training converges on PongDeterministic-v3 in 15 minutes.

[Figure: PongDeterministic-v3]

For BreakoutDeterministic-v3, convergence takes several hours or more.
