namjiwon1023/Code_With_RL

Code With Deep Reinforcement Learning

Single Agent Algorithm

Value Based

  • Deep Q Network(DQN) (off-policy)
  • Double Deep Q Network(Double DQN) (off-policy)
  • Dueling Deep Q Network(Dueling DQN) (off-policy)
  • Dueling Double Deep Q Network(D3QN) (off-policy)
  • Noisy Networks for Exploration(NoisyDQN) (off-policy)
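To make the difference between the first two entries concrete, here is a minimal, dependency-free sketch of how the bootstrap target is usually formed in DQN and in Double DQN. The function names and the example Q-values are hypothetical and are not taken from this repository's `agent.py`, which works on PyTorch tensors.

```python
from typing import Sequence


def dqn_target(reward: float, done: bool, gamma: float,
               next_q_target: Sequence[float]) -> float:
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, done: bool, gamma: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float]) -> float:
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for one transition with three actions.
q_online = [1.0, 3.0, 2.0]
q_target = [2.5, 1.5, 4.0]
y_dqn = dqn_target(1.0, False, 0.5, q_target)
y_ddqn = double_dqn_target(1.0, False, 0.5, q_online, q_target)
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of the plain `max` operator.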

Actor-Critic Method

  • Advantage Actor-Critic(A2C) (on-policy)
  • Asynchronous Advantage Actor-Critic(A3C) (on-policy)
  • Proximal Policy Optimization(PPO) with GAE (on-policy, nearing off-policy)
  • Phasic Policy Gradient(PPG) (on-policy PPO + off-policy critic that shares parameters with PPO's critic)
  • Deep Deterministic Policy Gradient(DDPG) (off-policy)
  • Twin Delayed Deep Deterministic Policy Gradient(TD3) (off-policy)
  • Soft Actor-Critic(SAC) (off-policy)
  • Truncated Quantile Critics(TQC) (off-policy)
  • Distribution Correction based on Soft Actor-Critic(DisCor)
  • Randomized Ensembled Double Q-Learning(REDQ)

Deep reinforcement learning with a latent variable model

  • Stochastic Latent Actor-Critic(SLAC)
  • SAC with AutoEncoder(SAC_AE)

Regularizing Deep Reinforcement Learning from Pixels

  • Data regularized Q(DrQ-v1)
  • Data regularized Q(DrQ-v2)

Imitation Learning / Inverse Reinforcement Learning

  • Behavior Cloning(BC)
  • Generative Adversarial Imitation Learning(GAIL)

ReplayBuffer Structure

  • Prioritized Experience Replay(PER)
  • Hindsight Experience Replay(HER)
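As an illustration of the first entry, the sketch below shows proportional prioritized sampling with importance-sampling weights, using only the standard library. The names `per_probabilities` and `per_sample` are hypothetical, and weights are normalized by the largest weight in the batch, which is a common simplification. A practical buffer would typically store priorities in a sum tree so that sampling does not scan the whole buffer.

```python
import random
from typing import List, Optional, Sequence, Tuple


def per_probabilities(priorities: Sequence[float], alpha: float = 0.6) -> List[float]:
    """P(i) = p_i^alpha / sum_k p_k^alpha (proportional prioritization)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def per_sample(priorities: Sequence[float], batch_size: int,
               alpha: float = 0.6, beta: float = 0.4,
               rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Sample indices by priority and return batch-normalized importance weights."""
    if rng is None:
        rng = random.Random()
    probs = per_probabilities(priorities, alpha)
    n = len(priorities)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # w_i = (N * P(i))^(-beta) corrects the bias introduced by non-uniform sampling.
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_weight = max(weights)
    return indices, [w / max_weight for w in weights]


# Hypothetical priorities for a buffer holding three transitions.
probs = per_probabilities([1.0, 1.0, 2.0], alpha=1.0)
indices, weights = per_sample([1.0, 1.0, 2.0], batch_size=4, rng=random.Random(0))
```

With `alpha=0.0` every transition is sampled uniformly, which recovers an ordinary experience replay buffer.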

Neural network architecture designed for deep reinforcement learning

  • Deep Dense Architectures in reinforcement learning(D2RL)

Exploration

  • Intrinsic Curiosity Module(ICM)
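In ICM, the exploration bonus is the forward model's prediction error in feature space, scaled by a factor. The snippet below is a minimal sketch of that reward on plain Python lists. The function name and the default `eta` are assumptions for illustration; a full module would also include the feature encoder and the inverse model, which are omitted here.

```python
from typing import Sequence


def icm_intrinsic_reward(predicted_next_features: Sequence[float],
                         next_features: Sequence[float],
                         eta: float = 0.01) -> float:
    """Curiosity bonus: eta / 2 times the squared L2 error of the forward model."""
    if len(predicted_next_features) != len(next_features):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum((p - f) ** 2
                        for p, f in zip(predicted_next_features, next_features))
    return 0.5 * eta * squared_error


# Hypothetical feature vectors: the forward model misses the second feature by 2.
bonus = icm_intrinsic_reward([1.0, 2.0], [1.0, 4.0], eta=0.5)
```

States the forward model already predicts well yield a bonus near zero, so the agent is pushed toward transitions it cannot yet predict.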

Distributed Reinforcement Learning

  • Ape-X (similar implementation)
  • MPI

Multi Agent Algorithm

Actor-Critic Method

  • Multi Agent Deep Deterministic Policy Gradient(MADDPG)
  • MADDPG-style TD3 and SAC
  • Multi Agent Proximal Policy Optimization(MAPPO)
  • COMA

Value Based

  • QMIX

Installation

  • Clone the repo and cd into it:
    git clone https://github.com/namjiwon1023/Code_With_RL
    cd Code_With_RL
  • If you don't have PyTorch installed already, install the build that matches your hardware. In most cases, if you have a GPU, you can use
    pip3 install torch torchvision torchaudio -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html # PyTorch 1.8.1 LTS, CUDA 10.2 build
    or, if you don't have a GPU,
    pip3 install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html # PyTorch 1.8.1 LTS, CPU build

File Structure

  • Hyperparameter # Algorithm Hyperparameters
    • dqn.yaml
    • doubledqn.yaml
    • duelingdqn.yaml
    • d3qn.yaml
    • noisydqn.yaml
    • ddpg.yaml
    • td3.yaml
    • sac.yaml
    • ppo.yaml
    • a2c.yaml
    • behaviorcloning.yaml
    • etc.
  • agent.py
    • reinforcement learning algorithm
  • network.py
    • QNetwork
    • NoisyLinear
    • ActorNetwork
    • CriticNetwork
  • replaybuffer.py
    • Simple PPO Rollout Buffer
    • Off-Policy Experience Replay
  • runner.py
    • Training loop
    • Evaluator
  • main.py
    • Start training
    • Start evaluation
  • utils.py
    • Make gif image
    • Drawing
    • Basic tools

Quick Start

To train a new network, run python main.py --algorithm=<algorithm>

To test a pretrained network, run python main.py --algorithm=<algorithm> --evaluate=True

Replace <algorithm> with one of the algorithm names that can currently be selected:

  • DQN
  • Double_DQN
  • Dueling_DQN
  • D3QN
  • Noisy_DQN
  • DDPG
  • TD3
  • SAC
  • PPO
  • A2C
  • BC_SAC
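For readers wondering how flags such as `--algorithm` and `--evaluate=True` can be handled, the following is a hedged sketch using `argparse`. It is not the repository's actual `main.py`: `str2bool`, `build_parser`, and the default values are assumptions for illustration. The explicit boolean converter matters because `type=bool` would treat the string `"False"` as true.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings into real booleans."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Algorithm names as listed in this README.
ALGORITHMS = ["DQN", "Double_DQN", "Dueling_DQN", "D3QN", "Noisy_DQN",
              "DDPG", "TD3", "SAC", "PPO", "A2C", "BC_SAC"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train or evaluate an RL agent")
    parser.add_argument("--algorithm", choices=ALGORITHMS, required=True)
    # Hypothetical default: train unless --evaluate=True is passed.
    parser.add_argument("--evaluate", type=str2bool, default=False)
    return parser


# Equivalent to: python main.py --algorithm=TD3 --evaluate=True
args = build_parser().parse_args(["--algorithm=TD3", "--evaluate=True"])
```

Restricting `--algorithm` with `choices` makes a misspelled name fail immediately with a usage message instead of failing later during setup.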

Discrete action space recommendation: Dueling Double DQN (D3QN)

Continuous action space recommendation: use TD3 if you are comfortable tuning hyperparameters; otherwise use PPO or SAC. If the environment's reward function was written by a beginner, use PPO.

Training Environment

Discrete action :

Continuous action :

Multi-Agent Training Environment:

Training Result

Value Based Algorithm Compare Result:



Policy Based Algorithm Compare Result:


Distributed Reinforcement Learning Structure

DRL Structure

Requirements

Python 3.6+ : conda create -n icsl_rl python=3.6
PyTorch 1.6+ : https://pytorch.org/get-started/locally/
NumPy : pip install numpy
OpenAI Gym : https://github.com/openai/gym
Matplotlib : pip install matplotlib
TensorBoard : pip install tensorboard

Citation:

To cite this repository:

@misc{algorithms_drl,
  author = {Zhiyuan Nan},
  title = {Code With Deep Reinforcement Learning},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/namjiwon1023/Code_With_RL}},
}

References

Key Papers in Deep RL

PG Travel Guide

utilForever/rl-paper-study

Khanrc's blog

CUN-bjy/rl-paper-review
