This is my personal practice of implementing various RL algorithms from scratch.
Most of them will be in Jupyter notebooks; the ones involving multiprocessing will be in regular Python files.
The framework will always be PyTorch, also as a matter of personal practice.
Normally I use CartPole for the easy algorithms in this project and skip the visual-input part (which is fairly trivial to handle by adding a few conv layers; see the sketch below).
For the harder, vision-related algorithms I pick various Atari games as the environment.
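Below is a minimal sketch of what "a few conv layers" means here, assuming 84x84 grayscale Atari-style frames. The layer sizes follow the standard DQN-style convolutional stack and are illustrative choices of mine, not code from this repo:

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Small convolutional front-end that turns stacked Atari frames into a
    flat feature vector that a CartPole-style policy/value head can consume.
    Channels and shapes here are illustrative assumptions, not repo code."""

    def __init__(self, in_channels=4, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # 84x84 input -> 7x7x64 feature map after the three convs above
        self.fc = nn.Sequential(nn.Linear(64 * 7 * 7, feature_dim), nn.ReLU())

    def forward(self, x):
        # x: (batch, in_channels, 84, 84), pixel values scaled to [0, 1]
        return self.fc(self.conv(x))
```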
Due to time constraints, I will not provide a systematic analysis of any particular algorithm. Also be aware that these are for personal use, so bugs do appear frequently.
Once the project matures, I will accept open issues. For now, however, let me dive in. (I guess no one even reads this repo anyway.)
The project file structure will change continuously to match my needs.
- REINFORCE (a minimal sketch follows this list)
- Off-Policy REINFORCE
- Basic Actor Critic
- Advantage Actor Critic with Huber loss and entropy regularization
- A3C
- A2C
- DDPG
- D4PG
- MADDPG
- TRPO
- PPO
- ACER
- ACKTR
- SAC
- SAC with AAT (Automatically Adjusted Temperature)
- TD3
- SVPG
- IMPALA
- Dueling DDQN
- Dueling DDQN + PER
- Rainbow DQN
- Ape-X
- C51
- QR-DQN
- IQN
- Dopamine (DQN + C51 + IQN + Rainbow)
- Q-prop
- Stein Control Variates
- PCL
- Trust-PCL
- PGQL
- Reactor
- IPG
- VIME
- CTS-based Pseudocounts
- PixelCNN-based Pseudocounts
- Hash-based Counts
- EX2
- ICM
- RND
- VIC
- DIAYN
- VALOR
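Since everything in this repo runs on PyTorch and the easy algorithms run on CartPole, here is a minimal sketch of the first entry above (REINFORCE) under those assumptions. It targets the classic `gym` reset/step API; the network size, learning rate, and return normalization are illustrative choices of mine, not necessarily what the notebook uses:

```python
import gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Two-layer policy for CartPole: 4-dim observation -> 2 action logits.
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99

env = gym.make("CartPole-v1")
for episode in range(500):
    obs = env.reset()  # classic gym API: reset() returns obs, step() a 4-tuple
    log_probs, rewards, done = [], [], False
    while not done:
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns computed backwards, then normalized as a crude baseline.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE objective: log-probabilities weighted by the returns.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```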