Reinforcement learning algorithms.
There are many variants of the basic reinforcement learning algorithms. I have implemented some of them here, with a focus on linear function approximation.
Once you are familiar with the underlying ideas, extending these algorithms (for example, to nonlinear function approximators such as neural networks) is relatively straightforward.
To make that easier, the algorithms listed below are written in a plain, readable style and thoroughly commented, with references to the relevant papers and some explanation of the reasoning behind the code; a few illustrative sketches follow the list.
- TD(λ): Temporal Difference Learning
- LSTD(λ): Least-Squares Temporal Difference Learning
- ETD(λ): Emphatic Temporal Difference Learning
- GTD(λ): Gradient Temporal Difference Learning, AKA TDC(λ)
- TOTD(λ): True-Online Temporal Difference Learning, AKA TD with "Dutch Traces"
- ESTD(λ): Least-Squares Emphatic Temporal Difference Learning
- HTD(λ): Hybrid Temporal Difference Learning
- DVTD(λ), AKA TD-δ²: Online Variance Estimation via Temporal Difference Errors
- Q-Learning
- SARSA
- Distributional RL algorithms
- Other second-order TD algorithms (e.g., NTD)
- Actor-Critic algorithms
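To give a flavour of the code, here is a minimal sketch of TD(λ) for prediction with linear function approximation and accumulating traces (Sutton, 1988). The class name, constructor parameters, and `update` signature are illustrative assumptions, not necessarily this repository's actual API:

```python
import numpy as np

class TD:
    """TD(λ) with linear function approximation and accumulating traces."""

    def __init__(self, num_features, alpha=0.01, gamma=0.99, lmbda=0.9):
        self.alpha = alpha               # step size
        self.gamma = gamma               # discount factor
        self.lmbda = lmbda               # trace decay parameter
        self.w = np.zeros(num_features)  # weight vector
        self.z = np.zeros(num_features)  # eligibility trace

    def update(self, x, reward, xp):
        """Learn from one transition: features `x`, `reward`, next features `xp`."""
        # TD error: how much better or worse things went than predicted
        delta = reward + self.gamma * self.w.dot(xp) - self.w.dot(x)
        # accumulating eligibility trace
        self.z = self.gamma * self.lmbda * self.z + x
        # semi-gradient update of the weights
        self.w += self.alpha * delta * self.z
        return delta

    def predict(self, x):
        """Estimated value of the state with feature vector `x`."""
        return self.w.dot(x)
```

At the end of an episode the trace should be reset and the terminal state given an all-zero feature vector, which pins its value at zero.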
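True-online TD(λ) replaces the accumulating trace with a "dutch trace" and adds a correction term, which makes the online algorithm exactly equivalent to the offline λ-return (van Seijen & Sutton, 2014). A sketch of just the update, using the same assumed interface as above:

```python
import numpy as np

class TOTD:
    """True-online TD(λ): dutch traces plus a correction term."""

    def __init__(self, num_features, alpha=0.01, gamma=0.99, lmbda=0.9):
        self.alpha = alpha
        self.gamma = gamma
        self.lmbda = lmbda
        self.w = np.zeros(num_features)
        self.z = np.zeros(num_features)
        self.v_old = 0.0   # value of the previous state, used by the correction

    def update(self, x, reward, xp):
        v = self.w.dot(x)
        vp = self.w.dot(xp)
        delta = reward + self.gamma * vp - v
        # dutch trace: decays like the usual trace, but discounts the new features
        self.z = (self.gamma * self.lmbda * self.z
                  + (1 - self.alpha * self.gamma * self.lmbda * self.z.dot(x)) * x)
        # TD update with an extra term correcting for weights changing mid-trace
        self.w += (self.alpha * (delta + v - self.v_old) * self.z
                   - self.alpha * (v - self.v_old) * x)
        self.v_old = vp
        return delta
```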
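LSTD(λ) trades cheap per-step updates for data efficiency: rather than following stochastic gradients, it accumulates the statistics of the TD fixed-point equations and solves them directly (Bradtke & Barto, 1996; Boyan, 2002). Another sketch with assumed names:

```python
import numpy as np

class LSTD:
    """LSTD(λ): accumulate the least-squares statistics, then solve A·θ = b."""

    def __init__(self, num_features, gamma=0.99, lmbda=0.9, epsilon=1e-3):
        self.gamma = gamma
        self.lmbda = lmbda
        # small ridge term keeps A invertible before much data has been seen
        self.A = epsilon * np.eye(num_features)
        self.b = np.zeros(num_features)
        self.z = np.zeros(num_features)

    def update(self, x, reward, xp):
        self.z = self.gamma * self.lmbda * self.z + x
        self.A += np.outer(self.z, x - self.gamma * xp)
        self.b += reward * self.z

    @property
    def theta(self):
        """Weights solving A·θ = b; the value estimate for features x is θ·x."""
        return np.linalg.solve(self.A, self.b)
```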
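As for the claim that extending to nonlinear function approximation is straightforward: the pattern is the same, except that autograd computes the gradient of the value estimate and the bootstrapped target is held fixed (the semi-gradient trick). A hypothetical sketch using PyTorch, which is not a dependency of this repository; the network size and names are made up for illustration:

```python
import torch

class NonlinearTD:
    """Semi-gradient TD(0) with a small neural-network value function."""

    def __init__(self, num_features, alpha=1e-3, gamma=0.99):
        self.gamma = gamma
        self.v = torch.nn.Sequential(
            torch.nn.Linear(num_features, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 1),
        )
        self.opt = torch.optim.SGD(self.v.parameters(), lr=alpha)

    def update(self, x, reward, xp):
        """`x` and `xp` are 1-D float tensors of features."""
        v = self.v(x)
        # detach the target so gradients do not flow through the bootstrap
        with torch.no_grad():
            target = reward + self.gamma * self.v(xp)
        loss = 0.5 * (target - v).pow(2).sum()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return (target - v).item()
```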
Send me a pull request if you have code to contribute.
Alternatively, open an issue with a link to the paper describing the algorithm, and I will read and implement it when I get a chance.