
My Implementation of the Accelerated Gradient Temporal Difference Learning algorithm in Python



VEXLife/Accelerated-TD


Accelerated-TD

My implementation of the Accelerated Gradient Temporal Difference Learning algorithm (ATD) in Python.


Introduction

Agents

PlainATDAgent updates the ATD matrix directly, while SVDATDAgent and DiagonalizedSVDATDAgent instead maintain and update a singular value decomposition of it, which is expected to have lower computational complexity. The difference between the two is that SVDATDAgent employs the method described in Brand 2006, while DiagonalizedSVDATDAgent adopts the method described in Gahring 2015, which diagonalizes the decomposition so that the pseudo-inverse of the matrix is easier to calculate, though I still can't completely figure out how it works.
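To illustrate why a decomposition helps, here is a minimal NumPy sketch of one ATD(λ) step, assuming the update rule from the ATD paper: w ← w + (α Â⁺ + η I) δ e, where A is a running average of e (x − γx′)ᵀ and Â⁺ is the pseudo-inverse of its best rank-k approximation. The helper names and parameters (`truncated_pinv`, `atd_step`, `rank`, `alpha`, `eta`) are hypothetical, and for simplicity the sketch recomputes a full SVD at every step, which is exactly the O(d³) cost that the incremental SVD methods above avoid. It is not the code in atd.py.

```python
import numpy as np


def truncated_pinv(A, rank):
    """Pseudo-inverse of the best rank-`rank` approximation of A.

    If A ~= U_k S_k V_k^T (top-k singular triplets), then the
    pseudo-inverse is V_k S_k^-1 U_k^T; (near-)zero singular values are dropped.
    """
    U, s, Vt = np.linalg.svd(A)
    k = min(rank, int(np.sum(s > 1e-10)))
    return Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T


def atd_step(w, A, e, x, reward, x_next, t, gamma=0.9, lambd=0.0,
             alpha=1.0, eta=0.01, rank=2):
    """One ATD(lambda) update (hypothetical helper, sketch only).

    w: weights, A: running matrix estimate, e: eligibility trace,
    t: number of previous updates (0-based). Returns updated (w, A, e).
    """
    e = gamma * lambd * e + x                        # accumulating trace
    delta = reward + gamma * x_next @ w - x @ w      # TD error
    beta = 1.0 / (t + 1)                             # running-average step size
    A = (1 - beta) * A + beta * np.outer(e, x - gamma * x_next)
    precond = alpha * truncated_pinv(A, rank) + eta * np.eye(len(w))
    w = w + precond @ (delta * e)                    # preconditioned TD step
    return w, A, e
```

Keeping only the top `rank` singular values is what makes the pseudo-inverse cheap once the SVD is maintained incrementally: applying V_k S_k⁻¹ U_kᵀ to a vector costs O(dk) instead of O(d²).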

I also implemented a conventional Gradient Temporal Difference agent, TDAgent, as a baseline, and tested all the agents in the environments described below.
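For comparison with the ATD step, here is a sketch of the textbook linear TD(λ) update with accumulating eligibility traces, which is the kind of rule a conventional TD baseline uses; the exact rule inside TDAgent may differ. The helper name `td_lambda_step` is hypothetical; `lr` and `lambd` mirror the constructor arguments shown in the Usage section.

```python
import numpy as np


def td_lambda_step(w, e, x, reward, x_next, lr=0.01, gamma=0.9, lambd=0.0):
    """One linear TD(lambda) update (hypothetical helper, sketch only).

    For a terminal transition, pass x_next as a zero vector.
    Returns updated (w, e).
    """
    e = gamma * lambd * e + x                       # decay and accumulate trace
    delta = reward + gamma * x_next @ w - x @ w     # TD error
    w = w + lr * delta * e                          # step along the trace
    return w, e
```

Unlike ATD, this uses a scalar step size `lr` instead of a matrix preconditioner, so each update costs only O(d).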

Backend Support

I added a PyTorch (CPU) backend so that you can skip converting from numpy.ndarray to torch.Tensor and back. To select a backend, set the ATD_BACKEND environment variable before importing the atd module:

import os
os.environ["ATD_BACKEND"] = "NumPy"  # or "PyTorch"; must be set before atd is imported
import atd
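The reason the variable must be set first is that a module typically reads it once, at import time. A minimal sketch of that pattern follows; the function name `select_backend` and the name-to-module mapping are hypothetical, and atd.py's actual logic may differ.

```python
import importlib
import os


def select_backend(default="NumPy"):
    """Return (backend_name, module) chosen by the ATD_BACKEND variable (sketch)."""
    modules = {"NumPy": "numpy", "PyTorch": "torch"}
    name = os.environ.get("ATD_BACKEND", default)
    if name not in modules:
        raise ValueError(f"Unknown ATD_BACKEND {name!r}; expected one of {sorted(modules)}")
    return name, importlib.import_module(modules[name])


# A library would usually call this once at module level, for example
#     BACKEND_NAME, xp = select_backend()
# so changing ATD_BACKEND after the import has no effect.
```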

To test the agents yourself, clone the repository and run python algorithm_test/random_walk.py or python algorithm_test/boyans_chain.py. :)

Requirements

  • Python>=3.9
  • NumPy>=1.19
  • PyTorch (torch)>=1.10, if you want to use the PyTorch backend
  • Matplotlib>=3.3.3, if you want to run my test scripts
  • tqdm, if you want to run my test scripts

Tests

Random Walk

This environment is from Sutton and Barto's book, Reinforcement Learning: An Introduction.

The code is in algorithm_test/random_walk.py. Result:

[Figure: random_walk]
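For readers unfamiliar with the environment, here is a minimal sketch of the random walk, assuming the book's 5-state version (states A to E, start in C, reward 1 only for exiting on the right, no discounting); the setup in algorithm_test/random_walk.py may differ. The names `step`, `true_values`, `N_STATES` and `START` are hypothetical.

```python
import numpy as np

N_STATES = 5            # non-terminal states A..E (assumed 5-state version)
START = N_STATES // 2   # every episode starts in the centre state C


def step(state, rng):
    """Move one state left or right with equal probability.

    Returns (next_state, reward, done); next_state is None on termination.
    Exiting on the right gives reward 1, every other transition gives 0.
    """
    next_state = state + (1 if rng.random() < 0.5 else -1)
    if next_state < 0:
        return None, 0.0, True
    if next_state >= N_STATES:
        return None, 1.0, True
    return next_state, 0.0, False


def true_values(n=N_STATES):
    """Solve the undiscounted Bellman equation v = P v + r exactly."""
    P = np.zeros((n, n))
    r = np.zeros(n)
    for s in range(n):
        for nxt in (s - 1, s + 1):
            if 0 <= nxt < n:
                P[s, nxt] += 0.5
            elif nxt >= n:
                r[s] += 0.5   # expected reward from exiting on the right
    return np.linalg.solve(np.eye(n) - P, r)
```

The exact values are 1/6, 2/6, 3/6, 4/6 and 5/6 for states A to E, which is what the learned value estimates are compared against.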

Boyan's Chain

The environment was proposed in Boyan 1999.

The code is in algorithm_test/boyans_chain.py. Result:

[Figure: boyans_chain]
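A minimal sketch of the chain, assuming the commonly used 13-state version: from state s ≥ 2 the agent moves to s − 1 or s − 2 with equal probability and receives −3, the move from state 1 to the terminal state 0 gives −2, and the 4-dimensional features are one-hot at states 12, 8, 4 and 0 and linearly interpolated in between. The details in algorithm_test/boyans_chain.py may differ; `ANCHORS`, `features` and `step` are hypothetical names.

```python
import numpy as np

ANCHORS = (12, 8, 4, 0)   # states with one-hot features (assumed 13-state version)


def features(s):
    """4-dim feature vector: one-hot at 12, 8, 4, 0, linearly interpolated between."""
    phi = np.zeros(len(ANCHORS))
    for i in range(len(ANCHORS) - 1):
        hi, lo = ANCHORS[i], ANCHORS[i + 1]
        if lo <= s <= hi:
            t = (hi - s) / (hi - lo)          # 0 at `hi`, 1 at `lo`
            phi[i], phi[i + 1] = 1 - t, t
            return phi
    raise ValueError(f"state {s} outside 0..12")


def step(s, rng):
    """Returns (next_state, reward, done); state 0 is terminal."""
    if s == 1:
        return 0, -2.0, True                  # last hop costs -2
    nxt = s - (1 if rng.random() < 0.5 else 2)
    return nxt, -3.0, nxt == 0                # every other hop costs -3
```

Under these assumptions the true value function is v(s) = −2s, which these features represent exactly with weights (−24, −16, −8, 0), so a linear method can in principle reach zero error.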

Usage

If you aren't very familiar with importing a local module, follow these steps to use my implementation in your project.

  1. Clone the repository (or download the .zip file from GitHub and unzip it) and copy atd.py to wherever you want.
  2. Add this line at the top of your Python script:
    from atd import TDAgent, SVDATDAgent, DiagonalizedSVDATDAgent, PlainATDAgent  # or any agent you want
  3. If atd.py is not in the same directory as your main Python file, use this snippet instead of Step 2. It appends that directory to sys.path, the list of directories the Python interpreter searches for modules, so that the interpreter can find atd.py. Alternatively, you can use the standard-library importlib module.
    import sys
    
    sys.path.append("<The directory where you placed atd.py>")
    from atd import TDAgent, SVDATDAgent, DiagonalizedSVDATDAgent, PlainATDAgent  # or any agent you want
  4. Initialize an agent like this and you are ready to use it!
    agent = TDAgent(lr=0.01, lambd=0, observation_space_n=4, action_space_n=2)
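As an alternative to editing sys.path (Step 3), the standard-library importlib can load atd.py directly from a file path. This sketch follows the "import a source file directly" recipe from the Python documentation; the helper name `load_module_from_path` and the path in the usage comment are placeholders.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(path, name="atd"):
    """Import a module from an explicit file path without touching sys.path."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module      # later `import atd` returns this same object
    spec.loader.exec_module(module)
    return module


# Usage (the path is a placeholder):
# atd = load_module_from_path("/path/to/atd.py")
# agent = atd.TDAgent(lr=0.01, lambd=0, observation_space_n=4, action_space_n=2)
```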

Reference: Gahring 2016

Please feel free to open a pull request; I look forward to your issues.