This project provides a Python implementation of WIS-LSTD, the algorithm introduced by Mahmood, van Hasselt and Sutton (2014). It also includes a random walk experiment that illustrates how to use the algorithm.
It can be imported as an Eclipse PyDev project.
Read or run `runwislstdexperiments.sh` for an example of how to run the experiment.
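
As background, the sketch below contrasts ordinary and weighted importance sampling in the simplest tabular setting, which is the idea WIS-LSTD extends to linear function approximation. It is not code from this repository and does not implement WIS-LSTD itself. The function names `ordinary_is` and `weighted_is`, the toy data, and the convention of returning 0 when all weights are zero are assumptions made for illustration.

```python
def ordinary_is(returns, rhos):
    """Ordinary importance sampling: average of rho * G over all episodes."""
    if not returns:
        raise ValueError("need at least one episode")
    return sum(rho * g for rho, g in zip(rhos, returns)) / len(returns)


def weighted_is(returns, rhos):
    """Weighted importance sampling: rho-weighted average of the returns."""
    total = sum(rhos)
    if total == 0:
        # Assumed convention: no episode carries weight, so return 0.
        return 0.0
    return sum(rho * g for rho, g in zip(rhos, returns)) / total


if __name__ == "__main__":
    # Hypothetical returns G and importance sampling ratios rho for three episodes.
    returns = [1.0, 0.0, 1.0]
    rhos = [2.0, 0.5, 0.0]
    print("OIS estimate:", ordinary_is(returns, rhos))
    print("WIS estimate:", weighted_is(returns, rhos))
```

On this toy data the ordinary estimate is 2/3 and the weighted estimate is 0.8. The weighted form normalizes by the sum of the ratios rather than by the number of episodes, so it always stays within the range of the observed returns.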
## References
Mahmood, A.R., van Hasselt, H., Sutton, R.S. (2014). Weighted importance sampling for off-policy learning with linear function approximation. Advances in Neural Information Processing Systems 27.