RL SVD feature policy iteration
For 50 iterations, or until A_next is identical to A:
1) Sample according to the current policy (Yana)
K = number of features
weights_list = np.zeros((n_actions, K))
2) For each action (Lera), see the sketch after this list:
   2.1) Create the matrices A and A_next for that action
   2.2) Calculate a low-rank approximation (truncated SVD)
   2.3) Compute the inverse needed for the weights via the Woodbury identity
   weights_list[action, :] = weight
3) Update the agent with our new weights:
agent.set_weights(weights_list)
4) Sample according to the agent's policy
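A minimal sketch of steps 2.1-2.3 for one action, assuming an LSTD-style objective: the weights solve (lam*I + M) w = A^T r with M = A^T (A - gamma * A_next), where r is the reward vector and gamma, rank, and lam are assumed hyperparameters not fixed in the notes above. The truncated SVD gives the low-rank factors of M, and the Woodbury identity reduces the K x K inverse to a rank x rank solve.

import numpy as np

def fit_action_weights(A, A_next, rewards, gamma=0.99, rank=10, lam=1e-3):
    # A, A_next: (n_samples, K) feature matrices for s and s' under one action;
    # gamma, rank, lam are assumed values, not taken from the notes above
    M = A.T @ (A - gamma * A_next)          # (K, K) LSTD matrix
    b = A.T @ rewards                       # (K,) right-hand side
    # 2.2) low-rank approximation via truncated SVD: M ~ U S Vt
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank, :]
    # 2.3) Woodbury identity:
    # (lam*I + U S Vt)^{-1} b = b/lam - U (S^{-1} + Vt U / lam)^{-1} Vt b / lam^2,
    # so only a rank x rank system is solved instead of a K x K one
    inner = np.diag(1.0 / s) + (Vt @ U) / lam
    return b / lam - U @ np.linalg.solve(inner, Vt @ b) / lam**2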
We should use the flattened env.render('rgb_array') frame, saved as an np.array, as the observation.
The policy is agent.predict(obs). Then go back to step 1 (the loop sketch below ties steps 1-4 together).
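A minimal sketch of the whole loop under the same assumptions (agent, collect_transitions, and get_observation are hypothetical names; n_actions and K are as defined above; the stopping test reads the "A_next identical to A" note literally, with np.allclose as a tolerant check):

def get_observation(env):
    # flattened rendered frame as the observation (classic Gym API; with
    # Gymnasium, pass render_mode="rgb_array" to gym.make and call env.render())
    frame = env.render(mode="rgb_array")
    return np.asarray(frame, dtype=np.float32).flatten()

for iteration in range(50):
    # 1) roll out the current policy via agent.predict(obs); collect_transitions
    #    is assumed to return {action: (A, A_next, rewards)} arrays
    data = collect_transitions(env, agent, get_observation)
    weights_list = np.zeros((n_actions, K))
    # 2) per-action solve, using fit_action_weights from the sketch above
    for action, (A, A_next, rewards) in data.items():
        weights_list[action, :] = fit_action_weights(A, A_next, rewards)
    # 3) update the agent's weights, then 4) sample again on the next pass
    agent.set_weights(weights_list)
    # stop early once A_next matches A, per the note at the top
    if all(np.allclose(A, A_next) for A, A_next, _ in data.values()):
        break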
Experiments with the Bellman error -> Vlad
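One possible Bellman-error diagnostic for those experiments, continuing the sketches above (the linear model Q(s, a) = A @ weights_list[a] and the discount gamma are assumptions):

def mean_bellman_error(A, A_next, rewards, weights_list, action, gamma=0.99):
    q = A @ weights_list[action]                      # Q(s, a) for the taken action
    q_next = np.max(A_next @ weights_list.T, axis=1)  # max_a' Q(s', a')
    residual = rewards + gamma * q_next - q           # TD / Bellman residual
    return np.mean(residual ** 2)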