11/10/20 - Initial commit of the v2.0 code. The full algorithm pipeline now lives inside the parent class :( This makes it a bit clearer what is happening, but the code is somewhat more cumbersome; mostly, the technical parts of the algorithms are moved into the library files. There is a small-small-small hope that this approach will finally be flexible enough! The first goal is to finally beat some basic benchmarks, including continuous control tasks: PPO, DDPG, TD3 and SAC were tested on Pendulum, and the latter three even work.
Also pushed the initial commit of the theory book (in Russian): intro chapters, evolutionary algorithms, dynamic programming basics, TD-learning basics, DQN, and DQN modifications. Hoping to keep up with the MSU RL course...
18/10/20 - Theory book: added Distributional RL, ch. 4.3 (it seems the hardest part is now done =D). Started Demo Project 1: Mario. Let's try again, now with a fixed oracle (no negative penalty for moving left) and a bug-fixed reward function. More details here.
26/10/20 - Theory book: added Policy Gradient chapters 5.1 and 5.2. The long pit in Mario is still undefeated; a discount factor of 0.99 seems to be inappropriate for fast training :(
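One common rule of thumb for reasoning about the discount factor (a sketch, not code from this project): with discount γ, the effective horizon is roughly 1/(1 - γ) steps, so γ = 0.99 looks about 100 steps ahead, and a reward 100 steps away still gets weight 0.99^100 ≈ 0.37. The helper names and the γ values compared below are hypothetical, chosen only for illustration.

```python
# Rough sketch: how strongly future rewards are weighted under different discount factors.
# Helper names and the gamma values are hypothetical illustrations, not the project's settings.

def effective_horizon(gamma: float) -> float:
    """Rule of thumb: rewards much further than ~1/(1 - gamma) steps ahead contribute little."""
    return 1.0 / (1.0 - gamma)


def discount_weight(gamma: float, steps: int) -> float:
    """Weight gamma**steps applied to a reward received `steps` steps in the future."""
    return gamma ** steps


if __name__ == "__main__":
    for gamma in (0.9, 0.99, 0.999):
        print(
            f"gamma={gamma}: horizon ~ {effective_horizon(gamma):.0f} steps, "
            f"weight of a reward 100 steps ahead = {discount_weight(gamma, 100):.3f}"
        )
```

Whether a shorter or longer horizon actually helps with the Mario pit is an empirical question; the numbers only show how far ahead each γ effectively looks.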
12/11/20 - Theory book: added TD-lambda, GAE, TRPO and PPO (ch. 3.5 and 5.3). Second project, on continuous control: TD3 works on PyBullet Ant and on Bipedal Walker from gym. SAC fails on Humanoid; probably more tricks are required.
04/12/20 - Theory book: added chapters 6 (DDPG, TD3, SAC) and 7 (Bandits, MCTS, LQR); the book is nearly finished (ch. 8 on advanced topics may be optional). The prioritization experiment failed due to gradient explosion. Having... philosophy issues.