Finish prioritised experience replay #42

Kaixhin · 2016-06-16T14:49:41Z

Rank-based prioritised experience replay appears to be working, but technically needs some changes. Instead of storing terminal states with a priority of 0, they should not be stored at all. This requires more checks, as the elements in the experience replay memory and the elements in the priority queue will differ.

Secondly, proportional prioritised experience replay still needs to be implemented. See here and here for an implementation of the sum binary tree.
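As a rough illustration of the sum binary tree mentioned above, here is a minimal sketch in Python. It is not the linked implementation or this repository's code. The class name `SumTree` and its methods are hypothetical, and the flat-array layout (root at index 1, leaves at `capacity..2*capacity-1`) is just one common choice.

```python
import random


class SumTree:
    """Binary tree stored in a flat list: leaves hold priorities,
    each internal node holds the sum of its two children."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Index 0 is unused, index 1 is the root,
        # leaves live at capacity .. 2 * capacity - 1.
        self.nodes = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of leaf `index` and refresh the sums above it."""
        pos = index + self.capacity
        self.nodes[pos] = priority
        pos //= 2
        while pos >= 1:
            self.nodes[pos] = self.nodes[2 * pos] + self.nodes[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all priorities (the value stored at the root)."""
        return self.nodes[1]

    def find(self, value):
        """Descend from the root to the leaf whose priority interval
        contains `value`, for 0 <= value < total()."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value < self.nodes[left]:
                pos = left
            else:
                value -= self.nodes[left]
                pos = left + 1
        return pos - self.capacity

    def sample(self):
        """Draw a leaf index with probability proportional to its priority."""
        return self.find(random.random() * self.total())


tree = SumTree(4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)
```

Both `update` and `find` touch one node per tree level, so each costs O(log n). That is the reason for using a sum tree rather than rescanning all priorities on every sample.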

For reference, below are results from a working implementation of rank-based PER on Frostbite:

Damcy · 2016-07-25T08:14:10Z

Maybe we can store each experience as a tuple like (s_t, a, r, s_t_1, t). With this pattern, terminal states are not stored in the experience replay as separate entries. Usually t is 0; when t == 1, the stored tuple is (s, a, r, TERMINAL_STATE, 1).
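A minimal Python sketch of this tuple layout, under stated assumptions: `Transition`, `ReplayMemory` and `td_target` are hypothetical names for illustration only, not code from this repository. The terminal flag travels with the transition that leads into the terminal state, so the terminal state never needs an entry (or a priority) of its own.

```python
import random
from collections import namedtuple

# One stored experience: (s_t, a, r, s_t_1, t), with t = 1 on the final step.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "terminal"]
)


class ReplayMemory:
    """Fixed-size circular buffer of transitions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.position = 0  # next slot to overwrite once the buffer is full

    def store(self, state, action, reward, next_state, terminal):
        transition = Transition(state, action, reward, next_state, int(terminal))
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.position] = transition
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        """Uniform sampling without replacement (no prioritisation here)."""
        return random.sample(self.buffer, batch_size)


def td_target(transition, next_state_value, gamma=0.99):
    """One-step target; the terminal flag switches off bootstrapping."""
    return transition.reward + gamma * (1 - transition.terminal) * next_state_value


memory = ReplayMemory(capacity=2)
memory.store("s0", 0, 0.0, "s1", False)
memory.store("s1", 1, 1.0, "TERMINAL_STATE", True)
```

With this layout, every element of the memory is a transition that can be sampled and given a TD error, which should keep the memory and the priority queue aligned one to one.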

Kaixhin · 2016-08-16T10:45:25Z

Note: It might be worth subclassing the Heap from torchlib for the priority queue.

Kaixhin added the enhancement label Jun 16, 2016
