Exploration with pseudo counts #34

lake4790k · 2016-06-08T19:52:59Z

A new paper presents a method that performs well on Montezuma's Revenge. An implementation could be used with both DDQN ER and async A3C. The probability used for the pseudo-count is computed using Context Tree Switching (CTS), which could be implemented based on this implementation.
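For reference, here is a minimal sketch of how a pseudo-count could be derived from a density model, assuming the paper's definition as I understand it: with `rho` the model's probability of an observation before updating on it and `rho_prime` the probability after one update, the pseudo-count is `rho * (1 - rho_prime) / (rho_prime - rho)`. The function names and the `beta` and `eps` defaults are illustrative assumptions, not values taken from this repository.

```python
import math


def pseudo_count(rho, rho_prime):
    """Pseudo-count from the probability before (rho) and after (rho_prime)
    a single model update on the same observation."""
    if rho_prime <= rho:
        # The formula only makes sense if the update raised the probability.
        raise ValueError("expected rho_prime > rho")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def exploration_bonus(n_hat, beta=0.05, eps=0.01):
    """Count-based bonus; beta and eps are assumed hyperparameters."""
    return beta / math.sqrt(n_hat + eps)


# Sanity check with an empirical-frequency model: an observation seen
# n times out of N has rho = n / N and rho_prime = (n + 1) / (N + 1),
# and the pseudo-count then recovers n.
n, N = 3, 10
n_hat = pseudo_count(n / N, (n + 1) / (N + 1))
bonus = exploration_bonus(n_hat)
```

The sanity check is the useful part: any density model plugged in here should behave roughly like a counter for observations it has seen repeatedly.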

lake4790k · 2016-06-16T14:34:19Z

My first step is to implement a CTS-based probability measure for small bitmaps (with 1-bit pixels), using the location-dependent model described in the paper. I expect it to produce reasonable probabilities: close to 1 for patterns that have already been processed, above 0.5 for patterns similar to those, and close to 0 for dissimilar ones.
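To make the idea concrete, here is a rough sketch of a location-dependent model for 1-bit bitmaps. It is not CTS: each pixel gets a plain KT estimator keyed by its position and four already-visited neighbours (left, up-left, up, up-right), which is my assumption about the neighbour factors. The class and method names are hypothetical.

```python
from collections import defaultdict


class LocationContextModel:
    """Per-pixel KT estimators keyed by (row, col, neighbour bits)."""

    def __init__(self, height, width):
        self.height, self.width = height, width
        # counts[(r, c, ctx)] = [times the bit was 0, times the bit was 1]
        self.counts = defaultdict(lambda: [0, 0])

    def _bit(self, img, r, c):
        # Pixels outside the bitmap are treated as 0.
        if 0 <= r < self.height and 0 <= c < self.width:
            return img[r][c]
        return 0

    def _context(self, img, r, c):
        return (self._bit(img, r, c - 1), self._bit(img, r - 1, c - 1),
                self._bit(img, r - 1, c), self._bit(img, r - 1, c + 1))

    def prob(self, img):
        """Probability of the bitmap as a product of per-pixel probabilities."""
        p = 1.0
        for r in range(self.height):
            for c in range(self.width):
                key = (r, c, self._context(img, r, c))
                zeros, ones = self.counts.get(key, (0, 0))
                seen = ones if img[r][c] else zeros
                p *= (seen + 0.5) / (zeros + ones + 1.0)  # KT estimate
        return p

    def update(self, img):
        for r in range(self.height):
            for c in range(self.width):
                self.counts[(r, c, self._context(img, r, c))][img[r][c]] += 1


pattern = [[1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
similar = [row[:] for row in pattern]
similar[3][3] = 0  # flip a single pixel
inverted = [[1 - b for b in row] for row in pattern]

model = LocationContextModel(4, 4)
for _ in range(50):
    model.update(pattern)
p_seen, p_similar, p_inverted = (model.prob(x) for x in (pattern, similar, inverted))
```

With this simple estimator the seen pattern scores well above 0.5, but even a one-pixel change is penalised heavily, so it does not give the graceful "similar" behaviour described above. That gap is what a real context-tree mixture would have to close.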

Kaixhin · 2016-06-16T16:05:43Z

Good luck! Finally got round to reading the paper and noticed some extras in the appendix. Seems like for completeness we'll need to add a stochastic ALE setting for this paper and the PAL paper, plus remove the terminal signal on life loss for this paper. Looks like that can make a huge difference to the reported results.

Kaixhin · 2016-06-16T16:11:10Z

FYI there's another (new) paper from DeepMind with similar goals...

lake4790k · 2016-06-17T15:12:48Z

The paper refers to a number of other papers regarding CTS usage, saying "similar to this and that", but in the end the referenced papers do quite different things, so it is best to look only at the method in the pseudo-count paper. They also refer to the Skipping CTS paper, but always talk about CTS, so I am using plain CTS for now.

Managed to adapt the CTS code to give reasonable probs for 1-bit pixel bitmaps with the neighbour factors in the paper. The paper does not describe exactly how they handle the multiple bits of a single pixel. That could be done in a number of ways: for a single bit, look at the same bit in the neighbouring pixels, or look at all bits in the neighbouring pixels. I'll add different options for that and provide a native lib and an FFI interface that can be invoked from ER and async to compute the pseudo-counts from the probabilities.
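The two options for multi-bit pixels can be sketched as two context builders. This is only an illustration of the choice, not the native implementation; the neighbour set (left, up-left, up, up-right) and all function names are assumptions.

```python
# Offsets (row, col) of the neighbours used as context: left, up-left, up, up-right.
NEIGHBOURS = ((0, -1), (-1, -1), (-1, 0), (-1, 1))


def neighbour_values(img, r, c):
    """8-bit values of the neighbouring pixels; out-of-bounds counts as 0."""
    h, w = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in NEIGHBOURS]


def context_same_bit(img, r, c, k):
    """Option 1: context for bit k is bit k of each neighbour (4 bits total)."""
    return tuple((v >> k) & 1 for v in neighbour_values(img, r, c))


def context_all_bits(img, r, c, k):
    """Option 2: context for bit k is every bit of each neighbour (32 bits).
    k is unused; it is kept so both options share one signature."""
    return tuple(neighbour_values(img, r, c))


img = [[0x0F, 0x80],
       [0x01, 0xFF]]
ctx_low = context_same_bit(img, 1, 1, 0)   # lowest bit of each neighbour
ctx_high = context_same_bit(img, 1, 1, 7)  # highest bit of each neighbour
ctx_full = context_all_bits(img, 1, 1, 0)
```

The first option keeps the context space tiny (16 contexts per bit and location), while the second is far more specific and would need many more samples before its counts become meaningful.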

lake4790k · 2016-06-18T14:09:31Z

Kind of finished a separate module with the native probability tree for 8-bit screens. It was not easy, but the difficult part probably comes now. For example, the probability of the screen is the product of the probabilities of the pixels. Different implementations (CTW and CTS) compute slightly different per-factor probabilities, but with 42 * 42 * 8 factors the resulting products can be very different (e.g. 0.99 ^ (42 * 42 * 8) vs 0.99999 ^ (42 * 42 * 8)). Probably one would need to do exactly as DM did to make it work... let's try anyway.
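A quick way to see the size of that effect is to work in the log domain, where the product becomes a sum. This is a small sketch with a hypothetical helper name; the two per-factor values are just the ones from the example above.

```python
import math

N_FACTORS = 42 * 42 * 8  # one factor per bit of a 42x42 screen with 8-bit pixels


def log_prob_of_screen(per_factor_probs):
    """Sum of log-probabilities; avoids underflow of the raw product."""
    return sum(math.log(p) for p in per_factor_probs)


log_a = log_prob_of_screen([0.99] * N_FACTORS)
log_b = log_prob_of_screen([0.99999] * N_FACTORS)

print(N_FACTORS, log_a, log_b)
print(math.exp(log_a), math.exp(log_b))
```

With 0.99 per factor the screen probability is on the order of 1e-62, while with 0.99999 per factor it is about 0.87, so a tiny per-factor difference between estimators changes the screen probability by dozens of orders of magnitude.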

lake4790k added the enhancement label Jun 8, 2016

lake4790k self-assigned this Jun 17, 2016
