Exploration with pseudo counts #34

Open
lake4790k opened this issue Jun 8, 2016 · 5 comments
@lake4790k
Collaborator

lake4790k commented Jun 8, 2016

New paper with a method that performs well on Montezuma's Revenge. The implementation could be used with both DDQN ER and async A3C. The probability used for the pseudo-count is computed using Context Tree Switching (CTS), which could be implemented based on this implementation.
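For reference, here is a minimal sketch of how a pseudo-count could be derived from a density model's probabilities. It assumes the formula N = rho * (1 - rho') / (rho' - rho), where rho is the probability of a frame before the model is updated on it and rho' is the probability of the same frame after one update. The function names, the `beta` value, and the 0.01 stabiliser are illustrative placeholders, not taken from this repository.

```python
import math


def pseudo_count(rho: float, rho_prime: float) -> float:
    """Pseudo-count N = rho * (1 - rho') / (rho' - rho).

    rho       : model probability of the frame before updating on it
    rho_prime : probability of the same frame after one update on it
    """
    if rho_prime <= rho:
        # Assumption: the model must become more confident after seeing the frame.
        raise ValueError("rho_prime must be greater than rho")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def exploration_bonus(n_hat: float, beta: float = 0.05) -> float:
    """Reward bonus that shrinks as the pseudo-count grows (beta is a placeholder)."""
    return beta / math.sqrt(n_hat + 0.01)
```

As a sanity check, an empirical frequency model that has seen a frame 3 times in 10 steps gives rho = 3/10 and rho' = 4/11, and the formula recovers a count of 3.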

@lake4790k
Collaborator Author

My first step is to implement a CTS-based probability measure for small bitmaps (with 1-bit pixels), using the location-dependent model described in the paper. I expect it to produce reasonable probabilities: near 1 for patterns that have already been processed, above 0.5 for patterns similar to those, and near 0 for dissimilar ones.
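A rough sketch of what such a location-dependent model over 1-bit bitmaps could look like is below. To stay short it swaps CTS for a plain KT (Krichevsky-Trofimov) estimator per pixel location and context, so it is a simplification, not the paper's model. The class name and the choice of four neighbours (left, upper-left, up, upper-right) as context are assumptions.

```python
from collections import defaultdict


class BitmapDensityModel:
    """Per-location KT estimators conditioned on already-visited neighbours (not CTS)."""

    def __init__(self, height, width):
        self.height, self.width = height, width
        # counts[(row, col, context)] = [number of 0s seen, number of 1s seen]
        self.counts = defaultdict(lambda: [0, 0])

    def _context(self, img, r, c):
        def px(rr, cc):
            # Pixels outside the bitmap are treated as 0 (an assumption).
            inside = 0 <= rr < self.height and 0 <= cc < self.width
            return img[rr][cc] if inside else 0

        return (px(r, c - 1), px(r - 1, c - 1), px(r - 1, c), px(r - 1, c + 1))

    def prob(self, img):
        """Probability of the bitmap as a product of per-pixel factors."""
        p = 1.0
        for r in range(self.height):
            for c in range(self.width):
                key = (r, c, self._context(img, r, c))
                zeros, ones = self.counts.get(key, (0, 0))
                seen = ones if img[r][c] else zeros
                p *= (seen + 0.5) / (zeros + ones + 1.0)  # KT estimate
        return p

    def update(self, img):
        """Record one observation of the bitmap."""
        for r in range(self.height):
            for c in range(self.width):
                self.counts[(r, c, self._context(img, r, c))][img[r][c]] += 1
```

With this simplified model, a pattern that has been processed many times gets a probability close to 1 and its inverse gets a probability close to 0. A pattern that differs by only a pixel or two can still score low, which is part of why the choice of context model matters.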

@Kaixhin
Owner

Kaixhin commented Jun 16, 2016

Good luck! Finally got round to reading the paper and noticed some extras in the appendix. Seems like for completeness we'll need to add a stochastic ALE setting for this paper and the PAL paper, plus remove the terminal signal on life loss for this paper. Looks like that can make a huge difference to the reported results.

@Kaixhin
Owner

Kaixhin commented Jun 16, 2016

FYI there's another (new) paper from DeepMind with similar goals...

@lake4790k
Collaborator Author

lake4790k commented Jun 17, 2016

The paper refers to a number of other papers with regard to CTS usage, saying "similar to this and that", but in the end the referenced papers do quite different things, so it is best to look at just the method in the pseudo-count paper. They also refer to the Skipping CTS paper but always talk about CTS, so I am using plain CTS for now.

Managed to adapt the CTS code to give reasonable probabilities for 1-bit pixel bitmaps with the neighbour factors from the paper. The paper does not describe exactly how they handle the multiple bits of a single pixel. That could be done in a number of ways: for a single bit, look at the same bit in the neighbouring pixels, or look at all bits in the neighbouring pixels. I'll add different options for that and provide a native lib and an FFI interface that can be invoked from ER and async to compute the pseudo-counts from the probabilities.
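The two options for multi-bit pixels can be sketched as two ways of building the context for one bit of a pixel. This is only an illustration: the function names are hypothetical, `neighbours` is assumed to be a list of integer pixel values, and bits are read least-significant first.

```python
def same_bit_context(neighbours, bit):
    """Option A: for bit `bit`, use only that same bit of each neighbouring pixel."""
    return tuple((value >> bit) & 1 for value in neighbours)


def all_bits_context(neighbours, n_bits=8):
    """Option B: use every bit of every neighbouring pixel."""
    return tuple((value >> b) & 1 for value in neighbours for b in range(n_bits))
```

With four neighbours and 8-bit pixels, option A gives a 4-bit context per factor while option B gives a 32-bit context, so option B is far more specific but needs many more observations before its counts become informative.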

@lake4790k lake4790k self-assigned this Jun 17, 2016
@lake4790k
Collaborator Author

lake4790k commented Jun 18, 2016

Kind of finished a separate module with the native probability tree for 8-bit screens. That was not easy, but the difficult part probably comes now. For example, the probability of a screen is the product of the probabilities of its pixels. Different implementations (CTW and CTS) compute slightly different probabilities, but with 42 * 42 * 8 factors the probability product can be quite different (i.e. 0.99 ^ (42 * 42 * 8) vs 0.99999 ^ (42 * 42 * 8)). Probably one would need to do exactly as DM does to make it work... let's try anyway.
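To make the size of that gap concrete, here is a small sketch that evaluates the two products from the comparison above. It sums log-probabilities instead of multiplying raw factors, which is one common way to keep such long products numerically manageable. The helper name `screen_log_prob` is hypothetical.

```python
import math

N_FACTORS = 42 * 42 * 8  # 14112 per-bit factors for a 42x42 screen with 8-bit pixels


def screen_log_prob(factor_probs):
    """Log-probability of a screen as the sum of its per-factor log-probabilities."""
    return sum(math.log(p) for p in factor_probs)


log_a = screen_log_prob([0.99] * N_FACTORS)
log_b = screen_log_prob([0.99999] * N_FACTORS)

# Convert back to plain probabilities only for display.
print(math.exp(log_a), math.exp(log_b))
```

The first product comes out at roughly 1e-62 and the second at roughly 0.87, so a difference in the third decimal place of each factor changes the screen probability by more than sixty orders of magnitude.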

2 participants