Exploration with pseudo counts #34
My first step is to implement a CTS-based probability measure for small bitmaps (with 1-bit pixels), using the location-dependent model described in the paper. I expect reasonable probabilities for patterns that have already been processed (close to 1), for patterns similar to those (> 0.5) and for dissimilar ones (close to 0).
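A minimal sketch of the location-dependent idea for 1-bit bitmaps. It is a simplification, not CTS: each pixel position (i, j) gets its own binary KT estimator per context, where the context is assumed to be the four previously scanned neighbours (left, up-left, up, up-right); real CTS would mix predictions over context trees instead. The class and method names (`LocationDependentBitmapModel`, `log_prob`, `update`) are hypothetical.

```python
import math


class LocationDependentBitmapModel:
    """Hypothetical density model over 1-bit bitmaps (simplified stand-in for CTS).

    Every pixel position has its own table of binary KT estimators, indexed by a
    4-bit context built from neighbours that precede it in raster order.
    """

    def __init__(self, height, width):
        self.height = height
        self.width = width
        # counts[i][j][ctx] = [number of 0s seen, number of 1s seen]
        self.counts = [[[[0, 0] for _ in range(16)] for _ in range(width)]
                       for _ in range(height)]

    def _context(self, bitmap, i, j):
        def px(r, c):
            # Neighbours outside the bitmap are treated as 0
            if 0 <= r < self.height and 0 <= c < self.width:
                return bitmap[r][c]
            return 0

        left, up_left = px(i, j - 1), px(i - 1, j - 1)
        up, up_right = px(i - 1, j), px(i - 1, j + 1)
        return left | (up_left << 1) | (up << 2) | (up_right << 3)

    def log_prob(self, bitmap):
        # log of the product of per-pixel probabilities = sum of per-pixel logs
        total = 0.0
        for i in range(self.height):
            for j in range(self.width):
                n0, n1 = self.counts[i][j][self._context(bitmap, i, j)]
                p_one = (n1 + 0.5) / (n0 + n1 + 1.0)  # KT estimator
                total += math.log(p_one if bitmap[i][j] else 1.0 - p_one)
        return total

    def update(self, bitmap):
        # Count each pixel value under its position and context
        for i in range(self.height):
            for j in range(self.width):
                ctx = self._context(bitmap, i, j)
                self.counts[i][j][ctx][bitmap[i][j]] += 1
```

Because the context only uses pixels that come earlier in raster order, the per-pixel factors multiply to a proper distribution over bitmaps. After training 100 times on a 4×4 checkerboard, its probability is about 0.92, while the inverted checkerboard gets a probability near 0. Note that a plain KT model like this one gives a low probability even to a pattern that differs in a single pixel, so it does not meet the "> 0.5 for similar patterns" expectation on its own.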
Good luck! I finally got round to reading the paper and noticed some extras in the appendix. It seems that for completeness we'll need to add a stochastic ALE setting for this paper and for the PAL paper, and also remove the terminal signal on life loss for this paper. That looks like it can make a large difference to the reported results.
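A hypothetical sketch of those two appendix settings as an environment wrapper, assuming the stochastic setting is implemented as "sticky actions" (with some probability the previous action is executed instead of the chosen one). The wrapped `env` interface (`reset()`, and `step(a)` returning `(obs, reward, game_over, lives)`), the convention that action 0 is NOOP, and the 0.25 default repeat probability are all assumptions, not taken from either paper.

```python
import random


class StochasticALEWrapper:
    """Hypothetical wrapper: sticky actions plus an optional life-loss terminal."""

    def __init__(self, env, repeat_prob=0.25, terminal_on_life_loss=False, seed=None):
        self.env = env
        self.repeat_prob = repeat_prob  # assumed value, check against the papers
        # False = no terminal signal on life loss (the setting discussed above)
        self.terminal_on_life_loss = terminal_on_life_loss
        self.rng = random.Random(seed)
        self.prev_action = 0
        self.lives = None

    def reset(self):
        self.prev_action = 0  # assumes action 0 is NOOP
        self.lives = None
        return self.env.reset()

    def step(self, action):
        # Sticky actions: with probability repeat_prob, repeat the previous action
        if self.rng.random() < self.repeat_prob:
            action = self.prev_action
        self.prev_action = action
        obs, reward, game_over, lives = self.env.step(action)
        life_lost = self.lives is not None and lives < self.lives
        self.lives = lives
        # Terminal flag for the learner; the episode itself only ends on game over
        terminal = game_over or (self.terminal_on_life_loss and life_lost)
        return obs, reward, terminal, game_over
```

Keeping the learner's terminal flag separate from the actual game-over flag makes it easy to switch the life-loss behaviour per paper without touching the episode loop.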
FYI there's another (new) paper from DeepMind with similar goals...
The paper refers to a number of other papers regarding its use of CTS, describing its model as "similar to" theirs, but the referenced papers actually do quite different things, so it's best to follow only the method in the pseudo-count paper. It also cites the Skipping CTS paper but always talks about CTS, so I'm using plain CTS for now. I managed to adapt the CTS code to give reasonable probabilities for bitmaps with 1-bit pixels, using the neighbouring-pixel contexts described in the paper. The paper doesn't describe exactly how the multiple bits of a single pixel are handled; that could be done in several ways (for a given bit, look only at the same bit in the neighbouring pixels, or look at all bits of the neighbouring pixels). I'll add options for both and provide a native lib and an …
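The two options for multi-bit pixels can be sketched as two ways of building the context for one bit of one pixel. This is a hypothetical helper (`neighbour_values`, `bit_context` and the mode names are made up), assuming the same four neighbours (left, up-left, up, up-right) as in the 1-bit case and that the estimator itself is still keyed by position and bit index.

```python
# Neighbours that precede (i, j) in raster order: left, up-left, up, up-right
NEIGHBOUR_OFFSETS = ((0, -1), (-1, -1), (-1, 0), (-1, 1))


def neighbour_values(screen, i, j):
    """Full pixel values of the four neighbours (0 outside the screen)."""
    h, w = len(screen), len(screen[0])
    values = []
    for di, dj in NEIGHBOUR_OFFSETS:
        r, c = i + di, j + dj
        values.append(screen[r][c] if 0 <= r < h and 0 <= c < w else 0)
    return values


def bit_context(screen, i, j, bit, mode):
    """Context used to predict bit `bit` of pixel (i, j).

    The estimator for this context should still be looked up per (i, j, bit).
    """
    neighbours = neighbour_values(screen, i, j)
    if mode == "same_bit":
        # Option 1: only the same bit of each neighbour -> 2**4 possible contexts
        return tuple((v >> bit) & 1 for v in neighbours)
    if mode == "all_bits":
        # Option 2: all bits of each neighbour (`bit` is unused here)
        return tuple(neighbours)
    raise ValueError(f"unknown mode: {mode!r}")
```

The trade-off is context size: "same_bit" has only 16 contexts per bit, while "all_bits" with 8-bit pixels has up to 256^4 (about 4.3 × 10^9) possible contexts, so it needs sparse storage (e.g. a dict) and sees each context far less often.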
I've more or less finished a separate module with the native probability tree for 8-bit screens. It was not easy, but the difficult part probably comes now. For example, the probability of a screen is the product of the probabilities of its pixel bits. Different implementations (CTW and CTS) compute slightly different probabilities, but with 42 × 42 × 8 factors the products can end up quite different (e.g. 0.99 vs. 0.99999^(42 × 42 × 8) ≈ 0.87). One would probably need to do exactly what DeepMind did to make it work... let's try anyway.
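A small numeric sketch of why this product is touchy, assuming one binary factor per bit of a 42 × 42 screen with 8 bits per pixel. It compounds a per-bit probability of 0.99999 over all factors and shows that working in log space is required once the per-bit probabilities are less extreme (the names are illustrative only).

```python
import math

N_FACTORS = 42 * 42 * 8  # one binary factor per bit of a 42x42, 8-bit screen


def screen_log_prob(bit_probs):
    """Log-probability of a screen: the sum of per-bit log-probabilities."""
    return sum(math.log(p) for p in bit_probs)


# A per-bit probability of 0.99999 compounds over all 14112 factors
compounded = math.exp(screen_log_prob([0.99999] * N_FACTORS))

# With less confident bits the raw product underflows to 0.0; the log sum does not
raw_product = 0.5 ** N_FACTORS
log_product = screen_log_prob([0.5] * N_FACTORS)
```

Here `compounded` comes out at roughly 0.868, so a per-bit discrepancy of only 1e-5 already changes the screen probability by about 13%, while `raw_product` is exactly 0.0 in double precision even though `log_product` (about -9782) is perfectly usable.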
New paper with a method that performs well on Montezuma's Revenge. The implementation could be used with both DDQN with experience replay (ER) and A3C. The probability used for the pseudo-count is computed with Context Tree Switching (CTS), which could be implemented based on this implementation.
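A minimal sketch of turning such probabilities into a pseudo-count and an exploration bonus, assuming the density model can report the log-probability of a frame x before (ρ) and after (ρ') one update on x. The pseudo-count is N̂ = ρ(1 − ρ') / (ρ' − ρ); the bonus form β / sqrt(N̂ + 0.01) and the default β = 0.05 are assumptions to check against the paper. Function names are hypothetical.

```python
import math


def pseudo_count(log_rho, log_rho_prime):
    """N = rho * (1 - rho') / (rho' - rho), computed from log-probabilities.

    log_rho: log-probability of x before the model is updated on x.
    log_rho_prime: log-probability of x right after one update on x.
    """
    gain = log_rho_prime - log_rho  # prediction gain
    if gain <= 0.0:
        # The update did not increase the probability: treat x as fully known
        return math.inf
    # rho * (1 - rho') / (rho' - rho) == (1 - rho') / (rho' / rho - 1)
    return (1.0 - math.exp(log_rho_prime)) / math.expm1(gain)


def exploration_bonus(n_hat, beta=0.05):
    """Intrinsic reward that shrinks with the pseudo-count (assumed form)."""
    return beta / math.sqrt(n_hat + 0.01)
```

As a sanity check, for an empirical count model (ρ = n/t, ρ' = (n + 1)/(t + 1)) the formula returns exactly n: with n = 3 and t = 10, `pseudo_count(math.log(0.3), math.log(4 / 11))` gives 3.0. Computing from log-probabilities matters because screen probabilities are far too small to handle directly.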