Recurrent DQN #8
This is the Caffe implementation from the paper: … Although I've never looked at Caffe, it will probably help.
@Kaixhin I see you started working on this, cool. I'll have some time now, so I'll look at the multi-GPU and async modes.
@lake4790k Almost have something working. Disabling this line lets the DRQN train, as otherwise it crashes here, somehow propagating a batch of size 20 forward but expecting the normal batch size of 32 backwards. I'm new to the …
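For reference, that kind of size mismatch can be reproduced in isolation: a recurrent module caches its activations from the forward pass, so a backward call with a different batch size fails. A hypothetical minimal sketch, assuming the Element-Research rnn package (the module and sizes are placeholders, not the repo's actual code):

```lua
-- Hypothetical reproduction of the batch-size mismatch (assumes the rnn
-- package; sizes are placeholders).
require 'nn'
require 'rnn'

local lstm = nn.FastLSTM(10, 10)

-- Forward a batch of 20: the LSTM caches activations for 20 rows
local output = lstm:forward(torch.rand(20, 10))

-- Backward with a batch of 32: the cached tensors have 20 rows, so this errors
local ok, err = pcall(function()
  return lstm:backward(torch.rand(32, 10), torch.rand(32, 10))
end)
print(ok, err)
```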
@Kaixhin Awesome! I have no experience with …
@lake4790k I'd have to delve into the original paper/code, but it looks like they train the network every step (as opposed to every 4). This seems like it'll be a problem for BPTT. In any case, if you haven't used …
@Kaixhin Cool, I'll have my hands full with async for now, but in the meantime I'll be able to help by running longer DRQN experiments on my workstation when you think it's worth trying.
Here's the result of running …
Pinging @JoostvDoorn since he's contributed to …
@Kaixhin I will have a look later.
@Kaixhin I'm not getting the error you mentioned when doing validation on the last batch with size 20 when running …
I think this is in the …
OK, so there are two issues:
1. …
2. …
@Kaixhin Re: 2, agreed, this error is bad, so returning before it is not a solution. I'm not sure if learning is bad with the normal batch sizes; it could just be that a batch size change isn't handled properly somewhere. I tried an isolated …
@lake4790k I also tried a simple …
@Kaixhin I need to refresh …
@lake4790k I'd go with a merge, since it preserves history correctly. It's better to make sure all the changes in …
Done the merge and added … The agent sees only the latest frame per step and backpropagates with unrolling over 5 steps on every step; weights are updated every 5 steps (or at terminal), no … Pretty cool that it works. I'll check whether it performs similarly with a flickering Catch, as they did in the paper with flickering Pong. Also, in the async paper they added a half-size LSTM layer after the linear layer instead of replacing it; I'll try that as well (although the DRQN paper says replacing is best). I'll add support for the n-step methods too; there it's a bit trickier to get right, as steps are taken forwards and backwards to calculate the n-step returns, so I'll have to make sure the forward/backward passes are correct for the LSTM as well.
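For concreteness, here is a minimal sketch of that training pattern, assuming the Element-Research rnn package (nn.FastLSTM, nn.Sequencer); the layer sizes, learning rate, observations and the TD gradient are placeholders, not the repository's actual code:

```lua
-- Sketch: single-frame input, BPTT over a 5-step window every step,
-- parameter update every 5 steps. Assumes the rnn package.
require 'nn'
require 'rnn'

local histLen, updateFreq, lr, nActions = 5, 5, 1e-4, 4

local lstm = nn.FastLSTM(256, 256)
local net = nn.Sequential()
  :add(nn.Linear(84 * 84, 256)) -- stands in for the convolutional torso
  :add(nn.ReLU())
  :add(lstm)
  :add(nn.Linear(256, nActions))
local seq = nn.Sequencer(net) -- unrolls the network over a table of inputs

local theta, gradTheta = seq:getParameters()
local history = {} -- the last histLen single-frame observations

for step = 1, 1000 do
  local frame = torch.rand(84 * 84) -- placeholder observation
  table.insert(history, frame)
  if #history > histLen then table.remove(history, 1) end

  lstm:forget() -- reset recurrent state before replaying the window
  local outputs = seq:forward(history)

  -- BPTT: a (placeholder) TD gradient flows in only at the last step;
  -- earlier steps get zero gradOutputs but still propagate state gradients.
  local gradOutputs = {}
  for t = 1, #history do gradOutputs[t] = torch.zeros(nActions) end
  gradOutputs[#history] = torch.rand(nActions) -- stands in for dLoss/dQ

  seq:backward(history, gradOutputs)

  if step % updateFreq == 0 then
    theta:add(-lr, gradTheta) -- gradients accumulated over the last 5 steps
    gradTheta:zero()
  end
end
```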
Also tried replacing …
@lake4790k Do you have the flickering catch version somewhere?
@JoostvDoorn Haven't got around to it since, but it probably takes only a few lines to add to rlenvs.
@JoostvDoorn I can add that to …
@JoostvDoorn Done. Just get the latest version of …
@Kaixhin Great, thanks. Have you tried storing the state instead of calling forget for every time step? I am doing this now; it takes longer to train, but it will probably converge. I agree this has to do with the changing state distribution, but we cannot really let the agent explore without considering the history if we want to take full advantage of the LSTM.
@JoostvDoorn I thought that this line would actually set … In summary, in …
@Kaixhin Yes, that line is enough; I will change that in my pull request. I missed …
@JoostvDoorn Yep, …
I guess like this: forget is called at the first time step, so the LSTM will not have accumulated any information at that point; once here, it will start accumulating state information (note, however, on …
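To make the difference concrete, a hedged sketch of clearing the recurrent state only at episode boundaries, so the LSTM keeps accumulating information across the steps of an episode instead of being reset every step (assumes the rnn package; the network, observations and termination test are placeholders):

```lua
-- Sketch: forget() only at episode boundaries, so LSTM state persists
-- across steps within an episode. Assumes the rnn package.
require 'nn'
require 'rnn'

local lstm = nn.FastLSTM(256, 256)
local net = nn.Sequential()
  :add(nn.Linear(84 * 84, 256)) -- placeholder for the conv torso
  :add(nn.ReLU())
  :add(lstm)
  :add(nn.Linear(256, 4))

for episode = 1, 10 do
  lstm:forget() -- clear the recurrent state once per episode
  local terminal = false
  while not terminal do
    local frame = torch.rand(84 * 84) -- placeholder single-frame observation
    local q = net:forward(frame)      -- hidden state carries over from the previous step
    -- ... select an action from q, step the environment, learn ...
    terminal = torch.uniform() < 0.05 -- placeholder termination
  end
end
```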
Thanks for spotting the bug. @lake4790k please check 626712b to make sure async agents are accounted for as well. @JoostvDoorn If I understand correctly, the issue is that the agent can't retain information during training because …
One central element of the Atari DQN is the use of 4 consecutive frames as input, making the state more Markov, i.e. retaining the vital dynamic movement information. This paper http://arxiv.org/abs/1507.06527v3 discusses DRQN: the multi-frame input can be substituted with an LSTM to the same effect (though with no systematic advantage for one over the other). The DeepMind async paper also mentions using an LSTM instead of multi-frame inputs for the more challenging visual domains (TORCS and Labyrinth).
I think this would fit well in this codebase; I'll try to contribute it at some point.
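As a rough, hedged sketch of the idea (assuming the Element-Research rnn package; the sizes follow the standard DQN torso, but this is an illustration rather than a finished implementation): the 4-frame stack becomes a single-frame input, and the first fully-connected layer is swapped for an LSTM, as in the DRQN paper.

```lua
-- Illustrative DRQN-style network (assumes the rnn package): single grayscale
-- 84x84 frame in, Q-values out, with an LSTM in place of the first
-- fully-connected layer.
require 'nn'
require 'rnn'

local nActions = 4 -- placeholder

local drqn = nn.Sequential()
  :add(nn.SpatialConvolution(1, 32, 8, 8, 4, 4)) -- 1 input channel: one frame
  :add(nn.ReLU())
  :add(nn.SpatialConvolution(32, 64, 4, 4, 2, 2))
  :add(nn.ReLU())
  :add(nn.SpatialConvolution(64, 64, 3, 3, 1, 1))
  :add(nn.ReLU())
  :add(nn.View(-1):setNumInputDims(3)) -- flatten the 64x7x7 feature maps
  :add(nn.FastLSTM(64 * 7 * 7, 512))   -- replaces the first fully-connected layer
  :add(nn.Linear(512, nActions))
```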