I tried to run Pong Policy Gradient for 2000 episodes on the original file with no results whatsoever. I then boosted the reward for positive points (points scored by the learner, i.e. the right paddle) to 20 and got the result shown in the attached picture.
I then boosted the learner's reward per point to 100, and after around 1500 episodes got a slight improvement, similar to the one in the picture. I ran it to 8100 episodes and there was no further improvement beyond a slightly smaller variance. Forgive me if this is naive, but having successfully run three versions of CartPole, I was expecting comparable results.
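For concreteness, the boost I mean has this shape (a hypothetical sketch; the helper name and structure are mine, not the original file's):

```python
def boost_reward(reward, factor=20.0):
    """Hypothetical helper: scale only positive rewards, i.e. points
    scored by the learner (right paddle). Negative rewards (points
    scored by the opponent) pass through unchanged."""
    return reward * factor if reward > 0 else reward
```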
As you can see from the picture, the variance is large, and after an improvement around episodes 800-900 the results stagnate.
Has anybody run it for more episodes, tweaked the rewards, and managed to bring the results up and the variance down?
Given the policy, should I also boost the penalty for points scored by the teacher (the left opponent)?
Any guidance will be appreciated. Thanks.
I found the reason behind my issue: the convolutional part of the neural net was wrongly defined, which is why it converged to a negative result.
Based on my earlier experience with convolutional networks, I changed the following:
```python
model.add(Reshape((1, 80, 80), input_shape=(self.state_size,)))
```

to

```python
model.add(Reshape((80, 80, 1), name="Layer1", input_shape=(self.state_size,)))
```

and removed the strides=(3, 3) argument from the convolutional layer so that it falls back to the default of (1, 1).
The first change reshapes the input correctly into 80-by-80 windows rather than 1-by-80 windows. The second change was necessary because with strides of (3, 3) the network was losing information, converging early, and no longer exploring.
Now the network looks like this:
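(The original post shows the revised model as a screenshot. Here is a minimal sketch of what the corrected definition would look like; only the Reshape line and the default strides are taken from the changes above, while the remaining layer sizes are assumptions:)

```python
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, Reshape

state_size = 80 * 80   # flattened, preprocessed Pong frame
action_size = 6        # full gym Pong action space

model = Sequential()
# channels-last: one 80x80 frame with a single channel
model.add(Reshape((80, 80, 1), name="Layer1", input_shape=(state_size,)))
# default strides of (1, 1), so no spatial positions are skipped
model.add(Conv2D(32, (6, 6), padding='same', activation='relu',
                 kernel_initializer='he_uniform'))
model.add(Flatten())
model.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(action_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```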
After only 1000 episodes it mostly wins, although with high variance, and it shows a bias toward staying in the lower part of the screen. It either needs more training or a redefinition of the act function.
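For context, the act function in this style of policy-gradient agent usually samples from the policy's softmax output rather than taking the argmax. A sketch under that assumption (not the file's exact code):

```python
import numpy as np

def act(model, state, action_size):
    """Sample an action from the policy's softmax output. Sampling keeps
    exploration alive; a positional bias like 'staying low' comes from
    the learned probabilities, not from this sampling step."""
    state = state.reshape([1, state.shape[0]])
    prob = model.predict(state, batch_size=1).flatten()
    prob = prob / np.sum(prob)   # renormalize against float rounding
    return np.random.choice(action_size, p=prob)
```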
I made some more changes to the structure to speed things up, because with 1.8 million weights training was very slow on my laptop.
TomaszRem changed the title from "Pong Policy Gradient-How many episodes to get result?" to "Pong Policy Gradient-important error in the definition of the convolutional net." on Apr 5, 2018.