I tried to run Pong Policy Gradient for 2000 episodes on the original file with no results whatsoever. I then boosted the reward for positive points (points scored by the learner, i.e. the right paddle) to 20 and got the result shown in the attached picture.
I then boosted the learner's reward per point to 100, and after around 1500 episodes got a slight improvement, similar to the one in the picture. I ran it to 8100 episodes and there was no further improvement beyond a slightly smaller variance. Forgive me if this is naive, but having successfully run three versions of CartPole, I was expecting comparable results.
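For concreteness, the boost I mean has this shape (a hypothetical sketch; the helper name and structure are mine, not the original file's):

```python
def boost_reward(reward, factor=20.0):
    """Hypothetical helper: scale only positive rewards, i.e. points
    scored by the learner (right paddle). Negative rewards (points
    scored by the opponent) pass through unchanged."""
    return reward * factor if reward > 0 else reward
```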
As you can see from the picture, the variance is large, and after an improvement around episodes 800-900 the results stagnate.
Has anybody run it for more episodes, tweaked the rewards, and managed to bring the results up and the variance down?
Given the policy, should I also boost the penalty for points scored by the teacher (the left opponent)?
Any guidance will be appreciated. Thanks.
I found the reason behind my issue: the convolutional part of the neural net was wrongly defined, which is why it converged to a negative result.
Based on my earlier experience with convolutional networks, I changed the following:
```python
model.add(Reshape((1, 80, 80), input_shape=(self.state_size,)))
```

to

```python
model.add(Reshape((80, 80, 1), name="Layer1", input_shape=(self.state_size,)))
```

and removed the strides=(3, 3) argument from the convolutional layer so that it falls back to the default of (1, 1).
The first change reshapes the input correctly into 80-by-80 windows rather than 1-by-80 windows. The second change was necessary because with strides of (3, 3) the network was losing information, converging early, and no longer exploring.
Now the network looks like this:
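(The original post shows the revised model as a screenshot. Here is a minimal sketch of what the corrected definition would look like; only the Reshape line and the default strides are taken from the changes above, while the remaining layer sizes are assumptions:)

```python
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, Reshape

state_size = 80 * 80   # flattened, preprocessed Pong frame
action_size = 6        # full gym Pong action space

model = Sequential()
# channels-last: one 80x80 frame with a single channel
model.add(Reshape((80, 80, 1), name="Layer1", input_shape=(state_size,)))
# default strides of (1, 1), so no spatial positions are skipped
model.add(Conv2D(32, (6, 6), padding='same', activation='relu',
                 kernel_initializer='he_uniform'))
model.add(Flatten())
model.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(action_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```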
After only 1000 episodes it mostly wins, although with high variance, and it shows a bias toward staying in the lower part of the screen. It either needs more training or a redefinition of the act function.
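For context, the act function in this style of policy-gradient agent usually samples from the policy's softmax output rather than taking the argmax. A sketch under that assumption (not the file's exact code):

```python
import numpy as np

def act(model, state, action_size):
    """Sample an action from the policy's softmax output. Sampling keeps
    exploration alive; a positional bias like 'staying low' comes from
    the learned probabilities, not from this sampling step."""
    state = state.reshape([1, state.shape[0]])
    prob = model.predict(state, batch_size=1).flatten()
    prob = prob / np.sum(prob)   # renormalize against float rounding
    return np.random.choice(action_size, p=prob)
```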
I made some more changes to the structure to speed things up, because with 1.8 million weights training was very slow on my laptop.
TomaszRem changed the title from "Pong Policy Gradient-How many episodes to get result?" to "Pong Policy Gradient-important error in the definition of the convolutional net." on Apr 5, 2018.