
Reward shaping not removed in evaluation in CarRacing-From-Pixels-PPO #3

Closed
lerrytang opened this issue Mar 17, 2020 · 5 comments

@lerrytang

Hi,

The figure and log in the README show scores > 1000, which, given CarRacing's reward design, should not be possible.
It turns out that the reward shaping in Wrapper.step() is not removed during evaluation, and that leads to inflated results.
After commenting out the relevant lines, I got an average score of 820 over 100 episodes.
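For reference, a minimal unshaped evaluation loop consistent with that measurement might look like the sketch below; agent.select_action and env_wrapper are placeholder names, not the repo's actual API.

    import numpy as np

    # hypothetical loop: average the raw (unshaped) return over 100 episodes
    scores = []
    for episode in range(100):
        state = env_wrapper.reset()                # placeholder wrapper object
        episode_reward = 0.0
        while True:
            action = agent.select_action(state)    # placeholder policy call
            state, reward, done, die = env_wrapper.step(action)
            episode_reward += reward
            if done or die:
                break
        scores.append(episode_reward)
    print('average score over 100 episodes:', np.mean(scores))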

@Rafael1s
Owner

Thanks for your note.
Could you point out which lines should be commented out, or share your version of Wrapper.step()?

@lerrytang
Author

Hi,
The following snippet is what I used for evaluation; let me know if this makes sense to you :)

    def step(self, action):
        total_reward = 0
        for i in range(action_repeat):
            img_rgb, reward, die, _ = env.step(action)
            # reward shaping from training, disabled for evaluation:
            # don't penalize "die state"
            # if die:
            #     reward += 100
            # green penalty
            # if np.mean(img_rgb[:, :, 1]) > 185.0:
            #     reward -= 0.05
            total_reward += reward
            # early termination ("if no reward recently, end the episode")
            # is likewise disabled for evaluation
            # done = True if self.av_r(reward) <= -0.1 else False
            done = False
            if done or die:
                break
        img_gray = rgb2gray(img_rgb)
        self.stack.pop(0)
        self.stack.append(img_gray)
        assert len(self.stack) == img_stack
        return np.array(self.stack), total_reward, done, die

@Rafael1s
Owner

Hi,
I cannot agree with your version. For example, where is your "green penalty"? You need to penalize the car for driving onto the green field. Possibly the green threshold should be lower than 185, or the penalty should be tuned more carefully than -0.05.

@lerrytang
Author

Let's see if the following points describe my position better.

  1. There are two versions of Wrapper.step(): one for training and one for evaluation (see the sketch after this comment).
  2. You can add whatever reward shaping you like in the training version, e.g., a penalty for driving onto the grass.
  3. You should not add reward shaping in the evaluation version. E.g., CarRacing is considered solved when the average reward is > 900, but it is not fair if you add 100 when die == True, or end the episode early because you notice the car is not doing well, right?

The code snippet I used was for evaluation.
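One way to make this split concrete is a single wrapper whose shaping is gated by a training flag. The sketch below is only an illustration built from the snippets in this thread (the flag, the action_repeat default, and the return shape are assumptions, not the repo's actual code), and it omits the frame stacking for brevity:

    import numpy as np

    class Wrapper:
        # sketch: one step() for both modes, shaping gated by a flag
        def __init__(self, env, action_repeat=8, training=True):
            self.env = env
            self.action_repeat = action_repeat
            self.training = training          # apply shaping only when True

        def step(self, action):
            total_reward = 0
            for _ in range(self.action_repeat):
                img_rgb, reward, die, info = self.env.step(action)
                if self.training:
                    # training-only shaping, mirroring the repo's version
                    if die:
                        reward += 100         # offset the env's terminal -100
                    if np.mean(img_rgb[:, :, 1]) > 185.0:
                        reward -= 0.05        # penalty for driving on grass
                total_reward += reward
                if die:
                    break
            # in evaluation mode this returns the unshaped return,
            # directly comparable to the 900-point "solved" threshold
            return img_rgb, total_reward, die, info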

@Rafael1s
Owner

@lerrytang
Let us look at the OpenAI CarRacing environment code
https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py
lines 337-339

If the car drives off the field, the reward is penalized by -100.
However, if the track is completed (the absolutely successful case), the reward is ALSO penalized by -100.
So, for fairness, we restore the reward with +100 in Wrapper.step().
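Restated as code, the restoration described here is exactly the training-time line that the evaluation snippet earlier in the thread comments out:

    # training wrapper only: give back the env's terminal -100 so that
    # finishing the track is not punished like driving off the field
    if die:
        reward += 100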
