break ties in Gambler's Problem #83

hansweytjens · 2018-08-26T12:49:16Z

May I propose the following line as a replacement for line 54:

policy[state] = actions[np.argmax(np.round(action_returns[1:],5))]

The [1:] slice avoids choosing the '0' action, which changes neither the state nor the expected returns. Since numpy.argmax returns the first option in case of ties, rounding the near-ties ensures that the one associated with the smallest action (or bet) is selected. The output of the app now resembles Figure 4.3 in Sutton and Barto's book.
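A minimal sketch of the tie-breaking idea, using made-up numbers rather than values from the actual run. It assumes `actions` starts with the zero bet. Under that assumption, an index into the sliced array is one less than the matching index into `actions`, so the sketch adds 1. That adjustment belongs to this sketch and is not part of the line proposed above.

```python
import numpy as np

# Hypothetical values for illustration only; not taken from the actual program.
actions = np.arange(0, 4)  # assumed bets 0, 1, 2, 3, with the zero bet first
action_returns = np.array([0.4, 0.5000000001, 0.3, 0.5000000002])

# Without rounding, floating-point noise decides which near-tie wins.
raw_choice = actions[np.argmax(action_returns)]

# Skip the zero bet, round away the noise, then take the first maximum.
# The + 1 maps the index in the sliced array back to an index into `actions`.
best_index = np.argmax(np.round(action_returns[1:], 5)) + 1
rounded_choice = actions[best_index]

print(raw_choice, rounded_choice)
```

With these numbers the unrounded argmax picks the larger bet because of a difference in the tenth decimal place, while the rounded version treats the two returns as tied and falls back to the smaller bet.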

PS: I just signed up for GitHub, so I may not be using it optimally yet.

ShangtongZhang · 2018-08-26T15:26:01Z

This looks awesome, I just made a commit. Thanks!

ShangtongZhang closed this as completed Aug 26, 2018

barcahead mentioned this issue Sep 17, 2018

action index should offset by one #88

Merged

mhoehle mentioned this issue Mar 3, 2023

Exercise 4.8: Avoid numerical instability in policy LyWangPX/Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions#95

Open
