break ties in Gambler's Problem #83

Closed
hansweytjens opened this issue Aug 26, 2018 · 1 comment
Comments

@hansweytjens

May I propose the following line as a replacement for line 54:

policy[state] = actions[np.argmax(np.round(action_returns[1:],5))]

The [1:] avoids choosing the '0' action, which changes neither the state nor the expected returns. Since numpy.argmax returns the first option in case of ties, rounding the near-ties ensures that the one associated with the smallest action (or bet) is selected. The output of the app now resembles Figure 4.3 in Sutton and Barto's book.
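To illustrate the tie-breaking, here is a minimal sketch with made-up numbers (the values, the `greedy_bet` helper, and the `decimals` parameter are hypothetical, not taken from the repository). It assumes `actions[0]` is the zero bet, in which case the index returned by argmax on the sliced array needs a `+ 1` offset to index `actions`; if `actions` already excludes the zero bet, the offset would not be needed.

```python
import numpy as np

# Hypothetical values for illustration only; not taken from the repository.
actions = np.arange(4)  # bets 0, 1, 2, 3 (assumes actions[0] is the zero bet)
# Bets 1 and 2 are tied up to floating-point noise.
action_returns = np.array([0.4, 0.7, 0.7 + 1e-12, 0.5])


def greedy_bet(actions, action_returns, decimals=5):
    """Pick the smallest non-zero bet among near-tied best returns."""
    # Drop the zero bet, then round so near-ties become exact ties.
    rounded = np.round(action_returns[1:], decimals)
    # argmax returns the first maximum of the sliced array,
    # so shift by one to index into the full `actions` array.
    return actions[np.argmax(rounded) + 1]


# Without rounding, floating-point noise decides the winner.
naive = actions[np.argmax(action_returns)]
# With rounding, the smallest of the tied bets wins.
tie_broken = greedy_bet(actions, action_returns)
print(naive, tie_broken)
```

With these numbers the unrounded argmax picks the larger of the two tied bets, while the rounded version picks the smaller one.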

PS: I just signed up for GitHub, so I may not be using it optimally yet.

@ShangtongZhang
Owner

This looks awesome, I just made a commit. Thanks!
