May I propose the following line as a replacement for line 54:
policy[state] = actions[np.argmax(np.round(action_returns[1:], 5)) + 1]  # +1 maps the index in the sliced list back to the matching action
The [1:] slice avoids choosing the '0' action, which changes neither the state nor the expected return. Since numpy.argmax returns the first index in case of ties, rounding to 5 decimals turns near-ties into exact ties and ensures that the smallest action (bet) among them is selected. The output of the app now resembles Figure 4.3 in Sutton and Barto's book.
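To see the slicing and tie-breaking in isolation, here is a minimal sketch on made-up return values. The helper greedy_bet is hypothetical (not part of the repository), and it assumes that actions[i] == i, i.e. the candidate bets are 0, 1, ..., k, with at least one nonzero bet available.

```python
import numpy as np


def greedy_bet(action_returns, decimals=5):
    """Hypothetical helper: smallest nonzero bet among the (near-)maximal returns."""
    # Assumption: action i is a bet of size i, for i = 0..k.
    actions = np.arange(len(action_returns))
    # Drop the 0 bet, then round so that near-ties become exact ties.
    rounded = np.round(np.asarray(action_returns, dtype=float)[1:], decimals)
    # np.argmax returns the first maximal index, i.e. the smallest bet;
    # +1 converts the index in the sliced array back to an index into actions.
    return int(actions[np.argmax(rounded) + 1])


if __name__ == "__main__":
    returns = [0.4, 0.4, 0.4000000001, 0.39]
    print(int(np.argmax(returns)))  # plain argmax is swayed by the tiny excess
    print(greedy_bet(returns))      # rounded near-tie resolved to the smallest bet
```

For the returns above, plain np.argmax picks bet 2 because of the 1e-10 floating-point excess, whereas greedy_bet returns bet 1. Without the +1, the same call would report bet 0, which is exactly the action the slice was meant to exclude.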
PS: I just signed up for GitHub, so I may not be using it optimally yet.