mhoehle changed the title from "Exercise 4.8: Avoid numerical instability in value function" to "Exercise 4.8: Avoid numerical instability in policy" on Mar 3, 2023
Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions/Chapter 4/Ex4.9.py
Line 25 in 68b023b
Suggested replacement:
As noted in ShangtongZhang/reinforcement-learning-an-introduction#83, this results in a more deterministic action choice in situations where several actions have the same value in exact arithmetic but differ slightly due to floating-point error. At least when I added the suggested rounding, the produced figure(s) resembled Fig. 4.3 more closely.
This removes the artefacts seen in the original figure produced by the code (Ex4.9_plotB.jpg in the repo).
P.S. The number of digits might have to be increased so that it also works for the $p_h=0.55$ example in the code (i.e. Figure Ex4.9_plotD.jpg), e.g. `digits=12`.
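The rounding idea described above can be sketched as follows. This is only a minimal illustration, not the code from the repository: the function name `greedy_action`, the `digits` parameter, and the example values are all hypothetical, and plain Python is used so the snippet runs without dependencies. With NumPy the same idea would presumably look like `np.argmax(np.round(values, digits))`.

```python
def greedy_action(action_values, digits=5):
    """Return the index of the first maximal value after rounding.

    Rounding collapses values that differ only by floating-point noise,
    so ties are broken consistently in favour of the lowest index.
    (Hypothetical helper, not taken from Ex4.9.py.)
    """
    rounded = [round(v, digits) for v in action_values]
    return rounded.index(max(rounded))


# Two action values that are equal in exact arithmetic but not in floats:
# 0.1 + 0.2 evaluates to 0.30000000000000004.
values = [0.3, 0.1 + 0.2]

# Without rounding, the noise decides the winner.
unrounded_choice = values.index(max(values))

# With rounding, both values become 0.3 and the first index wins.
rounded_choice = greedy_action(values)

# Too few digits can also merge values that genuinely differ,
# which is why a larger setting such as digits=12 may be needed.
coarse_choice = greedy_action([0.5, 0.5 + 1e-8], digits=5)
fine_choice = greedy_action([0.5, 0.5 + 1e-8], digits=12)
```

Here `unrounded_choice` is 1 while `rounded_choice` is 0, which is the kind of tie-breaking difference that could produce the artefacts in the policy plot. The last two lines illustrate the trade-off behind the P.S.: `digits=5` treats the two values as tied, whereas `digits=12` keeps them apart.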