Exercise 4.8: Avoid numerical instability in policy #95

Open
mhoehle opened this issue Mar 3, 2023 · 0 comments
mhoehle commented Mar 3, 2023

Suggested replacement:

op_a = np.argmax(np.round(v, decimals=5))

As noted in ShangtongZhang/reinforcement-learning-an-introduction#83, this yields a more deterministic action choice in situations where several actions give identical value in theory but differ slightly due to floating-point error. At least when I added the suggested rounding, the produced figure(s) resembled Fig. 4.3 more closely:

[figure: policy plot with rounding applied, closely resembling Fig. 4.3]

This removes the artefacts seen in the original figure produced by the code (Ex4.9_plotB.jpg in the repo or see below):

[figure: Ex4.9_plotB.jpg, the original policy plot showing the artefacts]

P.S. The number of decimals might have to be increased so that it also works for the $p_h=0.55$ example in the code (i.e. figure Ex4.9_plotD.jpg), e.g. decimals=12.
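To illustrate the tie-breaking behaviour the suggestion relies on, here is a minimal sketch with hypothetical action values (the array `v` below is made up for demonstration, not taken from the exercise code). Two values are equal up to floating-point noise; a plain `np.argmax` picks whichever copy happens to be marginally larger, while rounding first makes the near-tie exact so `argmax` deterministically returns the lowest-index tied action:

```python
import numpy as np

# Hypothetical action values: indices 0-2 are "equal" up to floating-point noise.
v = np.array([0.4, 0.4 + 1e-12, 0.4 - 1e-13, 0.2])

# Plain argmax is sensitive to the noise and picks index 1 here.
naive_action = np.argmax(v)

# Rounding before argmax collapses the near-tie, so argmax returns
# the first of the tied actions (index 0), independent of the noise.
stable_action = np.argmax(np.round(v, decimals=5))

print(naive_action, stable_action)  # 1 0
```

Note that `decimals` must be chosen small enough to absorb the accumulated floating-point error, but large enough not to merge genuinely distinct values, which is why a larger setting such as `decimals=12` may be needed for the $p_h=0.55$ case.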

@mhoehle mhoehle changed the title Exercise 4.8: Avoid numerical instability in value function Exercise 4.8: Avoid numerical instability in policy Mar 3, 2023