I think there is something wrong with the implementation. When p_h = 0.55, the optimal policy should be to bet 1 in every state, but the computed policy has a big bet around capital 80, and I cannot find any reason for this behavior.
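A quick check of that expectation, assuming the textbook setup (goal 100, V(0) = 0, V(100) = 1, no discounting) and writing p = p_h and ρ = (1 - p)/p. Under the always-bet-1 policy the value is the classic gambler's-ruin probability, and the action value of any stake a ≤ min(s, 100 - s) can be written in closed form:

$$
V(s) = \frac{1-\rho^{s}}{1-\rho^{100}}, \qquad
q(s,a) = p\,V(s+a) + (1-p)\,V(s-a) = 1 - \frac{\rho^{s} f(a) - \rho^{100}}{1-\rho^{100}},
$$

$$
f(a) = p\,\rho^{a} + (1-p)\,\rho^{-a}, \qquad f(0) = f(1) = 1 .
$$

f is strictly convex and equals 1 at a = 0 and a = 1, so f(a) > 1 for every a ≥ 2. For p > 1/2 we have ρ < 1 and 1 - ρ^{100} > 0, hence q(s, 1) = V(s) > q(s, a) for all a ≥ 2: this V satisfies the Bellman optimality equation with stake 1 as the strict maximizer in every state, consistent with the expected policy. The margin, however, is tiny near the goal:

$$
q(s,1) - q(s,a) = \frac{\rho^{s}\,\bigl(f(a)-1\bigr)}{1-\rho^{100}} .
$$

For p = 0.55, s = 80 and a = 20 this is only about 2.6 × 10^{-6} (and for a = 2 it is far smaller), so a value-iteration error of that size can flip an exact argmax toward a large bet.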
Does anybody know why this happens? I get similar results with this implementation and with my own. Is this because it does not produce a stable solution?
Because of limited numerical accuracy, comparing values for exact equality is not reliable: some of the action values are actually the same and differ only by numerical error. So we should limit the precision when we compare the values of q(s, a), which only requires changing the `np.argmax` part.
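One way to limit the precision would be to round before taking the argmax (e.g. `np.argmax(np.round(q, 5))`). The sketch below is a minimal value-iteration version of the gambler's problem that uses a closely related absolute tolerance instead of rounding, because two nearly equal values can still round to different numbers when they straddle a rounding boundary. It assumes a goal of 100, fixed terminal values V(0) = 0 and V(100) = 1, stakes 1..min(s, 100 - s) (stake 0 is left out, since at convergence q(s, 0) = V(s) would tie with the best stake), and synchronous sweeps. The names `stake_grid`, `action_values`, `value_iteration`, `greedy_policy` and the `theta`/`tol` values are hypothetical choices for illustration, not taken from the implementation discussed here. Near-ties are broken toward the smallest stake.

```python
import numpy as np

GOAL = 100  # assumed goal, as in the textbook version of the problem


def stake_grid(goal=GOAL):
    """Index arrays for every (capital, stake) pair; invalid stakes are masked."""
    states = np.arange(1, goal)                      # capitals 1..goal-1
    stakes = np.arange(1, goal // 2 + 1)             # a stake never exceeds goal // 2
    limit = np.minimum(states, goal - states)[:, None]
    valid = stakes[None, :] <= limit                 # a <= min(s, goal - s)
    win = np.where(valid, states[:, None] + stakes[None, :], 0)
    lose = np.where(valid, states[:, None] - stakes[None, :], 0)
    return stakes, valid, win, lose


def action_values(V, p_h, grid):
    """q(s, a) = p_h * V(s + a) + (1 - p_h) * V(s - a); -inf for invalid stakes."""
    _, valid, win, lose = grid
    return np.where(valid, p_h * V[win] + (1 - p_h) * V[lose], -np.inf)


def value_iteration(p_h, goal=GOAL, theta=1e-12, max_sweeps=100_000):
    """Synchronous value iteration; V[0] = 0 and V[goal] = 1 stay fixed."""
    grid = stake_grid(goal)
    V = np.zeros(goal + 1)
    V[goal] = 1.0
    for _ in range(max_sweeps):
        new_v = action_values(V, p_h, grid).max(axis=1)
        delta = np.max(np.abs(new_v - V[1:goal]))
        V[1:goal] = new_v
        if delta < theta:
            break
    return V, grid


def greedy_policy(V, p_h, grid, tol=0.0):
    """Smallest stake whose value is within `tol` of the best one in each state."""
    stakes = grid[0]
    q = action_values(V, p_h, grid)
    near_best = q >= q.max(axis=1, keepdims=True) - tol
    return stakes[np.argmax(near_best, axis=1)]      # argmax on booleans -> first True
```

Usage: `V, grid = value_iteration(0.55)`, then compare `greedy_policy(V, 0.55, grid)` with `greedy_policy(V, 0.55, grid, tol=1e-6)`. With `tol=0.0` the helper reduces to a plain `np.argmax` over q(s, a), so the two policies can be compared directly. The value `1e-6` is an arbitrary choice; it should be larger than the remaining convergence error but smaller than any value difference you actually care about.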