Hello Aaron,
Could you show how the part of the code above relates to the original loss formula used in the paper?
Also, I have another question:
Why are you using `action_results[0]`? As you said, the generator produces the next state and the reward, so which one is the state and which one is the reward? Since this is a cart-pole problem, shouldn't the state have four values? I know these questions might seem basic, but I am having a hard time understanding this.
Thanks,
Abodh