I checked the code, and I wonder whether AIRL is implemented here simply by using the discriminator logit as the reward function. This differs from the original paper, which uses a disentangled discriminator computed as f / (f + \pi), where f is an approximation of exp(r) and \pi is the policy.
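To make the question concrete, here is a minimal sketch of the discriminator form I mean. It assumes the network outputs `log_f` (the log of the f above, i.e. an approximation of r) and that `log_pi` is the policy's log-probability of the action. The function names are hypothetical and are not taken from this repository.

```python
import numpy as np

def airl_discriminator(log_f, log_pi):
    """D = f / (f + pi), computed as sigmoid(log_f - log_pi).

    log_f  : array, log of f (assumed here to approximate the reward r)
    log_pi : array, log-probability of the action under the policy
    """
    logit = log_f - log_pi
    return 1.0 / (1.0 + np.exp(-logit))

def airl_logit(log_f, log_pi):
    """Logit of D, i.e. log D - log(1 - D) = log_f - log_pi."""
    return log_f - log_pi

# Illustrative values only, not results from any experiment.
log_f = np.array([0.5, -1.0, 2.0])
log_pi = np.log(np.array([0.2, 0.5, 0.9]))

d = airl_discriminator(log_f, log_pi)
# Direct evaluation of f / (f + pi) for comparison.
direct = np.exp(log_f) / (np.exp(log_f) + np.exp(log_pi))
```

Under this form the discriminator logit is `log_f - log_pi` rather than a free network output, so using the logit as the reward matches the paper only if the logit is built with the `- log_pi` term.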