Wrong implementation of AIRL #2

Ericonaldo · 2020-10-01T07:33:50Z

I checked the code, and it looks like AIRL is implemented simply by using the discriminator logit as the reward. Is that right? This differs from the original paper, which uses a disentangled discriminator computed as D = f / (f + π), where f approximates exp(r) and π is the policy.
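Here is a minimal sketch of the disentangled discriminator described above. It assumes the AIRL paper's parameterization, where the learned function f sits in the exponent, so exp(f) plays the role of the "approximation of exp(r)". The function names are hypothetical, and the scalar inputs stand in for network outputs on a single (state, action) pair. The point is that the discriminator's logit is not a free network output. It is forced into the form f - log π, and the reward log D - log(1 - D) reduces to that same quantity.

```python
import math

# Hypothetical helper names; scalar inputs stand in for the learned f and
# the policy's log-probability log pi(a|s) on one (state, action) pair.

def airl_logit(f_value: float, log_pi: float) -> float:
    """Logit of D = exp(f) / (exp(f) + pi), which equals f - log pi.

    A plain GAIL-style discriminator outputs an unconstrained logit;
    AIRL's disentangled discriminator fixes it to this structured form.
    """
    return f_value - log_pi


def airl_discriminator(f_value: float, log_pi: float) -> float:
    """D = exp(f) / (exp(f) + pi), evaluated as sigmoid(f - log pi)."""
    z = airl_logit(f_value, log_pi)
    # Numerically stable sigmoid: avoid exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def airl_reward(f_value: float, log_pi: float) -> float:
    """Reward log D - log(1 - D), which simplifies to f - log pi."""
    return airl_logit(f_value, log_pi)
```

For example, with f = 0.5 and π = 0.25, `airl_discriminator(0.5, math.log(0.25))` matches exp(0.5) / (exp(0.5) + 0.25) computed directly. In the paper, f is further split into a reward term plus a potential-based shaping term; that decomposition is left out of this sketch.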
