Are you actually using the learned intrinsic reward for the agent? #9

Open
ferreirafabio opened this issue Feb 20, 2021 · 6 comments

@ferreirafabio

ferreirafabio commented Feb 20, 2021

Hi,

I can only see that you optimize the intrinsic loss in your code. Can you point me to the line where you add the intrinsic rewards to the actual environment/extrinsic rewards?

In some areas of your code I can see comments like
# total reward = int reward
which would, according to the original paper, be wrong, no?
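Based on the paper, what I would have expected is roughly the following (a schematic illustration with made-up numbers and names, not your code):

```python
# schematic illustration of what I expected (values are made up):
extrinsic_reward = 1.0        # reward returned by the environment
intrinsic_reward = 0.37       # eta/2 * forward-model prediction error
total_reward = extrinsic_reward + intrinsic_reward   # r_t = r_t^e + r_t^i
print(total_reward)           # 1.37 -- this total is what the agent should be trained on
```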

Thank you.

@ruoshiliu

I'm also new to the repo, but here the loss is composed of both the intrinsic and extrinsic rewards:

loss = (actor_loss + 0.5 * critic_loss - 0.001 * entropy) + forward_loss + inverse_loss
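For context, my reading of the individual terms (my own annotation, not comments from the repo, so please correct me if I mislabel anything):

```python
# the same line, annotated with my interpretation of each term:
loss = (
    actor_loss            # policy-gradient term, from the advantages of the rollout
    + 0.5 * critic_loss   # value-function regression on the returns
    - 0.001 * entropy     # entropy bonus for exploration
    + forward_loss        # ICM forward model: predict phi(s_{t+1}) from phi(s_t) and a_t
    + inverse_loss        # ICM inverse model: predict a_t from phi(s_t) and phi(s_{t+1})
)
```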

@ferreirafabio

ferreirafabio commented Mar 3, 2021

Thanks @ruoshiliu. Yes, I saw the loss. But in addition to optimizing the ICM loss, you also need to feed the resulting intrinsic reward (the forward model's prediction error) to the agent as part of its reward, as stated in the paper. Only optimizing the loss is not equivalent to actually using the intrinsic reward for the policy update.
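To make it concrete, here is a minimal sketch of what I mean (hypothetical names and a made-up eta, PyTorch-style; not a claim about how your repo is structured):

```python
import torch

def icm_total_reward(phi_next, pred_phi_next, extrinsic_reward, eta=0.01):
    """Combine the ICM intrinsic reward with the environment reward.

    phi_next:         features of s_{t+1} from the embedding network, shape [T, D]
    pred_phi_next:    forward-model prediction f(phi(s_t), a_t),      shape [T, D]
    extrinsic_reward: rewards returned by the environment,            shape [T]
    """
    # r_t^i = eta/2 * ||phi_hat(s_{t+1}) - phi(s_{t+1})||^2, detached so it is
    # treated as a scalar reward signal and not backpropagated through the policy loss
    intrinsic_reward = 0.5 * eta * (pred_phi_next - phi_next).pow(2).sum(dim=-1).detach()
    # r_t = r_t^i + r_t^e -- the returns/advantages that feed actor_loss and
    # critic_loss should be computed from this total, not from extrinsic_reward alone
    return extrinsic_reward + intrinsic_reward
```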

@ferreirafabio ferreirafabio changed the title Are you optimizing over the sum of extrinsic and intrinsic rewards? Are you actually using the learned intrinsic reward for the agent? Mar 3, 2021
@ruoshiliu

ruoshiliu commented Mar 4, 2021

@ferreirafabio What do you mean by "use the intrinsic rewards"? Can you point out which section in the paper states that?

@ferreirafabio

ferreirafabio commented Mar 4, 2021

By that I mean reward = extrinsic reward + intrinsic reward. From the paper:

[Screenshot from the paper: the agent's reward is the sum of the intrinsic and the (optional) extrinsic reward, r_t = r_t^i + r_t^e]

I now realize that the paper says the extrinsic reward can be optional. I'm wondering what is "usually" done (with or without the extrinsic reward) when others use ICM as a baseline.

@ruoshiliu

ruoshiliu commented Mar 4, 2021

Thank you for the clarification. Let me make sure I understand your question. What you are saying is that the code (referenced above) tries to minimize the loss function by maximizing the extrinsic reward and minimizing the intrinsic reward, whereas the correct implementation should reflect equation (7) below.

In other words, the correct implementation should find the policy π that maximizes the sum of intrinsic and extrinsic rewards, together with inverse- and forward-model parameters that minimize L_I and L_F.

Did I interpret your question correctly?

[Screenshot of equation (7) from the paper]
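For readability, here is my transcription of equation (7) (reconstructed from the paper, so please double-check against the original):

```latex
\min_{\theta_P, \theta_I, \theta_F}
  \left[ -\lambda \, \mathbb{E}_{\pi(s_t;\theta_P)}\!\left[\textstyle\sum_t r_t\right]
         + (1-\beta)\, L_I + \beta\, L_F \right]
```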

@ferreirafabio

Yes
