Importance Sampling #3
Comments
Nope, you're totally right! It's a bug. According to the original paper, you should multiply the TD error by the IS weights. You can do this in the repo code by changing the
I'm not sure if the weighting should go inside the MSE calculation or can be applied outside, like I've done. By that, I mean I don't know whether the weights should go inside the square operation or not. I think outside the MSE is the correct place.
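For what it's worth, a quick way to see which placement matches the paper (writing delta_i for the TD error and w_i for the IS weight, as in the PER paper):

```latex
% weight applied outside the square (per-sample loss)
L_i = w_i \, \delta_i^{2}

% weight applied inside the square
L_i = (w_i \, \delta_i)^{2} = w_i^{2} \, \delta_i^{2}
```

The paper's correction uses w_i to the first power, so the first form (outside the square) is the one that matches; putting the weight inside the square effectively squares the correction.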
I think each sample in a minibatch of losses should be multiplied by its corresponding IS weight. However, torch by default returns an averaged MSE loss over the minibatch. Hence, to multiply each loss value in a batch by its corresponding weight, the MSE loss should be calculated with the reduce flag in torch.nn.functional.mse_loss() set to False. Also, I like to calculate the loss values in a minibatch by passing the targets and the predictions in the format [[loss for sample 1], [loss for sample 2], ..., [loss for sample n]] instead of [loss for sample 1, loss for sample 2, ..., loss for sample n]. I'm not sure if the second format is a correct method or not. After multiplication, this batch of weighted losses can then be averaged using tensor.mean(), and then you can call backward() on it.
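A minimal PyTorch sketch of that recipe (the tensor names below are placeholders rather than this repo's actual identifiers, and newer PyTorch versions spell reduce=False as reduction='none'):

```python
import torch
import torch.nn.functional as F

batch_size = 32
# Placeholder tensors standing in for the network output, the TD target,
# and the per-sample importance-sampling weights from the replay buffer.
q_pred = torch.randn(batch_size, 1, requires_grad=True)
q_target = torch.randn(batch_size, 1)
is_weights = torch.rand(batch_size, 1)

# reduction='none' keeps one squared TD error per sample instead of
# averaging over the minibatch (older PyTorch used reduce=False).
elementwise_loss = F.mse_loss(q_pred, q_target, reduction='none')

# Scale each sample's loss by its IS weight, then average and backprop.
loss = (is_weights * elementwise_loss).mean()
loss.backward()
```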
In the original paper, the loss is also multiplied by the TD error. It seems that if we calculate the gradient of your loss, there is no TD error as a factor in the coefficient? @stormont
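If the weighted per-sample loss is taken to be (1/2) * w_i * delta_i^2 with the target held fixed (the usual semi-gradient assumption), the TD error does appear as a factor in the gradient:

```latex
\nabla_\theta \left[ \tfrac{1}{2} \, w_i \, \delta_i^{2} \right]
  = w_i \, \delta_i \, \nabla_\theta \delta_i
  = - \, w_i \, \delta_i \, \nabla_\theta Q(s_i, a_i; \theta)
```

Up to the sign convention, this is the same w_i * delta_i * grad Q coefficient as the paper's update, so weighting the squared error outside the square should already reintroduce the TD-error factor.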
First of all, thanks for this implementation.
One question however: I see that the weights for the importance sampling are calculated and returned when a batch is sampled.
However, the weights aren't used further in the code.
Is this some legacy code from a feature that didn't work? Or is the code not finished yet?
Andreas