NaN Issue with SAC Code #7

SaifAlWahaibi · 2023-05-17T08:48:13Z

action = T.tanh(actions)*T.tensor(self.max_action).to(self.device)
log_probs = probabilities.log_prob(actions)
# The argument of T.log goes negative here (|action| can exceed 1 after scaling by max_action), which produces NaN
log_probs -= T.log(1-action.pow(2) + self.reparam_noise)
log_probs = log_probs.sum(1, keepdim=True)
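
The NaN can be reproduced outside the agent with a few lines. This is only a sketch: `max_action = 2.0`, `reparam_noise = 1e-6`, and the sample values are assumptions picked for illustration, not values taken from the repository.

```python
import torch as T

max_action = 2.0                     # assumed action bound (anything > 1 triggers the issue)
reparam_noise = 1e-6                 # assumed value of self.reparam_noise
actions = T.tensor([[1.5, -0.2]])    # pre-tanh samples, as drawn from the policy

action = T.tanh(actions) * max_action          # scaled action, |action| can exceed 1
log_arg = 1 - action.pow(2) + reparam_noise    # argument passed to T.log in the snippet
correction = T.log(log_arg)                    # T.log of a negative number is nan

print(log_arg)       # first entry is negative because |action| > 1 there
print(correction)    # first entry is nan, second is finite
```

With `max_action <= 1` the argument stays positive, which is why the problem only shows up in environments with larger action bounds.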

How can I fix this issue? Are the following modifications correct?

action = T.tanh(actions)*T.tensor(self.max_action).to(self.device)
log_probs = probabilities.log_prob(actions)
log_probs -= T.log(1-T.tanh(actions).pow(2) + self.reparam_noise)
log_probs = log_probs.sum(1, keepdim=True)
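
For what it's worth, the modification keeps the argument of the log positive, because `T.tanh(actions)` lies in (-1, 1) regardless of `max_action`. Below is a minimal sketch of that version as a standalone function. The names `squashed_log_prob` and `log_one_minus_tanh_sq` are hypothetical, `max_action` is assumed to be a scalar, and the softplus-based helper is a swapped-in alternative (the identity log(1 - tanh(u)^2) = 2*(log 2 - u - softplus(-2u))) that avoids needing `reparam_noise` at all. It is not part of the original code.

```python
import math
import torch as T
import torch.nn.functional as F


def log_one_minus_tanh_sq(u):
    # Alternative to T.log(1 - T.tanh(u).pow(2) + noise), stable for large |u|
    return 2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))


def squashed_log_prob(probabilities, actions, max_action, reparam_noise=1e-6):
    """Return the scaled action and its log-prob summed over action dimensions."""
    squashed = T.tanh(actions)                    # in (-1, 1), so 1 - squashed^2 >= 0
    action = squashed * max_action                # value sent to the environment
    log_probs = probabilities.log_prob(actions)   # density of the pre-tanh sample
    # tanh correction computed on the unscaled value, as in the modification above
    log_probs = log_probs - T.log(1 - squashed.pow(2) + reparam_noise)
    return action, log_probs.sum(1, keepdim=True)


# Usage with made-up numbers
probabilities = T.distributions.Normal(T.zeros(1, 2), T.ones(1, 2))
actions = T.tensor([[1.5, -0.2]])
action, log_probs = squashed_log_prob(probabilities, actions, max_action=2.0)
print(action, log_probs)   # both finite; log_probs has shape (1, 1)
```

Strictly speaking, the exact density of the scaled action has an extra `-log(max_action)` term per action dimension. It is a constant, so some implementations fold it into the correction and others omit it.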
