Nan loss #1068
I am getting nan loss from the first epoch.

import numpy as np
from torch import nn
import torch
from skorch import NeuralNetRegressor

X = torch.arange(0, 1_000, 1, dtype=torch.float32).reshape(-1, 1)
m = 2
y = m * X

class linearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(linearRegression, self).__init__()
        self.linear = torch.nn.Linear(inputSize, outputSize)

    def forward(self, x):
        return self.linear(x)

net = NeuralNetRegressor(
    linearRegression,
    module__inputSize=1,
    module__outputSize=1,
    max_epochs=10,
)
net.fit(X, y)
Even if I do a near-perfect initialization, the loss comes out to be nan. Is this a bug in PyTorch and not an error from my side?

import numpy as np
from torch import nn
import torch
from skorch import NeuralNetRegressor

X = torch.arange(0, 1_000, 1, dtype=torch.float32).reshape(-1, 1)
m = 1
y = m * X

class linearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(linearRegression, self).__init__()
        self.linear = nn.Linear(inputSize, outputSize)  # (torch.ones(size=(inputSize, outputSize), requires_grad=True))
        with torch.no_grad():  # Prevents gradient tracking
            self.linear.weight.data = torch.tensor([[0.999999]])
            self.linear.bias.data = torch.tensor([0.0])

    def forward(self, x):
        return self.linear(x)

net = NeuralNetRegressor(
    linearRegression,
    module__inputSize=1,
    module__outputSize=1,
    max_epochs=3,
    criterion=nn.MSELoss(),
    train_split=None,
    lr=3e-4,
)
net.fit(X, y)
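For reference, the same divergence can be reproduced without skorch at all, which fits the explanation in the answer below: it comes from the optimization dynamics, not from a library bug. The following is a minimal plain-PyTorch sketch, not part of the original thread; it uses full-batch SGD for simplicity, whereas skorch trains on mini-batches.

import torch
from torch import nn

# Same data and near-perfect initialization as in the snippet above.
X = torch.arange(0, 1_000, 1, dtype=torch.float32).reshape(-1, 1)
y = 1 * X

model = nn.Linear(1, 1)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[0.999999]]))
    model.bias.copy_(torch.tensor([0.0]))

optimizer = torch.optim.SGD(model.parameters(), lr=3e-4)
loss_fn = nn.MSELoss()

for step in range(30):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    # Because the inputs go up to 999, the MSE gradients are huge and each
    # update with lr=3e-4 overshoots: the printed loss grows every step,
    # overflows to inf, and eventually turns into nan.
    print(step, loss.item())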
Answer from BenjaminBossan (Oct 24, 2024):

The reason why you're observing this is floating point arithmetic. Even though mathematically the net's output should correspond exactly to y, in practice there are small differences: e.g. if the target is 127.0, the prediction is 126.9999. Given these small differences, the loss is non-zero, so the parameters are changed a little bit. Normally this should reduce the error, but because the learning rate is too big, the error actually increases, making the difference bigger and bigger after each update. If you change the learning rate to something smaller, like 1e-6 or 1e-7, you won't see this diverging behavior. Note that regression uses MSE loss by default, which can be very sensitive to outliers…
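To illustrate the suggested fix, here is a sketch (not from the thread) of the snippet from the follow-up question with only the learning rate lowered; 1e-7 is just one pick from the 1e-6 to 1e-7 range suggested in the answer. With this change the loss stays finite and decreases instead of diverging to nan.

import torch
from torch import nn
from skorch import NeuralNetRegressor

X = torch.arange(0, 1_000, 1, dtype=torch.float32).reshape(-1, 1)
y = 1 * X

class linearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(linearRegression, self).__init__()
        self.linear = nn.Linear(inputSize, outputSize)
        with torch.no_grad():
            self.linear.weight.data = torch.tensor([[0.999999]])
            self.linear.bias.data = torch.tensor([0.0])

    def forward(self, x):
        return self.linear(x)

net = NeuralNetRegressor(
    linearRegression,
    module__inputSize=1,
    module__outputSize=1,
    max_epochs=3,
    criterion=nn.MSELoss(),
    train_split=None,
    lr=1e-7,  # small enough that each update no longer overshoots
)
net.fit(X, y)  # the train loss now stays finite and decreases instead of becoming nan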