Is plain torch.optim.SGD doing the same as gradient descent? #1146
Unanswered
shenhai-ran asked this question in Q&A
Replies: 0 comments
Hi,
I am wondering about the implementation of torch.optim.SGD. If I don't use any batch size or DataLoader, as in the code snippet below from Chapter 1, then, if I understand it right, there is nothing stochastic: the optimizer is effectively computing plain gradient descent over the whole dataset, and epoch is basically just the counter of steps in the optimization. Do I get it right?
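For illustration, here is a minimal sketch of that kind of full-batch training loop (an assumed stand-in, not the actual Chapter 1 snippet; the model, data, and learning rate are made up). Because the loss at every step is computed over the entire dataset, each optimizer.step() is a deterministic gradient-descent update and the epoch loop simply counts those updates.

```python
import torch

# Hypothetical setup (not the Chapter 1 code): a tiny linear-regression problem
# trained on the full dataset at every step, with no DataLoader or mini-batches.
torch.manual_seed(0)
X = torch.randn(100, 3)                                        # entire dataset
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)

model = torch.nn.Linear(3, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)   # loss over the whole dataset
    loss.backward()                           # gradient of the full-batch loss
    optimizer.step()                          # one deterministic gradient-descent step
```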