Max Hyperparameters displayed #6965
AFAIK, there is no limit. The app initially should load up to 1000 parameters. There's also a "default" sample size per plugin (but it is also not 30), which can be overridden by passing a `--samples_per_plugin` argument. See this section and this section in our README. I would suggest checking whether you are setting this argument somewhere, and otherwise validating that you are indeed writing all of the data that you expect.
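For reference, here is a minimal sketch of passing that argument when launching TensorBoard programmatically; the plugin name and sample count are illustrative, and the same value can be passed on the command line via `--samples_per_plugin`:

```python
from tensorboard import program

# Launch TensorBoard with an explicit per-plugin sample size.
# The value format is "plugin_name=num_samples,..."; "scalars=1000"
# is only an illustrative example, not a recommended setting.
tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", "/logs/fit",
                   "--samples_per_plugin", "scalars=1000"])
url = tb.launch()
print(f"TensorBoard listening on {url}")
```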
Thanks @arcra, I did confirm that my issue is not a hardcoded max of 30 hyperparameters. Instead, it seems TensorBoard only looks at the number of hyperparameters of the first run in the log directory and keeps that number as its max. I have created a minimal reproducible example; if you run it, you will only see 13 hparams appear instead of 23. If you change the order of `[10, 20]` in the loop below, all hparams appear. I dug a bit into the codebase but couldn't find where this happens. It would be helpful to get your insights on that.
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.tensorboard import SummaryWriter
import numpy as np
import random
import string
import datetime

# Define some initial hyperparameters
epochs = 10
batch_size = 32
lr = 0.001

# Create a dictionary for hyperparameters
hyperparams = {
    "epochs": epochs,
    "batch_size": batch_size,
    "lr": lr,
}

for n_hparams in [10, 20]:
    # Add n_hparams randomly named hyperparameters
    for i in range(n_hparams):
        key = ''.join(random.choices(string.ascii_uppercase + string.digits, k=8))
        value = random.random()
        hyperparams[key] = value

    # Create a log directory
    now = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    log_dir = f"/logs/fit/{now}"
    writer = SummaryWriter(log_dir)
    writer.add_hparams(hyperparams, {})

    # Create a simple model
    class SimpleModel(nn.Module):
        def __init__(self):
            super(SimpleModel, self).__init__()
            self.fc1 = nn.Linear(10, 64)
            self.fc2 = nn.Linear(64, 1)

        def forward(self, x):
            x = torch.relu(self.fc1(x))
            x = self.fc2(x)
            return x

    model = SimpleModel()

    # Define loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    # Generate toy data
    x_train = torch.randn(1000, 10)
    y_train = torch.randn(1000, 1)
    x_test = torch.randn(200, 10)
    y_test = torch.randn(200, 1)

    # Train the model
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        outputs = model(x_train)
        loss = criterion(outputs, y_train)
        loss.backward()
        optimizer.step()

        # Log training loss
        writer.add_scalar('Loss/train', loss.item(), epoch)

        # Validate the model
        model.eval()
        with torch.no_grad():
            val_outputs = model(x_test)
            val_loss = criterion(val_outputs, y_test)

        # Log validation loss
        writer.add_scalar('Loss/val', val_loss.item(), epoch)

    writer.close()
```
I was able to reproduce (btw, you don't need to actually use a model and train it; it's enough to open the summary writer, write the hparams, and close it). I tracked down the issue to an assumption in the code here. As described in PyTorch's documentation for `add_hparams`, and in the case of this repro, when the parameters are written in the first loop there are only 13 parameters in the `hyperparams` dict, so only those 13 are recorded for the first run. Now, the code for the hparams plugin was written a while ago, so I don't have context on why this assumption was there. It's possible that this is true for the TensorFlow implementation, and PyTorch is just doing something different. It's also possible that a case like this was never considered, or perhaps the assumption simply no longer holds true for some reason. I'm not sure how we'll want to address this (as I mentioned, this was written a while ago, and it might take a while to look into what should be done). Not to mention we have holidays in the next couple of weeks. Thanks for reporting, though. If you feel like you'd like to contribute to this, feel free to let me know, and we can accept contributions. Another note is that I noticed that if I had installed the
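For what it's worth, a stripped-down sketch of the repro along those lines (no model, no training; the run names here are illustrative):

```python
import random
import string

from torch.utils.tensorboard import SummaryWriter

# Start from a few fixed hyperparameters, as in the original repro.
hyperparams = {"epochs": 10, "batch_size": 32, "lr": 0.001}

for run_idx, n_hparams in enumerate([10, 20]):
    # Accumulate n_hparams randomly named hyperparameters per run.
    for _ in range(n_hparams):
        key = "".join(random.choices(string.ascii_uppercase + string.digits, k=8))
        hyperparams[key] = random.random()

    # Opening the writer, writing the hparams, and closing it is enough.
    writer = SummaryWriter(f"/logs/fit/run-{run_idx}")
    writer.add_hparams(hyperparams, {})
    writer.close()
```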
As a follow-up, looking at the documentation for the "summary" implementation from TB's plugin here and here, it does seem like this is related to how the implementation from PyTorch writes the data. I suppose the case in which different runs have different sets of hparams (names, not values) might be somewhat uncommon, though I honestly don't know how common it is.
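For comparison, a hedged sketch of the TensorFlow-side workflow (the paths and names are illustrative): there, `hparams_config` declares the complete set of hparams once for the whole experiment, which would make a "read the schema from the first run" assumption harmless:

```python
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

# Declare every hparam and metric up front, once per experiment, so all
# runs share the same schema regardless of which run is read first.
with tf.summary.create_file_writer("/logs/fit").as_default():
    hp.hparams_config(
        hparams=[hp.HParam("lr"), hp.HParam("batch_size")],
        metrics=[hp.Metric("loss")],
    )
```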
Thanks @arcra for taking a look at this; indeed, the code line you mentioned is the source of the issue. I might spend some time to see if I can find a clean workaround. Otherwise, if no one takes this up soon and other people run into the same issue, I would recommend just changing the timestamp of the run that has the superset of the hparams, so that TensorBoard reads it first.
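In the meantime, a rough, untested sketch of that idea, assuming the plugin keys off the earliest-written run (the placeholder run name and the `dropout` entry are hypothetical):

```python
from torch.utils.tensorboard import SummaryWriter

# Hypothetical workaround sketch: before writing any real runs, write a
# placeholder run whose hparam dict is the union of every hparam name any
# run will use, so the run TensorBoard treats as "first" has the full set.
superset = {"epochs": 0, "batch_size": 0, "lr": 0.0, "dropout": 0.0}

writer = SummaryWriter("/logs/fit/0000-hparam-superset")
writer.add_hparams(superset, {})
writer.close()
```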
I was wondering if there is a maximum number of hyperparameters I can track / see in the UI. Currently I am tracking 35 hparams, but in the UI only the first 30 are included (see screenshot). Those extra 5 are also not shown in the HPARAMS tab. I am using the default `tb.add_hparams(hparams, metrics)` call.
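Roughly, the setup looks like this (a hedged sketch; the hparam names and values are illustrative):

```python
from torch.utils.tensorboard import SummaryWriter

# Illustrative sketch: 35 hyperparameters logged with the default call.
hparams = {f"param_{i}": float(i) for i in range(35)}
metrics = {"hparam/accuracy": 0.9}

tb = SummaryWriter("/logs/fit/example-run")
tb.add_hparams(hparams, metrics)
tb.close()
```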