How to properly load checkpoint for testing? #924
Comments
Is this the |
@awaelchli, here is my
I've also put my code into a Colab, if you would like to run it: https://colab.research.google.com/drive/1JUQctrkFKMJSPU2u5061534HzF27WFRE (the relevant part is in the first half of the notebook, up to 'For checking purposes'; the rest is messy and irrelevant, apologies). Am I doing something wrong here, causing the accuracy returned to be different when I load the checkpoints separately instead of doing |
Change
Note that the load function is a class method. It will load the exact same hyperparameters from the checkpoint that you used for training. |
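For readers skimming: a minimal sketch of that pattern, mirroring the older-style hparams assignment used in the snippets later in this thread (the class and checkpoint path are placeholders, not anyone's actual code):

import torch
from pytorch_lightning import LightningModule

class LitModel(LightningModule):
    # Hypothetical minimal model, only to illustrate the loading call.
    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

# load_from_checkpoint is a classmethod: call it on the class, not on an instance.
# The hyperparameters stored in the checkpoint are passed back to __init__ for you.
model = LitModel.load_from_checkpoint("path/to/checkpoint.ckpt")  # placeholder path
model.eval()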
Thanks, @awaelchli! Amended my code based on your suggestion and noted a strange behavior: during |
Your notebook is quite big. It would be good if you could try to create a minimal example where this behavior can be observed, then it is easier for us to help. The difference in accuracy you report is quite small. Could it simply be due to randomization of the data? To make sure this doesn't happen, make the code deterministic: https://pytorch.org/docs/stable/notes/randomness.html |
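A minimal sketch of the determinism suggestion (the seed value is arbitrary, and the deterministic flag is only available in newer Lightning versions):

import torch
from pytorch_lightning import Trainer, seed_everything

# Seed Python, NumPy and PyTorch RNGs so data shuffling and weight init are reproducible.
seed_everything(42)

# Ask cuDNN for deterministic kernels (can be slower).
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

trainer = Trainer(deterministic=True)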
I'm loading my checkpoints for inference as per what @awaelchli suggested above:
However, I am now getting an error:
Even with Am I doing something wrong here? |
I think the example shown in Checkpoint Loading only works if the user is loading the checkpoints immediately after training (i.e. the instance of However, I'm loading the checkpoints separately (i.e. not immediately after training), and I had to change
to
in order to get the hyperparams to be loaded properly. |
Your checkpoint was saved with a dictionary, which means you likely gave the model a dictionary in init and not a Namespace. |
@williamFalcon Could it be that this line is actually failing to convert the dictionary built by Lightning back to a Namespace? In particular, I believe that is happening to me because my checkpoint has no value for "hparams_type", which means the conversion is skipped. Possible work-arounds:

import argparse

if isinstance(hparams, dict):
    hparams = argparse.Namespace(**hparams)

(fwiw: I may be doing other things wrong, because my |
I think I am facing a similar issue. When I test a model that I load from a checkpoint file, the results are quite bad. TransferLearningModel.py
training_testing.py
Any idea what I might be doing wrong? |
I still face this problem in 0.8.5. I can't restore my LightningModule. |
Even though I saved self.hparams as a Namespace, I found out Lightning loaded hparams back in as a dictionary. I discovered this by stepping through my debugger. Does this behavior still happen in 0.9+? I resorted to the workaround suggested above:
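Presumably the dict-to-Namespace conversion quoted earlier in the thread, roughly:

import argparse

def to_namespace(hparams):
    # Convert hparams back to a Namespace if the checkpoint handed us a plain dict.
    if isinstance(hparams, dict):
        hparams = argparse.Namespace(**hparams)
    return hparams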
|
No, as far as I know we work around that by saving the hparams type also into the checkpoint so we know when we should convert it back to namespace when loading. |
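A quick way to check whether a given checkpoint actually carries that information (the key names are the ones shown later in this thread; the path is a placeholder):

import torch

ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
print(ckpt.keys())
# Recent checkpoints should contain the hparams and their container type, e.g.:
#   "hparams_name"      - attribute name the hparams were saved under
#   "hyper_parameters"  - the saved hyperparameter values
print(ckpt.get("hparams_name"), type(ckpt.get("hyper_parameters")))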
I also have the same problem. During initializing the training a |
I have again verified that on the latest version it works. Please install 0.9.0rc12. If the problem persists, may I ask for a minimal code sample to reproduce the issue? |
Hello! @awaelchli, facing the same issue on 0.9.0rc16: hparams is a Namespace during training, but is passed in as a dict when loading from the checkpoint. I'll post a complete sample shortly, but as far as I can tell, my checkpoint doesn't have the key
yields the following:
For reference, the checkpoint was saved using automatic checkpointing by specifying the following in
|
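A hypothetical configuration for automatic checkpointing might look like the following (monitor key and other values are placeholders, not the poster's actual settings; argument names as in newer Lightning versions):

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep only the single best checkpoint according to a monitored metric.
checkpoint_callback = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)
trainer = Trainer(max_epochs=10, callbacks=[checkpoint_callback])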
Did some snooping around in source, and I think this is the relevant snippet in
I haven't been able to verify if the |
Me too, I confirm I still get this on 0.9.0. Just using argparse Namespace. Please reopen. |
0.9.0 still has this problem. |
Here is my pretty dirty solution to load the checkpoint left by the ModelCheckpoint callback:
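A hedged guess at what such a manual workaround typically looks like (not the poster's actual code; the checkpoint keys are the ones discussed elsewhere in this thread):

import argparse
import torch

def load_manually(model_cls, ckpt_path):
    # Read the raw checkpoint, rebuild the hparams Namespace by hand,
    # construct the model and restore its weights.
    ckpt = torch.load(ckpt_path, map_location="cpu")
    hparams = ckpt.get("hyper_parameters", {})
    if isinstance(hparams, dict):
        hparams = argparse.Namespace(**hparams)
    model = model_cls(hparams)
    model.load_state_dict(ckpt["state_dict"])
    model.eval()
    return model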
|
Still having this issue in 0.9.0. The issue comes from here: in fact, the hparams saved in the checkpoint is indeed a Namespace. However, the hparams type does not seem to make it into the checkpoint, so it comes back as a dict. I think this bug has to be fixed with very high priority, since loading trained models is one of the most basic things people need for any DL project. |
@magic282 that's great to hear; at least it's reproducible then. I think I also found, in the answer I'm quoting, that the type didn't seem to be saved in my checkpoint. Could you verify whether the OMEGA_CONF branch of the isinstance check failed in your case, as I mentioned in my comment above? (referencing here for convenience)
|
This issue is still there on 0.10.0. |
looks like this still exists in 1.0.2 |
I am also facing the same problem. I made a post in the forum here. Update: I have solved my problem; you can find the solution in the link above. |
Could this have the "bug" label added? Seems like unintended behavior. |
@andrewjong Ok, I'm adding the bug label, but I read the thread again, and I still cannot identify what the issue is here, sorry.
|
@awaelchli sorry for being unclear; the specific issue is the hparams bug in #3998 yes. Looks like that got added in the past two weeks. Thanks for linking to it. We should probably focus on that one as it's clearly about the hparams loading issue. |
ok, thank you! Let's close this one here as I believe we have answered and resolved the issues of the main author about how to load the checkpoint and we also get the expected test values. I will get back to the linked issue and try to resolve it. |
@awaelchli hey! I opened and diagnosed #3998, and if you follow the discussion there I've linked to exactly why that's happening: the linked code parses the class spec to find ".hparams=", which fails for inherited classes. So in some sense I've solved and demonstrated why it's happening. As such, that issue is about how hparams get detected and saved; the current issue has nothing to do with it. If you look at my comment above in this thread, I've linked to the issue being that the hparams type never makes it into the checkpoint. Please reopen this issue, since the other one is distinct. I've taken the time to dig in and isolate where both are happening, and they aren't the same. I understand it's hard to read entire comment threads, but it's also a little frustrating when issues get closed like this after trying to help isolate the problem :) |
import os
from argparse import Namespace
import torch
from torch.utils.data import Dataset
from pytorch_lightning import Trainer, LightningModule
class RandomDataset(Dataset):
def __init__(self, size, length):
self.len = length
self.data = torch.randn(length, size)
def __getitem__(self, index):
return self.data[index]
def __len__(self):
return self.len
class BoringModel(LightningModule):
def __init__(self, hparams):
super().__init__()
self.hparams = hparams
self.layer = torch.nn.Linear(32, 2)
print("type of hparams", type(self.hparams))
print("class of hparams type", self.hparams.__class__.__name__)
print("accessing hparams", self.hparams.something, "no problem")
def forward(self, x):
return self.layer(x)
def loss(self, batch, prediction):
# An arbitrary loss to have a loss that updates the model weights during `Trainer.fit` calls
return torch.nn.functional.mse_loss(prediction, torch.ones_like(prediction))
def step(self, x):
x = self.layer(x)
out = torch.nn.functional.mse_loss(x, torch.ones_like(x))
return out
def training_step(self, batch, batch_idx):
output = self.layer(batch)
loss = self.loss(batch, output)
return {"loss": loss}
def training_step_end(self, training_step_outputs):
return training_step_outputs
def training_epoch_end(self, outputs) -> None:
torch.stack([x["loss"] for x in outputs]).mean()
def validation_step(self, batch, batch_idx):
output = self.layer(batch)
loss = self.loss(batch, output)
return {"x": loss}
def validation_epoch_end(self, outputs) -> None:
torch.stack([x['x'] for x in outputs]).mean()
def test_step(self, batch, batch_idx):
output = self.layer(batch)
loss = self.loss(batch, output)
return {"y": loss}
def test_epoch_end(self, outputs) -> None:
torch.stack([x["y"] for x in outputs]).mean()
def configure_optimizers(self):
optimizer = torch.optim.SGD(self.layer.parameters(), lr=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
return [optimizer], [lr_scheduler]
def run_test():
# fake data
train_data = torch.utils.data.DataLoader(RandomDataset(32, 64))
val_data = torch.utils.data.DataLoader(RandomDataset(32, 64))
test_data = torch.utils.data.DataLoader(RandomDataset(32, 64))
# model
model = BoringModel(hparams=Namespace(something=1))
trainer = Trainer(
default_root_dir=os.getcwd(),
limit_train_batches=1,
limit_val_batches=1,
max_epochs=1,
weights_summary=None,
)
trainer.fit(model, train_data, val_data)
# Try to load with hparams
ckpt_path = trainer.checkpoint_callback.best_model_path
model = BoringModel.load_from_checkpoint(ckpt_path)
ckpt = torch.load(ckpt_path)
print(ckpt.keys())
print("hparams name saved to ckpt:", ckpt["hparams_name"])
print("hparams contents saved to ckpt:", ckpt["hyper_parameters"])
print("hparams type saved to ckpt:", type(ckpt["hyper_parameters"]))
trainer.test(model, test_dataloaders=test_data)
if __name__ == '__main__':
    run_test()

I have once again tried to use the hparams and it works. Please take this code and finally tell me what I need to modify to make it break. If there is an issue with hparams loading, then it should be taken to a new issue. As far as I can tell, the OP's problem was solved, was it not? That is the reason I closed it.
I'm sorry, but the frustration is also on my side. I have revisited this thread multiple times, I have helped debug the OP's Colab, and later I converted it to the latest PL version and showed that the loaded values are the correct ones. Then this issue transformed into hparams loading, and I am simply not able to guess how to reproduce. Please, if it's not too much to ask
|
@awaelchli I didn't mean to invalidate your frustration, I think it's possible for both sides to be validly frustrated for different reasons. In my case, I tried pointing out it's a problem with the saving rather than the loading, which got buried in the discussion, only to see it closed in reference to an issue that I opened and diagnosed already to not be related to this. I see your point about why a separate issue might help, and created one here: #4333 |
Hi @polars05 @awaelchli, I am facing a similar issue. When I use the trainer.fit() function to train the model and load the checkpoint file right after the training process to do the evaluation, the test accuracy is 0.8100. However, if I load the checkpoint file again after that and skip the trainer.fit() step, the evaluation accuracy on the test dataset is 0.8063. May I ask how you solved the issue eventually? I can't find the solution. Thanks! For reference, my main function is here:

supervision = SimpleObject({'train_x': train_x, 'val_x': val_x, 'test_x': test_x, 'labels': labels})
datamodule = DataModule(cf, g, features, supervision)
model = LGTLightning(cf=cf, loss_func=th.nn.CrossEntropyLoss())
checkpoint_callback = ModelCheckpoint(monitor='val_acc', mode='max', save_top_k=1, dirpath=cf.temp_path, filename=cf.f_prefix)
logger = TensorBoardLogger(save_dir=cf.log_path, name=cf.f_prefix)
trainer = Trainer(gpus=[cf.gpu] if cf.gpu != -1 else None, max_epochs=cf.epochs,
callbacks=[checkpoint_callback], logger=logger, weights_summary=None)
trainer.fit(model, datamodule=datamodule)
model = LGTLightning.load_from_checkpoint(checkpoint_path=cf.checkpoint_file).to(cf.device)
evaluate(cf, model, datamodule, device=cf.device)

And the only difference between the two ways of evaluation I describe above is that I comment out trainer.fit(model, datamodule=datamodule):

supervision = SimpleObject({'train_x': train_x, 'val_x': val_x, 'test_x': test_x, 'labels': labels})
datamodule = DataModule(cf, g, features, supervision)
model = LGTLightning(cf=cf, loss_func=th.nn.CrossEntropyLoss())
checkpoint_callback = ModelCheckpoint(monitor='val_acc', mode='max', save_top_k=1, dirpath=cf.temp_path, filename=cf.f_prefix)
logger = TensorBoardLogger(save_dir=cf.log_path, name=cf.f_prefix)
trainer = Trainer(gpus=[cf.gpu] if cf.gpu != -1 else None, max_epochs=cf.epochs,
callbacks=[checkpoint_callback], logger=logger, weights_summary=None)
# trainer.fit(model, datamodule=datamodule)
model = LGTLightning.load_from_checkpoint(checkpoint_path=cf.checkpoint_file).to(cf.device)
evaluate(cf, model, datamodule, device=cf.device) |
Hey @HoytWen, did you check which checkpoint file you are actually loading? As you call fit, it is likely the filepath changes and a new checkpoint_file is created. Best, |
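In code, that suggestion amounts to something like the following (a sketch reusing the names from the snippet above, instead of reading the directory listing):

trainer.fit(model, datamodule=datamodule)

# Load exactly the file the ModelCheckpoint callback kept as the best one.
best_ckpt = checkpoint_callback.best_model_path
model = LGTLightning.load_from_checkpoint(best_ckpt, cf=cf, loss_func=th.nn.CrossEntropyLoss())
evaluate(cf, model, datamodule, device=cf.device)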
Hi @tchaton, thanks for your reply, and sorry that I did not explain it very clearly.

supervision = SimpleObject({'train_x': train_x, 'val_x': val_x, 'test_x': test_x, 'labels': labels})
datamodule = DataModule(cf, g, features, supervision)
model = LGTLightning(cf=cf, loss_func=th.nn.CrossEntropyLoss())
checkpoint_callback = ModelCheckpoint(monitor='val_acc', mode='max', save_top_k=1, dirpath=cf.temp_path)
early_stop = EarlyStopping(monitor='val_acc', mode='max', patience=cf.early_stop)
logger = TensorBoardLogger(save_dir=cf.log_path)
trainer = Trainer(gpus=[cf.gpu] if cf.gpu != -1 else None, max_epochs=cf.epochs,
callbacks=[checkpoint_callback, early_stop], logger=logger, weights_summary=None)
trainer.fit(model, datamodule=datamodule)
checkpoint_file = cf.temp_path + os.listdir(cf.temp_path)[0]
model = LGTLightning.load_from_checkpoint(checkpoint_path=checkpoint_file, cf=cf, loss_func=th.nn.CrossEntropyLoss()).to(cf.device)
evaluate(cf, model, datamodule, device=cf.device) As you can see in the |
For your reference, I also attach the datamodule, model and evaluation part here.
class DataModule(LightningDataModule):
def __init__(self, cf, g, features, sup, data_cpu=False, fan_out=[10, 25],
device=th.device('cpu'), batch_size=1000, num_workers=4):
super().__init__()
self.cf = cf
self.device = cf.device
self.g = g.to(self.device)
self.features = features.to(self.device)
self.epochs = cf.epochs
self.n_class = cf.n_class
self.train_x = sup.train_x.to(self.device)
self.val_x = sup.val_x.to(self.device)
self.test_x = sup.test_x.to(self.device)
self.labels = sup.labels.to(self.device)
self.sampled_subg_dataset = EgoSubgraphDataset(self.g, self.cf.fanouts, full_neighbors=False)
self.val_subg_dataset = EgoSubgraphDataset(self.g, self.cf.fanouts, self.cf.full_inference)
def _get_loader(self, dataset, shuffle=True):
return th.utils.data.DataLoader(
dataset=dataset,
batch_size=self.cf.batch_size,
collate_fn=collate_fn(), # * How to form batch
shuffle=shuffle,
num_workers=self.cf.n_workers,
worker_init_fn=worker_init_fn)
def train_dataloader(self):
return self._get_loader(th.utils.data.Subset(self.sampled_subg_dataset, self.train_x))
def val_dataloader(self):
return self._get_loader(th.utils.data.Subset(self.val_subg_dataset, self.val_x))
class LGTLightning(LightningModule):
def __init__(self, cf, loss_func):
# add supervision information
super().__init__()
self.save_hyperparameters('cf', 'loss_func')
self.cf = cf
self.module = EgoGT(self.cf)
self.lr = self.cf.lr
self.n_class = self.cf.n_class
self.train_acc = Accuracy()
self.val_acc = Accuracy()
self.test_acc = Accuracy()
self.loss_func = loss_func
def training_step(self, batch, batch_idx):
batched_nodes, seed_node_position, batched_graphs = batch
output_labels = batched_graphs.ndata['label'][seed_node_position].to(self.cf.device)
batched_feat = batched_graphs.ndata['F'].to(self.cf.device)
logits = self.module(batched_graphs, batched_feat, seed_node_position)
loss = self.loss_func(logits, output_labels)
self.train_acc(th.softmax(logits, 1), output_labels)
self.log('train_loss', loss, prog_bar=True, on_step=True, on_epoch=True)
self.log('train_acc', self.train_acc, prog_bar=True, on_step=True, on_epoch=True)
train_step_log = {'loss': loss}
batch_dict = {'loss': loss, 'logits': logits, 'labels': output_labels, 'log': train_step_log}
return batch_dict
def training_epoch_end(self, outputs):
avg_train_loss = th.stack([x['loss'] for x in outputs]).mean().item()
logits = th.cat([x['logits'] for x in outputs])
train_y = th.cat([x['labels'] for x in outputs])
train_acc = self.train_acc(th.softmax(logits, 1), train_y).item()
train_log = {'train_epoch_loss': avg_train_loss, 'train_epoch_acc': train_acc}
self.logger.experiment.add_scalars("Train", train_log, self.current_epoch)
def validation_step(self, batch, batch_idx):
batched_nodes, seed_node_position, batched_graphs = batch
output_labels = batched_graphs.ndata['label'][seed_node_position].to(self.cf.device)
batched_feat = batched_graphs.ndata['F'].to(self.cf.device)
# assert th.sum(batched_graphs.ndata['F'][seed_node_position] != self.g.ndata['F'][batched_nodes]) == 0
logits = self.module(batched_graphs, batched_feat, seed_node_position)
return {'logits': logits, 'labels': output_labels}
def validation_epoch_end(self, outputs):
logits = th.cat([x['logits'] for x in outputs])
val_y = th.cat([x['labels'] for x in outputs])
val_acc = self.val_acc(th.softmax(logits, 1), val_y).item()
self.log('val_acc', self.val_acc, prog_bar=True, on_step=False, on_epoch=True)
val_log = {'val_epoch_acc': val_acc}
self.logger.experiment.add_scalars("Validation", val_log, self.current_epoch)
def configure_optimizers(self):
optimizer = th.optim.Adam(self.parameters(), lr=self.lr)
return optimizer
def evaluate(cf, model, datamodule, device=th.device("cpu")):
model.eval()
model.freeze()
labels = datamodule.labels.to(device)
val_x = datamodule.val_x.to(device)
test_x = datamodule.test_x.to(device)
n_class = cf.n_class
val_loader = datamodule.val_dataloader()
test_loader = datamodule.test_dataloader()
def _eval_model(loader, target_x, target_y):
pred = th.ones_like(labels).to(device) * -1
with th.no_grad():
for batch_id, (batched_nodes, seed_node_position, batched_graphs) in enumerate(loader):
batched_feat = batched_graphs.ndata['F']
logits = model.module(batched_graphs, batched_feat, seed_node_position)
pred[batched_nodes] = th.argmax(logits, dim=1).to(device)
acc, val_f1, val_mif1 = eval_classification(pred[target_x], target_y, n_class=n_class)
return acc
val_acc = _eval_model(val_loader, val_x, labels[val_x])
test_acc = _eval_model(test_loader, test_x, labels[test_x])
res = {'test_acc': f'{test_acc:.4f}', 'val_acc': f'{val_acc:.4f}'}
save_results(cf, res) |
Hey @HoytWen, oh, you are using graphs and doing graph classification, right? Assuming the weights are the same, the difference in the metric could be coming from the batch_size not being properly inferred. As your graphs are batched on the node dimension, the batch size can't be inferred by Lightning, which finds batch_size == num_nodes in the batch. To resolve this, could you try to do

Best, |
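The concrete suggestion is presumably to pass the batch size explicitly when logging, which newer Lightning versions (1.5+) support via a batch_size argument to self.log. A sketch of validation_step under that assumption, reusing names from the snippets above and assuming batched_nodes holds one entry per ego-subgraph:

def validation_step(self, batch, batch_idx):
    batched_nodes, seed_node_position, batched_graphs = batch
    logits = self.module(batched_graphs, batched_graphs.ndata['F'], seed_node_position)
    self.val_acc(th.softmax(logits, 1), batched_graphs.ndata['label'][seed_node_position])
    # Tell Lightning the true batch size so epoch-level aggregation is not
    # weighted by the number of nodes in the batched graph.
    self.log('val_acc', self.val_acc, prog_bar=True, on_step=False, on_epoch=True,
             batch_size=len(batched_nodes))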
I had the same problem; I was loading the checkpoint after initializing my model class: |
I'm currently experiencing a similar problem but this solution didn't help ( Using
But when I try to use it in an "offline" test, these are the results:
Every metric value is different and considerably worse than expected. The checkpoint file is the same in both cases. To load the state from a checkpoint file I've tried:
And also:
None of these attempts have worked as the test results are the same. Additionally, I've tried to see what's inside
With the exception of I'm not sure what I'm missing. |
Hello, this is correct:

model = MyCustomNet.load_from_checkpoint(checkpoint_path=checkpoint_path)

Note: if you load the model, make sure your hyperparameters match. Either make sure you called self.save_hyperparameters() in your init, or pass your params to load_from_checkpoint. Furthermore, how did you run trainer.test()?
Is it possible you compared 1) with a best model? |
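A compact sketch of the two options (class name, sizes and paths are placeholders):

import torch
import pytorch_lightning as pl

class MyCustomNet(pl.LightningModule):
    def __init__(self, lr=1e-3, in_units=32, out_units=5):
        super().__init__()
        # Stores the init arguments in the checkpoint so load_from_checkpoint
        # can rebuild the model without being given them again.
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(in_units, out_units)

# If save_hyperparameters() was called, this is enough:
model = MyCustomNet.load_from_checkpoint("path/to/best.ckpt")

# Otherwise, supply the init arguments explicitly when loading:
model = MyCustomNet.load_from_checkpoint("path/to/best.ckpt", lr=1e-3, in_units=32, out_units=5)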
Hello @awaelchli, thanks for your answer! I'm using option 3 when testing after fitting the model, but with ckpt_path="best". To avoid further confusion, here is essentially how my training/testing script currently runs:

class MyCustomNet(pl.LightningModule):
def __init__(self,
epochs=EPOCHS,
lr=LR,
betas=(BETA_1, BETA_2),
wd=WD,
esp=EARLY_STOP_PATIENCE,
sp=SCHEDULER_PATIENCE,
in_units=IN_UNITS,
out_units=OUT_UNITS):
super().__init__()
self.save_hyperparameters()
# after saving hyperparameters I define my network architecture and the required methods: forward(), configure_optimizers(), training_step(), validation_step() and test_step(), etc.)
# ...
if "__name__" == "__main__":
# ... DataLoader creation for train/val/test datasets.
model = MyCustomNet()
# ... Callback definitions
trainer = pl.Trainer(..., callbacks=[..., checkpoints_callback])
trainer.fit(model, train_loader, val_loader)
# To test immediately after fitting the model
trainer.test(ckpt_path="best", dataloaders=[test_loader])
print(f"Best model checkpoint path: {checkpoints_callback.best_model_path}") When that script finishes, the results are:
Then I created a Jupyter notebook to test the model. The code in that notebook is: test_loader = # here I create the test DataLoader instance using the same dataset file I used in the script.
checkpoint_path = "<copy/paste last print output from script>"
model = MyCustomNet.load_from_checkpoint(checkpoint_path=checkpoint_path)
trainer = pl.Trainer(
gpus=1,
num_nodes=1,
precision=16
)
trainer.test(model, dataloaders=test_loader)

When I run the test like this, the results are:
Note: after calling

betas: !!python/tuple
- 0.9
- 0.999
epochs: 5000
esp: 50
in_units: 32
lr: 1.0e-05
out_units: 5
sp: 50
wd: 0.01

So the |
Same error here as @Pazitos10: cannot reproduce test results after saving and loading the model. F1 validation score before saving and loading: 0.78. F1 score after saving and loading: 0.38.

Relevant parts in model.py:

class TONet(pl.LightningModule):
def __init__(self,
config):
super().__init__()
self.save_hyperparameters()
# ...
self.f1 = F1Score(ignore_index=0, num_classes=2, average='macro', mdmc_average='samplewise')
self.recall = Recall(ignore_index=0, num_classes=2, average='macro', mdmc_average='samplewise')
self.precision_ = Precision(ignore_index=0, num_classes=2, average='macro', mdmc_average='samplewise')

in

model = Model(config)
dm = DataModule(config)
trainer = pl.Trainer(
gpus=-1, # for automatically allocatin on all available GPUS
accelerator='gpu',
# to get rid of: Warning: find_unused_parameters=True was specified in
# DDP constructor, but did not find any unused parameters in the forward
# pass. This flag results in an extra traversal
strategy=DDPPlugin(find_unused_parameters=False),
max_epochs=config["num_epochs"],
logger=[tb_logger, aim_logger],
)
trainer.fit(model, dm)

in

model = Model.load_from_checkpoint(
checkpoint_path=args.path_to_model,
map_location=device
).eval()
trainer = pl.Trainer(
devices=[1],
accelerator=config["accelerator"],
)
dm = DataModule(config)
dm.setup()
trainer.validate(model, datamodule=dm) |
In my case, validation with bz=1 was not working; bz > 1 worked. It has to do with how my torchmetrics are set up:

self.f1 = F1Score(ignore_index=0, num_classes=2, average='macro', mdmc_average='samplewise')
self.recall = Recall(ignore_index=0, num_classes=2, average='macro', mdmc_average='samplewise')
self.precision_ = Precision(ignore_index=0, num_classes=2, average='macro', mdmc_average='samplewise')

With this setup, the metrics yield different results when the dimensionality of the output data from the model is reduced. Looks like it could be @Pazitos10's issue as well, @awaelchli? |
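One way to make the epoch-level numbers independent of the per-batch tensor shapes is to let each metric accumulate state across the epoch and compute once at the end; a sketch under that assumption (not necessarily the fix for the issue above; _shared_eval is a hypothetical helper):

import pytorch_lightning as pl
from torchmetrics import F1Score

class LitWithF1(pl.LightningModule):
    # Illustrative only; forward, training_step and optimizers omitted.
    def __init__(self):
        super().__init__()
        self.f1 = F1Score(ignore_index=0, num_classes=2, average='macro',
                          mdmc_average='samplewise')

    def validation_step(self, batch, batch_idx):
        preds, targets = self._shared_eval(batch)  # hypothetical helper
        # Only accumulate the metric state here; do not log per-batch values.
        self.f1.update(preds, targets)

    def validation_epoch_end(self, outputs):
        # Compute once over everything seen this epoch, then reset the state.
        self.log('val_f1', self.f1.compute())
        self.f1.reset()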
@mikel-brostrom Thank you! Fortunately that was not my problem. As it turns out, I'm just a bit dumb :D. Reading your answer today made me go to check my code again and I realized I made a mistake while creating the offline test dataloader in my notebook. |
Yes, 😄 |
Man, this one ended my 5-hour fight with a crazy bug in my code. Thanks!! |
Hey, I am also getting the same problem: when loading the weights of a saved PyTorch model I get very poor accuracy. |
I've trained a system as follows:
And with the above, the test accuracy is 0.7975
However, when I load the checkpoints separately instead:
The accuracy returned is 0.5705
Am I loading the checkpoints wrongly?
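For reference, the pattern the answers above converge on, side by side (a sketch; the model class, trainer and loaders are placeholders, and argument names are as in newer Lightning versions):

# (a) Right after training: test on the best checkpoint kept by ModelCheckpoint.
trainer.test(ckpt_path="best", dataloaders=[test_loader])

# (b) In a separate run: rebuild the model from that same checkpoint file.
model = MyModel.load_from_checkpoint(trainer.checkpoint_callback.best_model_path)
trainer.test(model, dataloaders=[test_loader])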