
Wrong log_dir when using SaveConfigCallback + Wandb Logger #11256

Closed

gau-nernst opened this issue Dec 25, 2021 · 3 comments
Labels
lightningcli pl.cli.LightningCLI logger: wandb Weights & Biases priority: 1 Medium priority task

Comments

@gau-nernst
Contributor

gau-nernst commented Dec 25, 2021

🐛 Bug

When I use Lightning CLI with Wandb Logger, SaveConfigCallback saves the config file to the root directory instead of the wandb logging directory wandb/run-xxxx-xxxx/files/.

|-- bug.py
|-- bug.yaml
|-- wandb/
   |-- run-xxxx-xxxx/
      |-- files/
         |-- ... → expect the config file to be saved here
|-- ... → but the config file is saved here

Without the Wandb Logger (defaulting to TensorBoard), the config is correctly saved under lightning_logs/version_xx/, so there seems to be a strange interaction between SaveConfigCallback and WandbLogger.
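One plausible explanation (an assumption on my part, not verified against the PL source): for non-TensorBoard loggers, trainer.log_dir falls back to the logger's save_dir, and WandbLogger leaves save_dir unset unless it is passed explicitly, so the trainer falls back to default_root_dir, i.e. the current working directory. A reduced, hypothetical sketch of that fallback chain (not PL's actual code):

```python
import os


def resolve_log_dir(logger_save_dir, default_root_dir):
    # Hypothetical reduction of the fallback: a non-TensorBoard logger
    # contributes its save_dir; if that is None, the trainer uses
    # default_root_dir, which defaults to the current working directory.
    return logger_save_dir or default_root_dir


# WandbLogger does not set save_dir unless told to, so the config lands in cwd:
print(resolve_log_dir(None, os.getcwd()))
# An explicit save_dir would take precedence:
print(resolve_log_dir("wandb_logs", os.getcwd()))
```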

To Reproduce

# bug.py
import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule
from pytorch_lightning.utilities.cli import LightningCLI


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)

    def val_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


if __name__ == "__main__":
    cli = LightningCLI(BoringModel)
# bug.yaml
trainer:
  max_epochs: 10

  logger:
    - class_path: pytorch_lightning.loggers.WandbLogger
      init_args:
        project: bug
        log_model: true

Run from command line

python bug.py fit --config bug.yaml

Expected behavior

The config file should be saved under the wandb logging directory wandb/run-xxxx-xxxx/files/

Environment

  • PyTorch Lightning Version (e.g., 1.5.0): 1.5.5
  • PyTorch Version (e.g., 1.10): 1.10.0
  • Python version (e.g., 3.9): 3.8
  • OS (e.g., Linux): Ubuntu 18.04
  • CUDA/cuDNN version: 11.3
  • GPU models and configuration: 4x RTX 3090 (GPUs are not involved, so not relevant)
  • How you installed PyTorch (conda, pip, source): conda
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information: wandb 0.12.9

Additional context

I saw issue #7679. It seems that the config used to be saved correctly to the wandb logging directory (hence the name clash reported there), but it is no longer saved there.

cc @tchaton @rohitgr7 @awaelchli @morganmcg1 @AyushExel @borisdayma @scottire @carmocca @mauvilsa

@gau-nernst gau-nernst added the bug Something isn't working label Dec 25, 2021
@akihironitta akihironitta added the lightningcli pl.cli.LightningCLI label Dec 26, 2021
@carmocca carmocca added the logger: wandb Weights & Biases label Dec 26, 2021
@morganmcg1

@borisdayma any idea on this?

@borisdayma
Contributor

I'm not sure it should be saved under that directory; it should maybe be handled separately, like we do with the models. That way there is a standard location where files are saved that is independent of wandb. It is also hard to make this reliable when multiple loggers are used.

We could log it either as an artifact or as a file when the wandb callback is active. I'm also not sure whether the config should be an input or an output artifact (use_artifact vs log_artifact).

Then we just have to think about how to handle it: within WandbCallback, or within SaveConfigCallback, which would then make a call to the logger (a bit like we do with ModelCheckpoint).

@mauvilsa
Contributor

Currently SaveConfigCallback saves the config to the trainer.log_dir directory. There is no logger-specific logic. From the description it seems that trainer.log_dir points to the root folder. I would expect the config to be saved in lightning_logs/version_xx/, not in the root and not in wandb/run-xxxx-xxxx/files/. There was a pull request to standardize log_dir for all loggers (#7543). I am not sure why it was closed, but it could be related to this.

Saving the config in a logger specific way should be a feature request. The bug here would be that the config is saved in the root instead of lightning_logs/version_xx/.

Only SaveConfigCallback is able to save the config because it is instantiated by the CLI with the required data. By the way, LightningCLI is designed so that a custom SaveConfigCallback class can be used. Just subclass it and save the config in any directory and/or save it as an artifact.

@tchaton tchaton added priority: 1 Medium priority task and removed bug Something isn't working labels Jan 4, 2022