Add warning on argparse gpus flag #7236

Closed
edenlightning wants to merge 1 commit into master from edenlightning-patch-1

Conversation

edenlightning
Contributor

fixes #6228.

@edenlightning changed the title from "Add waring on argparse gpus flag" to "Add warning on argparse gpus flag" Apr 27, 2021
@codecov

codecov bot commented Apr 27, 2021

Codecov Report

Merging #7236 (efcd2d0) into master (f920ba2) will decrease coverage by 0%.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #7236   +/-   ##
======================================
- Coverage      91%     91%   -0%     
======================================
  Files         198     198           
  Lines       12723   12723           
======================================
- Hits        11632   11629    -3     
- Misses       1091    1094    +3     

@awaelchli
Contributor

awaelchli commented Apr 27, 2021

This does not address #6228. The argparser here in the example doesn't specify a type, so the string is forwarded to the Trainer as if you had set Trainer(gpus="3"). That selects the GPU with index 3, not three GPUs ([0, 1, 2]), and it behaves differently from our Lightning argparser. See this example:

import os
from argparse import ArgumentParser

import torch
from torch.utils.data import Dataset, DataLoader
from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    test_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    parser = ArgumentParser()
    parser.add_argument("--gpus", default=None)
    args = parser.parse_args()

    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        num_sanity_val_steps=0,
        gpus=args.gpus,
        weights_summary=None,
    )
    trainer.fit(model, train_dataloader=train_data, val_dataloaders=val_data)
    trainer.test(model, test_dataloaders=test_data)


if __name__ == '__main__':
    run()

As I explained in the issue and on the linked PR, the problem is not with any of the argparser logic we have. The origin of the confusion is the argument in the Trainer itself when passing in a string: Trainer(gpus="3").
The confusion comes from the fact that Trainer(gpus="3") does not mean the same as Trainer(gpus=3).
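To illustrate the difference, here is a minimal sketch (the device lists assume the current, pre-1.5 parsing behaviour described above, on a machine with at least four GPUs):

from pytorch_lightning import Trainer

# int: "use this many GPUs"
Trainer(gpus=3)      # three GPUs -> device indices [0, 1, 2]

# str: parsed as a comma-separated list of device indices
Trainer(gpus="3")    # only the GPU with index 3
Trainer(gpus="0,2")  # device indices [0, 2]

# A plain argparse flag without type= always yields a str,
# so `--gpus 3` on the command line ends up in the second case.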

@tchaton
Copy link
Contributor

tchaton commented Apr 28, 2021

Hey @awaelchli,

Should we change this behaviour and convert "3" to map to [0, 1, 2] instead?
It would be a breaking change, but we could add a warning that this behaviour will be enforced in 1.5.

Your thoughts?
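For illustration, a rough sketch of what that warning-then-switch could look like (a hypothetical helper, not actual Lightning code):

import warnings

def parse_gpus_with_deprecation(gpus):
    # Hypothetical sketch of the proposal: keep today's meaning for now
    # (a plain digit string selects that single device index) but warn that
    # from 1.5 it would be treated as a count, i.e. "3" -> [0, 1, 2].
    if isinstance(gpus, str) and gpus.isdigit():
        warnings.warn(
            "gpus='3' currently selects device index 3; "
            "in 1.5 it will mean 'use 3 GPUs' ([0, 1, 2]).",
            DeprecationWarning,
        )
        return [int(gpus)]  # current behaviour
    if isinstance(gpus, str):
        return [int(i) for i in gpus.split(",") if i.strip()]
    return gpus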

@awaelchli
Contributor

awaelchli commented Apr 28, 2021

To fix the confusion, yes, that would be the best option IMO. It's what I was already doing in PR #6388 to resolve the issue, but I got pushback because it's a backward-incompatible change, so unfortunately we are not allowed to change it.

@tchaton
Contributor

tchaton commented May 4, 2021

Hey @awaelchli, closing this PR then :)

@tchaton tchaton closed this May 4, 2021
@Borda Borda deleted the edenlightning-patch-1 branch May 10, 2021 09:38
Development

Successfully merging this pull request may close these issues.

cli: Confused on (str, int, List[int]) variants for argparse for --gpus flag?