
Expected all tensors to be on the same device #340

Closed
jaffe-fly opened this issue Jul 1, 2021 · 1 comment
Comments


jaffe-fly commented Jul 1, 2021

🐛 Bug

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

from torchmetrics.classification import F1

def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self.forward(x)
    loss = self.loss_fn(logits, y)
    preds = torch.argmax(logits, dim=1)
    acc = accuracy(preds, y)
    f1 = F1(num_classes=self.args.num_classes)(preds, y)  # the F1 module is created on the CPU here while preds/y are on cuda:0
    self.log('val_loss', loss, prog_bar=True)
    self.log('val_acc', acc, prog_bar=True)
    self.log('val_f1', f1, prog_bar=True)
    return loss

model = ImageNetClassify(args)
earlystop = EarlyStopping("val_acc", patience=5)
checkpoint_callback = ModelCheckpoint(
    monitor='val_acc',          # metric to monitor
    dirpath='./ntt/alphamind',  # directory where checkpoints are saved
    mode='max',                 # keep the checkpoint with the maximum of the monitored metric
    verbose=True,
)
trainer = Trainer(logger=tb_logger,
                  # weights_summary='full',
                  progress_bar_refresh_rate=1,
                  gpus=1,
                  auto_select_gpus=True,
                  log_gpu_memory='all',
                  benchmark=True,
                  # max_epochs=30,
                  num_sanity_val_steps=2,
                  auto_scale_batch_size=True,
                  auto_lr_find=True,
                  callbacks=[earlystop, checkpoint_callback])

To Reproduce

Steps to reproduce the behavior:

  1. Run 'trainer.fit(model)'
  2. See error

[screenshot of the error traceback]

Expected behavior

Environment

  • PyTorch Version: 1.8
  • OS: Linux
  • How you installed PyTorch: pip
  • Python version: 3.7
jaffe-fly added the bug / fix (Something isn't working) and help wanted (Extra attention is needed) labels Jul 1, 2021
SkafteNicki (Member) commented

Hi @jaffe-fly,
Modular metrics are nn.Modules and therefore need to be moved to the same device as their inputs. This happens automatically if you define them in the __init__ method of your model. You can read more here: https://torchmetrics.readthedocs.io/en/latest/pages/overview.html#metrics-and-devices
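For reference, a minimal sketch of that pattern, reusing the names from the snippet above (the self.val_f1 attribute and the surrounding class body are illustrative, not taken from the issue):

import torch
from pytorch_lightning import LightningModule
from torchmetrics.classification import F1

class ImageNetClassify(LightningModule):
    def __init__(self, args):
        super().__init__()
        self.args = args
        # Registered as a submodule, so Lightning moves the metric to the
        # model's device (cuda:0 here) together with the rest of the model.
        self.val_f1 = F1(num_classes=args.num_classes)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self.forward(x)
        preds = torch.argmax(logits, dim=1)
        # self.val_f1 already lives on the same device as preds and y.
        self.log('val_f1', self.val_f1(preds, y), prog_bar=True)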
However, a possibly easier fix in your case is to just use the functional version of the F1 metric:

from torchmetrics.functional import f1

def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self.forward(x)
    loss = self.loss_fn(logits, y)
    preds = torch.argmax(logits, dim=1)
    acc = accuracy(preds, y)
    # Use a distinct local name so the imported `f1` function is not shadowed.
    val_f1 = f1(preds, y, num_classes=self.args.num_classes)
    self.log('val_loss', loss, prog_bar=True)
    self.log('val_acc', acc, prog_bar=True)
    self.log('val_f1', val_f1, prog_bar=True)
    return loss
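The functional version works directly on the input tensors, so there is no metric module whose device has to be kept in sync with the model.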

SkafteNicki added the working as intended label and removed the bug / fix (Something isn't working) label Jul 1, 2021
Borda changed the title from "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" to "Expected all tensors to be on the same device" Jul 1, 2021
Borda closed this as completed Jul 1, 2021
Lightning-AI locked and limited conversation to collaborators Jul 1, 2021
Borda added this to the v0.5 milestone Aug 18, 2021

This issue was moved to a discussion.

You can continue the conversation there.
