
Expected all tensors to be on the same device #340

Closed
jaffe-fly opened this issue Jul 1, 2021 · 1 comment
Comments


jaffe-fly commented Jul 1, 2021

🐛 Bug

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

from torchmetrics.classification import F1

def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self.forward(x)
    loss = self.loss_fn(logits, y)
    preds = torch.argmax(logits, dim=1)
    acc = accuracy(preds, y)
    f1 = F1(num_classes=self.args.num_classes)(preds, y)  # the F1 module is created on the CPU here while preds/y are on cuda:0
    self.log('val_loss', loss, prog_bar=True)
    self.log('val_acc', acc, prog_bar=True)
    self.log('val_f1', f1, prog_bar=True)
    return loss

model = ImageNetClassify(args)
earlystop = EarlyStopping("val_acc", patience=5)
checkpoint_callback = ModelCheckpoint(
    monitor='val_acc',          # metric to monitor
    dirpath='./ntt/alphamind',  # directory where checkpoints are saved
    mode='max',                 # keep the checkpoint with the maximum of the monitored metric
    verbose=True,
)
trainer = Trainer(logger=tb_logger,
                  # weights_summary='full',
                  progress_bar_refresh_rate=1,
                  gpus=1,
                  auto_select_gpus=True,
                  log_gpu_memory='all',
                  benchmark=True,
                  # max_epochs=30,
                  num_sanity_val_steps=2,
                  auto_scale_batch_size=True,
                  auto_lr_find=True,
                  callbacks=[earlystop, checkpoint_callback])

To Reproduce

Steps to reproduce the behavior:

  1. Run 'trainer.fit(model)'
  2. See error

[screenshot of the error traceback]

Expected behavior

Environment

  • PyTorch Version: 1.8
  • OS: Linux
  • How you installed PyTorch: pip
  • Python version: 3.7
jaffe-fly added the bug / fix (Something isn't working) and help wanted (Extra attention is needed) labels Jul 1, 2021
SkafteNicki (Member) commented

Hi @jaffe-fly,
Modular metrics are nn.Modules and therefore need to be moved to the same device as their inputs. This happens automatically if you define them in the __init__ method of your model. You can read more here: https://torchmetrics.readthedocs.io/en/latest/pages/overview.html#metrics-and-devices
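For reference, a minimal sketch of that pattern, reusing the names from the snippet above (the self.val_f1 attribute and the surrounding class body are illustrative, not taken from the issue):

import torch
from pytorch_lightning import LightningModule
from torchmetrics.classification import F1

class ImageNetClassify(LightningModule):
    def __init__(self, args):
        super().__init__()
        self.args = args
        # Registered as a submodule, so Lightning moves the metric to the
        # model's device (cuda:0 here) together with the rest of the model.
        self.val_f1 = F1(num_classes=args.num_classes)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self.forward(x)
        preds = torch.argmax(logits, dim=1)
        # self.val_f1 already lives on the same device as preds and y.
        self.log('val_f1', self.val_f1(preds, y), prog_bar=True)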
However, a possibly easier fix in your case is to just use the functional version of the F1 metric:

from torchmetrics.functional import f1

def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self.forward(x)
    loss = self.loss_fn(logits, y)
    preds = torch.argmax(logits, dim=1)
    acc = accuracy(preds, y)
    # Use a distinct local name so the imported `f1` function is not shadowed.
    val_f1 = f1(preds, y, num_classes=self.args.num_classes)
    self.log('val_loss', loss, prog_bar=True)
    self.log('val_acc', acc, prog_bar=True)
    self.log('val_f1', val_f1, prog_bar=True)
    return loss
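The functional version works directly on the input tensors, so there is no metric module whose device has to be kept in sync with the model.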

SkafteNicki added the working as intended label and removed the bug / fix (Something isn't working) label Jul 1, 2021
Borda changed the title from "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" to "Expected all tensors to be on the same device" Jul 1, 2021
Borda closed this as completed Jul 1, 2021
Lightning-AI locked and limited conversation to collaborators Jul 1, 2021
Borda added this to the v0.5 milestone Aug 18, 2021

This issue was moved to a discussion.

You can continue the conversation there.
