Expected all tensors to be on the same device #341
Question

I get the error `Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!` with the following code:

```python
from torchmetrics.classification import F1

def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self.forward(x)
    loss = self.loss_fn(logits, y)
    preds = torch.argmax(logits, dim=1)
    acc = accuracy(preds, y)
    f1 = F1(num_classes=self.args.num_classes)(preds, y)
    self.log('val_loss', loss, prog_bar=True)
    self.log('val_acc', acc, prog_bar=True)
    self.log('val_f1', f1, prog_bar=True)
    return loss
```
```python
model = ImageNetClassify(args)
earlystop = EarlyStopping("val_acc", patience=5)
checkpoint_callback = ModelCheckpoint(
    monitor='val_acc',          # metric to monitor
    dirpath='./ntt/alphamind',  # directory for saved weights
    mode='max',                 # save when the monitored metric is at its maximum
    verbose=True,
)
trainer = Trainer(logger=tb_logger,
                  # weights_summary='full',
                  progress_bar_refresh_rate=1,
                  gpus=1,
                  auto_select_gpus=True,
                  log_gpu_memory='all',
                  benchmark=True,
                  # max_epochs=30,
                  num_sanity_val_steps=2,
                  auto_scale_batch_size=True,
                  auto_lr_find=True,
                  callbacks=[earlystop, checkpoint_callback])
```
Answered by SkafteNicki, Jul 1, 2021
Hi @jaffe-fly,

Modular metrics are `nn.Module`s and therefore need to be moved to the same device as the input. This is done automatically if you define them in the `__init__` method of your model. You can read more here: https://torchmetrics.readthedocs.io/en/latest/pages/overview.html#metrics-and-devices

However, a perhaps easier fix in your case is to use the functional version of the `F1` metric:

```python
from torchmetrics.functional import f1

def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self.forward(x)
    loss = self.loss_fn(logits, y)
    preds = torch.argmax(logits, dim=1)
    acc = accuracy(preds, y)
    # use a distinct name: assigning to `f1` would shadow the imported
    # function and raise UnboundLocalError on the call
    f1_score = f1(preds, y, num_classes=self.args.num_classes)
    self.log('val_loss', loss, prog_bar=True)
    self.log('val_acc', acc, prog_bar=True)
    self.log('val_f1', f1_score, prog_bar=True)
    return loss
```

Answer selected by Borda