About multi-GPUs training #2

SimonCK666 · 2022-06-20T05:51:05Z

Could you update the code to support multi-GPU training?

I tried adapting it for multi-GPU training myself, but ran into a bug.
I changed train.py as follows:

if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(config.device)
model.train()

But when I train the model, I get this error:

Traceback (most recent call last):
  File "train.py", line 118, in <module>
    train_model(config)
  File "train.py", line 76, in train_model
    loss_val, psnr = loss_func(comp_rgb, pixels, rays.lossmult.to(config.device))
  File "/home/hyang/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hyang/Desktop/mipnerf-pytorch/loss.py", line 13, in forward
    mse = (mask * ((rgb - target[..., :3]) ** 2)).sum() / mask.sum()
RuntimeError: The size of tensor a (1024) must match the size of tensor b (2048) at non-singleton dimension 0


SimonCK666 · 2022-06-20T05:52:30Z

I found that the dimension of pixels is wrong,
so I tried calling pixels.cuda().
But nothing changed; the same error still occurs.
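For reference, here is a minimal shape-only sketch of one possible explanation for the 1024 vs 2048 mismatch. It assumes things the traceback does not confirm: a batch of 2048 rays, 2 GPUs, rays actually being scattered across replicas, and a model that returns its output as [num_levels, rays, 3] with the level axis first. nn.DataParallel splits inputs and concatenates outputs along dim 0 by default, so under those assumptions the gathered output would not line up with the unsplit pixels tensor. The helper names below are hypothetical and only model shapes in plain Python; they are not part of PyTorch or of this repo.

```python
import math

def scatter_sizes(batch_size, num_replicas):
    """Per-replica sizes when a batch is split into roughly equal chunks along dim 0."""
    chunk = math.ceil(batch_size / num_replicas)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

def gather_shape(replica_shapes, dim=0):
    """Shape after concatenating the per-replica outputs along `dim`."""
    out = list(replica_shapes[0])
    out[dim] = sum(shape[dim] for shape in replica_shapes)
    return tuple(out)

def replica_output_shape(rays_on_replica, num_levels=2):
    # ASSUMPTION: the model output is laid out as [num_levels, rays, 3].
    return (num_levels, rays_on_replica, 3)

# ASSUMPTION: 2048 rays per batch and 2 GPUs, guessed from the error message.
sizes = scatter_sizes(2048, 2)
outs = [replica_output_shape(n) for n in sizes]

# Gathering along dim 0 stacks the level axis, so each slice still holds 1024 rays.
gathered_dim0 = gather_shape(outs, dim=0)
# Gathering along the ray axis would restore all 2048 rays per level.
gathered_dim1 = gather_shape(outs, dim=1)

print(sizes, gathered_dim0, gathered_dim1)
```

If this reading is right, each rgb slice that the loss compares has 1024 rays while pixels still has 2048, which would also explain why pixels.cuda() makes no difference: it moves the tensor to another device but does not change its shape. Printing comp_rgb.shape and pixels.shape right before the loss_func call would confirm or rule this out.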
