TorchMetrics, Pytorch Lightning and DataParallel #528
-
In the TorchMetrics in PyTorch Lightning section of the documentation there is the following warning:
Is this also true in distributed data parallel mode (ddp/ddp2)? My code reports correct metric values only if I follow the instructions above.
Replies: 1 comment
-
No, in `ddp` it should not be necessary. You still need to sync the metric at some point between the different devices, which is automatically done when `metric.compute()` is called. Therefore, something like this should still work:

```python
def training_step(self, batch, batch_idx):
    ...
    self.metric.update(preds, target)
    ...

def training_epoch_end(self, outputs):
    val = self.metric.compute()  # this will sync the metric between devices
    self.log("metric", val)
    self.metric.reset()
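```

To make the reply's snippet self-contained, here is a minimal sketch of a full LightningModule, written against the Lightning 1.x-era hook used above (`training_epoch_end`) with an illustrative `torchmetrics.Accuracy` metric; the model, metric choice, and hyperparameters are assumptions, not from the thread. Assigning the metric as an attribute of the module is what lets Lightning move it to the correct device and sync its state under ddp.

```python
import torch
from torch import nn
import pytorch_lightning as pl
import torchmetrics


class LitClassifier(pl.LightningModule):
    # Illustrative module: a linear classifier over 32 input features.
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.model = nn.Linear(32, num_classes)
        # Assigning the metric as an attribute registers it as a submodule,
        # so it is moved to the right device and its state can be synced in ddp.
        self.metric = torchmetrics.Accuracy()

    def training_step(self, batch, batch_idx):
        x, target = batch
        preds = self.model(x)
        loss = nn.functional.cross_entropy(preds, target)
        # Accumulate per-process metric state; no manual sync needed here.
        self.metric.update(preds, target)
        return loss

    def training_epoch_end(self, outputs):
        # compute() gathers the accumulated state from all ddp processes.
        self.log("metric", self.metric.compute())
        self.metric.reset()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```

In newer releases the same idea applies, but `training_epoch_end` has been replaced by `on_train_epoch_end` and `torchmetrics.Accuracy` requires a `task` argument. The TorchMetrics integration also allows logging the metric object itself (e.g. `self.log("metric", self.metric, on_epoch=True)`), in which case Lightning takes care of calling `compute()` and `reset()` at the end of the epoch.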