
Improve GMARE metric computation formula #2087

Closed
KickItLikeShika opened this issue Jun 28, 2021 · 9 comments · Fixed by #2096

@KickItLikeShika
Contributor

Months ago I tried to update GMRAE to be compatible with DDP, but I hit a very strange issue: the distributed tests were failing. I thought I had done something wrong (even though I had updated 10+ metrics for DDP by then), so I set it aside at the time. Today I tried again to update it and send the PR, but I hit the same issue: the distributed tests are failing.

distributed_context_single_node_gloo = {'local_rank': 1, 'rank': 1, 'world_size': 2}

    @pytest.mark.distributed
    @pytest.mark.skipif(not idist.has_native_dist_support, reason="Skip if no native dist support")
    def test_distrib_gloo_cpu_or_gpu(distributed_context_single_node_gloo):
    
        device = idist.device()
>       _test_distrib_compute(device)


metric_device = device(type='cpu')

    def _test(metric_device):
        metric_device = torch.device(metric_device)
        m = GeometricMeanRelativeAbsoluteError(device=metric_device)
        torch.manual_seed(10 + rank)
    
        y_pred = torch.rand(size=(100,), device=device)
        y = torch.rand(size=(100,), device=device)
    
        m.update((y_pred, y))
    
        y_pred = idist.all_gather(y_pred)
        y = idist.all_gather(y)
    
        np_y = y.cpu().numpy()
        np_y_pred = y_pred.cpu().numpy()
    
        np_gmrae = np.exp(np.log(np.abs(np_y - np_y_pred) / np.abs(np_y - np_y.mean())).mean())
    
        # sum_y = 0
        # num_examples = 0
        # sum_of_errors = 0
        # np_gmrae = 0
        # sum_y += np_y.sum()
        # num_examples += np_y.shape[0]
        # y_mean = sum_y / num_examples
        # numerator = np.abs(y.view_as(y_pred) - y_pred)
        # denominator = np.abs(y.view_as(y_pred) - y_mean)
        # sum_of_errors += np.log(numerator / denominator).sum()
        # np_gmrae += np.exp((sum_of_errors / num_examples).mean())
    
>       assert m.compute() == pytest.approx(np_gmrae)
E       assert 1.3195266723632812 == 1.3320785760879517 ± 1.3e-06
E         +1.3195266723632812
E         -1.3320785760879517 ± 1.3e-06


FAILED tests/ignite/contrib/metrics/regression/test_geometric_mean_relative_absolute_error.py::test_distrib_gloo_cpu_or_gpu - assert 1.3195266723632812 == 1.3320785760879517 ± 1.3e-06

And here are the code and the tests:
master...KickItLikeShika:improve-GMRAE-ddp

@sdesrozis
Contributor

This metric is similar to the Geometric Mean Absolute Error, which works fine in DDP. I think the error is in the computation of the relative term.

At line 147 of the test, np_y.mean() is computed on the complete y vector gathered over the procs. However, at line 48 of the metric, the mean is computed on y separately on each proc.
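
To make the mismatch concrete, here is a minimal sketch (hypothetical values, not the actual test) showing that per-proc means yield a different denominator than the global mean:

    import torch

    torch.manual_seed(0)
    y0, y1 = torch.rand(100), torch.rand(100)  # y on proc 0 and on proc 1
    y_all = torch.cat([y0, y1])                # what idist.all_gather(y) returns

    # Denominator built from the global mean (what the test computes):
    global_denom = torch.abs(y_all - y_all.mean())

    # Denominator built from per-proc means (what the metric currently computes):
    local_denom = torch.cat([torch.abs(y0 - y0.mean()), torch.abs(y1 - y1.mean())])

    print(torch.allclose(global_denom, local_denom))  # False in general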

@KickItLikeShika What do you think ?

@KickItLikeShika
Contributor Author

Interesting! Thanks for the explanation @sdesrozis. But I'm wondering: isn't that mismatch in the results confusing for the user? I don't know exactly which way is the right one for computing this metric (computing the mean on each proc separately, like the metric itself does, or the way it's done in the tests). So I don't know whether I should edit the way we compute the metric, or this test.

@sdesrozis
Contributor

sdesrozis commented Jun 29, 2021

The answer is in the GMRAE definition 😊

The actual implementation sounds weird to me: the results will differ depending on the number of procs used. I didn't look into the details, and I think you should in order to fix this issue. I can help if needed.

IMO the correct implementation is the one in the test.

@KickItLikeShika
Contributor Author

KickItLikeShika commented Jun 29, 2021

Thanks @sdesrozis for the clarifications!
OK, I will change the way we compute this metric. Does this sound OK to you? @sdesrozis @vfdev-5

@sdesrozis
Contributor

sdesrozis commented Jun 29, 2021

From the reference paper p14 :

Normalization by variability of actuals: N3 = (A − Ā)^(−c)
Normalization by variability of actuals includes division of the error by the difference between the actual value and the mean value of all actuals. For the magnitude of error and absolute error c = 1, and for the squared error c = 2. Also, for the absolute distance error, absolute actuals are used.
Inclusion of the actuals mean Ā is intended to lower the risk of division-by-zero situations. The actuals mean is implemented in R packages (e.g. in MLmetrics, metrics, rminer). In the general case, normalization can use an error from a benchmark method (usually naïve forecasting) (Hyndman, 2006).

So the key point is to understand exactly what the mean of all actuals means. It sounds to me that the mean over all y, as in the test, is correct.
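
Spelled out (this is my reading of the definition above, and it matches the test's np_gmrae computation):

    \mathrm{GMRAE} = \exp\left( \frac{1}{n} \sum_{t=1}^{n} \ln \frac{|A_t - F_t|}{|A_t - \bar{A}|} \right), \qquad \bar{A} = \frac{1}{n} \sum_{t=1}^{n} A_t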

On the other hand, it seems that GMRAE is rather a metric for forecasting time series. In that setting, Ā stands for A at the previous time step... See https://www.rdocumentation.org/packages/rminer/versions/1.4.6/topics/mmetric or https://support.numxl.com/hc/en-us/articles/115001223403-GMRAE-Geometric-Mean-Relative-Absolute-Error

I don't know whether this metric is relevant...

HTH

@KickItLikeShika changed the title from "GMARE compatibility with DDP" to "Improve GMARE metric computation formula" on Jun 30, 2021
@KickItLikeShika
Contributor Author

Thanks to @sdesrozis for providing help!
After more digging I understood what's going on here. I tried to update this metric to be like the others, so I implemented something like this:

    @reinit__is_reduced
    def reset(self) -> None:
        self._num_examples = 0
        self._sum_of_errors = torch.tensor(0.0, device=self._device)

    def _update(self, output: Tuple[torch.Tensor, torch.Tensor]) -> None:
        y_pred, y = output[0].detach(), output[1].detach()
        # Accumulate log-relative-errors; note that y.mean() here is the mean
        # of the current batch only, not of all targets seen so far.
        errors = torch.log(torch.abs(y - y_pred) / torch.abs(y - y.mean()))
        self._sum_of_errors += torch.sum(errors).to(self._device)
        self._num_examples += y.shape[0]

    @sync_all_reduce("_sum_of_errors", "_num_examples")
    def compute(self) -> float:
        if self._num_examples == 0:
            raise NotComputableError(
                "GeometricMeanRelativeAbsoluteError must have at least one example before it can be computed."
            )
        return torch.exp(self._sum_of_errors / self._num_examples).item()

It worked well, but not for the integration tests, since there the data is fed in batches (which is why the integration tests for this metric differed from the other metrics'): the mean gets computed over a single batch instead of over all of y, and that's the cause of the misleading results in both the integration and the distributed cases.
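
For reference, the global-mean definition can be made exact regardless of batching by deferring all computation to the end. A minimal sketch, not the Ignite metric API:

    import torch

    class GMRAEAccumulator:
        # Sketch only: exact global-mean GMRAE by storing all outputs.
        # It illustrates why the global mean forces a compute-at-the-end design.

        def __init__(self) -> None:
            self._y_pred: list = []
            self._y: list = []

        def update(self, y_pred: torch.Tensor, y: torch.Tensor) -> None:
            # Defer all computation: the denominator needs the mean of *all* y.
            self._y_pred.append(y_pred.detach())
            self._y.append(y.detach())

        def compute(self) -> float:
            y_pred = torch.cat(self._y_pred)
            y = torch.cat(self._y)
            # In DDP, y and y_pred would additionally be all-gathered here.
            errors = torch.log(torch.abs(y - y_pred) / torch.abs(y - y.mean()))
            return torch.exp(errors.mean()).item()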
So I have an idea how to update this metric and make it work for DDP too; have a look here: https://gist.github.com/bshishov/5dc237f59f019b26145648e2124ca1c9#file-forecasting_metrics-py-L240
GMRAE is used for forecasting, so the gist above provides a seasonality/benchmark option. I'm thinking about providing this option and using it instead of the mean. Please let me know what you think about that, thanks!
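
In that formulation the denominator uses a naive seasonal forecast instead of the mean. A sketch along the lines of the gist (the function name and seasonality parameter follow the gist, not Ignite's API):

    import numpy as np

    def gmrae(actual: np.ndarray, predicted: np.ndarray, seasonality: int = 1) -> float:
        # Naive benchmark forecast: predict A_{t - seasonality} for A_t.
        benchmark = actual[:-seasonality]
        a, f = actual[seasonality:], predicted[seasonality:]
        rae = np.abs(a - f) / np.abs(a - benchmark)
        return float(np.exp(np.mean(np.log(rae))))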

@sdesrozis
Contributor

@KickItLikeShika Thanks for exploring the usage of this metric.

We don't have any time-series use case. Why not provide such an example to illustrate GMRAE usage, and the other regression metrics as well? Although I don't have any experience on this topic.

@KickItLikeShika
Contributor Author

I have been looking for examples or anything that shows the usage of the metric, but I have found nothing solid yet. The link you sent above does somewhat clarify the usage: https://support.numxl.com/hc/en-us/articles/115001223403-GMRAE-Geometric-Mean-Relative-Absolute-Error

@sdesrozis
Contributor

Yep, I cited this article in my previous post. Nothing solid, I agree.
