Merge branch 'master' into aggregation
SkafteNicki authored Sep 24, 2021
2 parents 7606768 + f9d7d5f commit 04e44f9
Showing 38 changed files with 147 additions and 116 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci_test-conda.yml
@@ -61,7 +61,7 @@ jobs:
run: |
sudo apt install libsndfile1
conda info
conda install cpuonly mkl pytorch=${{ matrix.pytorch-version }}
conda install cpuonly mkl pytorch=${{ matrix.pytorch-version }} packaging
conda install cpuonly $(python ./requirements/adjust-versions.py conda)
conda install ffmpeg
conda list
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Added Learned Perceptual Image Patch Similarity (LPIPS) ([#431](https://github.com/PyTorchLightning/metrics/issues/431))


- Added Tweedie Deviance Score ([#499](https://github.com/PyTorchLightning/metrics/pull/499))


@@ -47,6 +48,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Fixed bug in `F1` with `average='macro'` and `ignore_index!=None` ([#495](https://github.com/PyTorchLightning/metrics/pull/495))


- Fixed bug in `pit` by using the returned first result to initialize device and type ([#533](https://github.com/PyTorchLightning/metrics/pull/533))


- Fixed `SSIM` metric using too much memory ([#539](https://github.com/PyTorchLightning/metrics/pull/539))


## [0.5.1] - 2021-08-30

### Added
34 changes: 32 additions & 2 deletions docs/source/links.rst
@@ -26,7 +26,6 @@
.. _confusion matrix: https://en.wikipedia.org/wiki/Confusion_matrix#Table_of_confusion
.. _sklearn averaging methods: https://scikit-learn.org/stable/modules/model_evaluation.html#multiclass-and-multilabel-classification
.. _Cosine Similarity: https://en.wikipedia.org/wiki/Cosine_similarity
.. _coefficient of determination: https://en.wikipedia.org/wiki/Coefficient_of_determination
.. _spearmans rank correlation coefficient: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
.. _WER: https://en.wikipedia.org/wiki/Word_error_rate
.. _FID: https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance
@@ -37,6 +36,37 @@
.. _IR Fall-out: https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Fall-out
.. _MAPE implementation returns: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_percentage_error.html
.. _mean squared logarithmic error: https://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-log-error
.. _Mean Reciprocal Rank: https://en.wikipedia.org/wiki/Mean_reciprocal_rank
.. _LPIPS: https://arxiv.org/abs/1801.03924
.. _Tweedie Deviance Score: https://en.wikipedia.org/wiki/Tweedie_distribution#The_Tweedie_deviance
.. _Permutation Invariant Training of Deep Models: https://ieeexplore.ieee.org/document/7952154
.. _Computes the Top-label Calibration Error: https://arxiv.org/pdf/1909.10155.pdf
.. _Gradient Computation of Image: https://en.wikipedia.org/wiki/Image_gradient
.. _R2 Score_Coefficient Determination: https://en.wikipedia.org/wiki/Coefficient_of_determination
.. _Rank of element tensor: https://github.com/scipy/scipy/blob/v1.6.2/scipy/stats/stats.py#L4140-L4303
.. _Mean Reciprocal Rank: https://en.wikipedia.org/wiki/Mean_reciprocal_rank
.. _BERT_score: https://github.com/Tiiiger/bert_score/blob/master/bert_score/utils.py
.. _Bert_score Evaluating Text Generation: https://arxiv.org/abs/1904.09675
.. _BLEU score: https://en.wikipedia.org/wiki/BLEU
.. _BLEU: http://www.aclweb.org/anthology/P02-1040.pdf
.. _Machine Translation Evolution: https://aclanthology.org/P04-1077.pdf
.. _Rouge score_Text Normalizition: https://github.com/google-research/google-research/blob/master/rouge/tokenize.py
.. _Calculate Rouge Score: https://en.wikipedia.org/wiki/ROUGE_(metric)
.. _Rouge Detail: https://aclanthology.org/W04-1013/
.. _Square Root of a Positive Definite Matrix: https://github.com/steveli/pytorch-sqrtm/blob/master/sqrtm.py
.. _Fid Score: https://github.com/photosynthesis-team/piq/blob/master/piq/fid.py
.. _Rethinking the Inception Architecture for ComputerVision: https://arxiv.org/abs/1512.00567
.. _GANs Trained by a Two Time-Scale: https://arxiv.org/abs/1706.08500
.. _Improved Techniques for Training GANs: https://arxiv.org/abs/1606.03498
.. _KID Score: https://github.com/toshas/torch-fidelity/blob/v0.3.0/torch_fidelity/metric_kid.py
.. _Demystifying MMD GANs: https://arxiv.org/abs/1801.01401
.. _Computes Peak Signal-to-Noise Ratio: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
.. _Turn a Metric into a Bootstrapped: https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
.. _Metric Test for Reset: https://github.com/PyTorchLightning/pytorch-lightning/pull/7055
.. _Computes Mean Absolute Error: https://en.wikipedia.org/wiki/Mean_absolute_error
.. _Mean Absolute Percentage Error: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
.. _mean squared error: https://en.wikipedia.org/wiki/Mean_squared_error
.. _Aggregate the statistics from multiple devices: https://stackoverflow.com/questions/68395368/estimate-running-correlation-on-multiple-nodes
.. _Pearson Correlation Coefficient: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
.. _Python ROUGE Implementation: https://pypi.org/project/rouge-score/
.. _Scikit_Learn-Ranking.py: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/_ranking.py
.. _Verified Uncertainty Calibration: https://arxiv.org/abs/1909.10155
3 changes: 1 addition & 2 deletions integrations/test_lightning.py
@@ -86,8 +86,7 @@ def training_epoch_end(self, outs):
def test_metrics_reset(tmpdir):
"""Tests that metrics are reset correctly after the end of the train/val/test epoch.
Taken from:
https://github.com/PyTorchLightning/pytorch-lightning/pull/7055
Taken from: `Metric Test for Reset`_
"""

class TestModel(LightningModule):
5 changes: 4 additions & 1 deletion requirements/adjust-versions.py
@@ -4,10 +4,13 @@
import sys
from typing import Dict, Optional

from packaging.version import Version

VERSIONS = [
dict(torch="1.10.0", torchvision="0.11.0", torchtext=""), # nightly
dict(torch="1.9.1", torchvision="0.10.1", torchtext="0.10.1"),
dict(torch="1.9.0", torchvision="0.10.0", torchtext="0.10.0"),
dict(torch="1.8.2", torchvision="0.9.1", torchtext="0.9.1"),
dict(torch="1.8.1", torchvision="0.9.1", torchtext="0.9.1"),
dict(torch="1.8.0", torchvision="0.9.0", torchtext="0.9.0"),
dict(torch="1.7.1", torchvision="0.8.2", torchtext="0.8.1"),
@@ -19,7 +22,7 @@
dict(torch="1.3.1", torchvision="0.4.2", torchtext="0.4"),
dict(torch="1.3.0", torchvision="0.4.1", torchtext="0.4"),
]
VERSIONS.sort(key=lambda v: v["torch"], reverse=True)
VERSIONS.sort(key=lambda v: Version(v["torch"]), reverse=True)


def find_latest(ver: str) -> Dict[str, str]:
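The sort fix above matters because plain string comparison orders version numbers lexicographically, so "1.10.0" would sort below "1.9.0"; `packaging.version.Version` compares the numeric components instead (which is also why the CI workflow change above adds `packaging` to the conda install). A minimal sketch of the difference, with an illustrative version list:

    from packaging.version import Version

    versions = ["1.9.1", "1.10.0", "1.8.1"]
    print(sorted(versions, reverse=True))               # lexicographic (wrong): ['1.9.1', '1.8.1', '1.10.0']
    print(sorted(versions, key=Version, reverse=True))  # numeric (correct): ['1.10.0', '1.9.1', '1.8.1']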
3 changes: 1 addition & 2 deletions torchmetrics/classification/calibration_error.py
@@ -24,8 +24,7 @@
class CalibrationError(Metric):
r"""
Computes the top-label calibration error as described in `this paper <https://arxiv.org/pdf/1909.10155.pdf>`_.
`Computes the Top-label Calibration Error`_
Three different norms are implemented, each corresponding to variations on the calibration error metric.
L1 norm (Expected Calibration Error)
19 changes: 12 additions & 7 deletions torchmetrics/functional/audio/pit.py
@@ -141,9 +141,7 @@ def pit(
[-0.1719, 0.3205, 0.2951]]])
Reference:
[1] D. Yu, M. Kolbaek, Z.-H. Tan, J. Jensen, Permutation invariant training of deep models for
speaker-independent multi-talker speech separation, in: 2017 IEEE Int. Conf. Acoust. Speech
Signal Process. ICASSP, IEEE, New Orleans, LA, 2017: pp. 241–245. https://doi.org/10.1109/ICASSP.2017.7952154.
[1] `Permutation Invariant Training of Deep Models`_
"""
_check_same_shape(preds, target)
if eval_func not in ["max", "min"]:
@@ -153,10 +151,17 @@

# calculate the metric matrix
batch_size, spk_num = target.shape[0:2]
metric_mtx = torch.empty((batch_size, spk_num, spk_num), dtype=preds.dtype, device=target.device)
for t in range(spk_num):
for e in range(spk_num):
metric_mtx[:, t, e] = metric_func(preds[:, e, ...], target[:, t, ...], **kwargs)
metric_mtx = None
for target_idx in range(spk_num): # we have spk_num speeches in target in each sample
for preds_idx in range(spk_num): # we have spk_num speeches in preds in each sample
if metric_mtx is not None:
metric_mtx[:, target_idx, preds_idx] = metric_func(
preds[:, preds_idx, ...], target[:, target_idx, ...], **kwargs
)
else:
first_ele = metric_func(preds[:, preds_idx, ...], target[:, target_idx, ...], **kwargs)
metric_mtx = torch.empty((batch_size, spk_num, spk_num), dtype=first_ele.dtype, device=first_ele.device)
metric_mtx[:, target_idx, preds_idx] = first_ele

# find best
op = torch.max if eval_func == "max" else torch.min
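The restructured loop above (the change referenced by the `pit` changelog entry) allocates the metric matrix lazily: dtype and device are taken from the first value the metric function returns rather than from `preds`/`target`, which can differ when the metric outputs another precision or lives on another device. A rough standalone sketch of the same pattern, with illustrative names; the real `pit` additionally validates its arguments and searches over speaker permutations afterwards:

    import torch

    def pairwise_metric_matrix(preds, target, metric_func):
        # Fill a (batch, spk, spk) matrix of metric_func(pred_e, target_t) values.
        # The buffer is created only after the first value is known, so its dtype
        # and device follow the metric's output rather than the inputs.
        batch_size, spk_num = target.shape[:2]
        metric_mtx = None
        for t in range(spk_num):        # speeches in target
            for e in range(spk_num):    # speeches in preds
                val = metric_func(preds[:, e, ...], target[:, t, ...])
                if metric_mtx is None:
                    metric_mtx = torch.empty((batch_size, spk_num, spk_num), dtype=val.dtype, device=val.device)
                metric_mtx[:, t, e] = val
        return metric_mtx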
2 changes: 1 addition & 1 deletion torchmetrics/functional/classification/accuracy.py
@@ -265,7 +265,7 @@ def accuracy(
multiclass: Optional[bool] = None,
ignore_index: Optional[int] = None,
) -> Tensor:
r"""Computes `Accuracy <https://en.wikipedia.org/wiki/Accuracy_and_precision>`_:
r"""Computes `Accuracy`_
.. math::
\text{Accuracy} = \frac{1}{N}\sum_i^N 1(y_i = \hat{y}_i)
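For reference, the accuracy formula in the docstring is just an elementwise comparison followed by a mean; a quick plain-torch check with made-up labels:

    import torch

    preds = torch.tensor([0, 2, 1, 3])
    target = torch.tensor([0, 1, 2, 3])
    acc = (preds == target).float().mean()   # 2 of 4 match -> tensor(0.5000)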
4 changes: 2 additions & 2 deletions torchmetrics/functional/classification/calibration_error.py
@@ -35,7 +35,7 @@ def _ce_compute(
bin_boundaries (FloatTensor): Bin boundaries separating the linspace from 0 to 1.
norm (str, optional): Norm function to use when computing calibration error. Defaults to "l1".
debias (bool, optional): Apply debiasing to L2 norm computation as in
Verified Uncertainty Calibration (https://arxiv.org/abs/1909.10155). Defaults to False.
`Verified Uncertainty Calibration`_. Defaults to False.
Raises:
ValueError: If an unsupported norm function is provided.
@@ -111,7 +111,7 @@ def _ce_update(preds: Tensor, target: Tensor) -> Tuple[FloatTensor, FloatTensor]

def calibration_error(preds: Tensor, target: Tensor, n_bins: int = 15, norm: str = "l1") -> Tensor:
r"""
Computes the top-label calibration error as described in `this paper <https://arxiv.org/pdf/1909.10155.pdf>`_.
`Computes the Top-label Calibration Error`_
Three different norms are implemented, each corresponding to variations on the calibration error metric.
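As a rough illustration of the L1 variant (Expected Calibration Error) named in the docstring: predictions are bucketed by confidence, and the gap between mean confidence and empirical accuracy in each bucket is averaged, weighted by bucket occupancy. This is a simplified sketch under those assumptions, not the library's implementation (which also offers L2/max norms and optional debiasing):

    import torch

    def ece_l1(confidences: torch.Tensor, correct: torch.Tensor, n_bins: int = 15) -> torch.Tensor:
        # confidences: top-label probabilities in [0, 1]; correct: boolean hit/miss per sample
        edges = torch.linspace(0, 1, n_bins + 1)
        ece = torch.zeros(())
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                weight = in_bin.float().mean()
                gap = (correct[in_bin].float().mean() - confidences[in_bin].mean()).abs()
                ece += weight * gap
        return ece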
50 changes: 25 additions & 25 deletions torchmetrics/functional/classification/cohen_kappa.py
@@ -74,39 +74,39 @@ def cohen_kappa(
threshold: float = 0.5,
) -> Tensor:
r"""
Calculates `Cohen's kappa score <https://en.wikipedia.org/wiki/Cohen%27s_kappa>`_ that measures
inter-annotator agreement. It is defined as
Calculates `Cohen's kappa score`_ that measures inter-annotator agreement.
It is defined as
.. math::
\kappa = (p_o - p_e) / (1 - p_e)
.. math::
\kappa = (p_o - p_e) / (1 - p_e)
where :math:`p_o` is the empirical probability of agreement and :math:`p_e` is
the expected agreement when both annotators assign labels randomly. Note that
:math:`p_e` is estimated using a per-annotator empirical prior over the
class labels.
where :math:`p_o` is the empirical probability of agreement and :math:`p_e` is
the expected agreement when both annotators assign labels randomly. Note that
:math:`p_e` is estimated using a per-annotator empirical prior over the
class labels.
Args:
preds: (float or long tensor), Either a ``(N, ...)`` tensor with labels or
``(N, C, ...)`` where C is the number of classes, tensor with labels/probabilities
Args:
preds: (float or long tensor), Either a ``(N, ...)`` tensor with labels or
``(N, C, ...)`` where C is the number of classes, tensor with labels/probabilities
target: ``target`` (long tensor), tensor with shape ``(N, ...)`` with ground truth labels
target: ``target`` (long tensor), tensor with shape ``(N, ...)`` with ground truth labels
num_classes: Number of classes in the dataset.
num_classes: Number of classes in the dataset.
weights: Weighting type to calculate the score. Choose from
- ``None`` or ``'none'``: no weighting
- ``'linear'``: linear weighting
- ``'quadratic'``: quadratic weighting
weights: Weighting type to calculate the score. Choose from
- ``None`` or ``'none'``: no weighting
- ``'linear'``: linear weighting
- ``'quadratic'``: quadratic weighting
threshold:
Threshold value for binary or multi-label probabilities. default: 0.5
threshold:
Threshold value for binary or multi-label probabilities. default: 0.5
Example:
>>> from torchmetrics.functional import cohen_kappa
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> cohen_kappa(preds, target, num_classes=2)
tensor(0.5000)
Example:
>>> from torchmetrics.functional import cohen_kappa
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> cohen_kappa(preds, target, num_classes=2)
tensor(0.5000)
"""
confmat = _cohen_kappa_update(preds, target, num_classes, threshold)
return _cohen_kappa_compute(confmat, weights)
5 changes: 3 additions & 2 deletions torchmetrics/functional/classification/hinge.py
@@ -161,8 +161,9 @@ def hinge(
multiclass_mode: Optional[Union[str, MulticlassMode]] = None,
) -> Tensor:
r"""
Computes the mean `Hinge loss <https://en.wikipedia.org/wiki/Hinge_loss>`_, typically used for Support Vector
Machines (SVMs). In the binary case it is defined as:
Computes the mean `Hinge loss`_ typically used for Support Vector Machines (SVMs).
In the binary case it is defined as:
.. math::
\text{Hinge loss} = \max(0, 1 - y \times \hat{y})
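The binary-case formula above is a clamped margin; a small plain-torch illustration with targets encoded as +/-1 (made-up values, not the library call):

    import torch

    target = torch.tensor([-1.0, 1.0, 1.0])    # labels in {-1, +1}
    preds = torch.tensor([0.5, -0.3, 2.0])     # raw decision scores
    loss = torch.clamp(1 - target * preds, min=0).mean()   # (1.5 + 1.3 + 0.0) / 3 -> tensor(0.9333)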
2 changes: 1 addition & 1 deletion torchmetrics/functional/classification/iou.py
@@ -76,7 +76,7 @@ def iou(
reduction: str = "elementwise_mean",
) -> Tensor:
r"""
Computes `Intersection over union, or Jaccard index calculation <https://en.wikipedia.org/wiki/Jaccard_index>`_:
Computes `Jaccard index`_
.. math:: J(A,B) = \frac{|A\cap B|}{|A\cup B|}
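The Jaccard formula above, written out for a pair of boolean masks with toy values:

    import torch

    pred_mask = torch.tensor([True, True, False, False])
    true_mask = torch.tensor([True, False, True, False])
    intersection = (pred_mask & true_mask).sum()
    union = (pred_mask | true_mask).sum()
    iou_value = intersection.float() / union   # 1 / 3 -> tensor(0.3333)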
4 changes: 2 additions & 2 deletions torchmetrics/functional/classification/kl_divergence.py
@@ -79,7 +79,7 @@ def _kld_compute(measures: Tensor, total: Tensor, reduction: Optional[str] = "me


def kl_divergence(p: Tensor, q: Tensor, log_prob: bool = False, reduction: Optional[str] = "mean") -> Tensor:
r"""Computes the `KL divergence <https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence>`_:
r"""Computes `KL divergence`_
.. math::
D_{KL}(P||Q) = \sum_{x\in\mathcal{X}} P(x) \log\frac{P(x)}{Q(x)}
@@ -112,7 +112,7 @@ def kl_divergence(p: Tensor, q: Tensor, log_prob: bool = False, reduction: Optio


def kldivergence(p: Tensor, q: Tensor, log_prob: bool = False, reduction: Optional[str] = "mean") -> Tensor:
r"""Computes the `KL divergence <https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence>`_:
r"""Computes `KL divergence`_
.. deprecated:: v0.5
`kldivergence` was renamed as `kl_divergence` in v0.5 and it will be removed in v0.6
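With the corrected formula, the discrete KL divergence of two explicit distributions is a one-liner (toy values; the library function additionally accepts log-probabilities and applies the chosen reduction):

    import torch

    p = torch.tensor([0.4, 0.6])
    q = torch.tensor([0.5, 0.5])
    kl = (p * (p / q).log()).sum()   # 0.4*ln(0.8) + 0.6*ln(1.2) -> roughly tensor(0.0201)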
6 changes: 3 additions & 3 deletions torchmetrics/functional/classification/precision_recall.py
@@ -84,7 +84,7 @@ def precision(
multiclass: Optional[bool] = None,
) -> Tensor:
r"""
Computes `Precision <https://en.wikipedia.org/wiki/Precision_and_recall>`_:
Computes `Precision`_
.. math:: \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}
@@ -281,7 +281,7 @@ def recall(
multiclass: Optional[bool] = None,
) -> Tensor:
r"""
Computes `Recall <https://en.wikipedia.org/wiki/Precision_and_recall>`_:
Computes `Recall`_
.. math:: \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}
@@ -427,7 +427,7 @@ def precision_recall(
multiclass: Optional[bool] = None,
) -> Tuple[Tensor, Tensor]:
r"""
Computes `Precision and Recall <https://en.wikipedia.org/wiki/Precision_and_recall>`_:
Computes `Precision`_
.. math:: \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}
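Both formulas reduce to ratios of confusion-matrix counts; a binary toy example in plain torch (not the library call, which also handles averaging and multi-dimensional inputs):

    import torch

    preds = torch.tensor([1, 1, 0, 1])
    target = torch.tensor([1, 0, 0, 1])
    tp = ((preds == 1) & (target == 1)).sum().float()
    fp = ((preds == 1) & (target == 0)).sum().float()
    fn = ((preds == 0) & (target == 1)).sum().float()
    precision_value = tp / (tp + fp)   # 2 / 3 -> tensor(0.6667)
    recall_value = tp / (tp + fn)      # 2 / 2 -> tensor(1.)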
2 changes: 1 addition & 1 deletion torchmetrics/functional/classification/specificity.py
@@ -79,7 +79,7 @@ def specificity(
multiclass: Optional[bool] = None,
) -> Tensor:
r"""
Computes `Specificity <https://en.wikipedia.org/wiki/Sensitivity_and_specificity>`_:
Computes `Specificity`_
.. math:: \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}
3 changes: 1 addition & 2 deletions torchmetrics/functional/image/gradients.py
@@ -46,8 +46,7 @@ def _compute_image_gradients(img: Tensor) -> Tuple[Tensor, Tensor]:


def image_gradients(img: Tensor) -> Tuple[Tensor, Tensor]:
"""Computes the `gradients <https://en.wikipedia.org/wiki/Image_gradient>`_ of a given image using finite
difference.
"""Computes `Gradient Computation of Image`_ of a given image using finite difference.
Args:
img: An ``(N, C, H, W)`` input tensor where C is the number of image channels
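Finite-difference image gradients are shifted subtractions along the height and width axes; a rough sketch of the idea using forward differences with zeros at the trailing edge (the library's exact padding and return order may differ):

    import torch

    img = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)  # (N, C, H, W)
    dy = torch.zeros_like(img)
    dx = torch.zeros_like(img)
    dy[..., :-1, :] = img[..., 1:, :] - img[..., :-1, :]   # difference along height
    dx[..., :, :-1] = img[..., :, 1:] - img[..., :, :-1]   # difference along width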
2 changes: 1 addition & 1 deletion torchmetrics/functional/image/ssim.py
@@ -154,7 +154,7 @@ def _ssim_compute(

input_list = torch.cat((preds, target, preds * preds, target * target, preds * target)) # (5 * B, C, H, W)
outputs = F.conv2d(input_list, kernel, groups=channel)
output_list = [outputs[x * preds.size(0) : (x + 1) * preds.size(0)] for x in range(len(outputs))]
output_list = outputs.split(preds.shape[0])

mu_pred_sq = output_list[0].pow(2)
mu_target_sq = output_list[1].pow(2)
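The change above is the fix tied to the "SSIM using too much memory" changelog entry: `outputs` has `5 * B` rows, so the old comprehension looped over `range(len(outputs))`, building `5 * B` slices instead of the five that are needed, while `Tensor.split` yields exactly five views without copying. A small check of the equivalence, with made-up shapes:

    import torch

    batch = 3
    outputs = torch.randn(5 * batch, 4, 8, 8)    # stacked conv results, five blocks of size `batch`
    chunks = outputs.split(batch)                # 5 views of shape (3, 4, 8, 8), no copies
    legacy = [outputs[x * batch:(x + 1) * batch] for x in range(5)]
    assert all(torch.equal(a, b) for a, b in zip(chunks, legacy))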
3 changes: 1 addition & 2 deletions torchmetrics/functional/nlp.py
@@ -25,8 +25,7 @@ def bleu_score(
n_gram: int = 4,
smooth: bool = False,
) -> Tensor:
"""Calculate `BLEU score <https://en.wikipedia.org/wiki/BLEU>`_ of machine translated text with one or more
references.
"""Calculate `BLEU score`_ of machine-translated text with one or more references.
Example:
>>> from torchmetrics.functional import bleu_score
2 changes: 1 addition & 1 deletion torchmetrics/functional/regression/r2.py
@@ -119,7 +119,7 @@ def r2_score(
multioutput: str = "uniform_average",
) -> Tensor:
r"""
Computes r2 score also known as `coefficient of determination`_:
Computes r2 score also known as `R2 Score_Coefficient Determination`_:
.. math:: R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
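The R^2 formula above, evaluated directly for a toy regression (values chosen arbitrarily):

    import torch

    target = torch.tensor([3.0, -0.5, 2.0, 7.0])
    preds = torch.tensor([2.5, 0.0, 2.0, 8.0])
    ss_res = ((target - preds) ** 2).sum()
    ss_tot = ((target - target.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot   # tensor(0.9486)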
3 changes: 1 addition & 2 deletions torchmetrics/functional/regression/r2score.py
@@ -26,8 +26,7 @@ def r2score(
multioutput: str = "uniform_average",
) -> Tensor:
r"""
Computes r2 score also known as `coefficient of determination
<https://en.wikipedia.org/wiki/Coefficient_of_determination>`_:
Computes r2 score also known as `R2 Score_Coefficient Determination`_
.. deprecated:: v0.5
`r2score` was renamed as `r2_score` in v0.5 and it will be removed in v0.6
3 changes: 1 addition & 2 deletions torchmetrics/functional/regression/spearman.py
@@ -37,8 +37,7 @@ def _rank_data(data: Tensor) -> Tensor:
corresponding sorted tensor (starting from 1). Duplicates of the same value will be assigned the mean of their
rank.
Adopted from:
https://github.com/scipy/scipy/blob/v1.6.2/scipy/stats/stats.py#L4140-L4303
Adopted from: `Rank of element tensor`_
"""
n = data.numel()
rank = torch.empty_like(data)
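`_rank_data` assigns ranks starting at 1 and averages the ranks of tied values; a rough standalone sketch of that behaviour (not the scipy-derived implementation referenced above):

    import torch

    def rank_with_ties(data: torch.Tensor) -> torch.Tensor:
        sorter = data.argsort()
        ranks = torch.empty_like(data, dtype=torch.float)
        ranks[sorter] = torch.arange(1, data.numel() + 1, dtype=torch.float)
        for value in data.unique():          # tied values share the mean of their ranks
            mask = data == value
            if mask.sum() > 1:
                ranks[mask] = ranks[mask].mean()
        return ranks

    rank_with_ties(torch.tensor([10.0, 20.0, 20.0, 30.0]))   # tensor([1.0000, 2.5000, 2.5000, 4.0000])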
4 changes: 1 addition & 3 deletions torchmetrics/functional/retrieval/reciprocal_rank.py
@@ -18,7 +18,7 @@


def retrieval_reciprocal_rank(preds: Tensor, target: Tensor) -> Tensor:
"""Computes reciprocal rank (for information retrieval).
"""Computes reciprocal rank (for information retrieval). See `Mean Reciprocal Rank`_
``preds`` and ``target`` should be of the same shape and live on the same device. If no ``target`` is ``True``,
0 is returned. ``target`` must be either `bool` or `integers` and ``preds`` must be `float`,
@@ -37,8 +37,6 @@ def retrieval_reciprocal_rank(preds: Tensor, target: Tensor) -> Tensor:
>>> target = torch.tensor([False, True, False])
>>> retrieval_reciprocal_rank(preds, target)
tensor(0.5000)
.. explained: https://en.wikipedia.org/wiki/Mean_reciprocal_rank
"""
preds, target = _check_retrieval_functional_inputs(preds, target)

(Diff truncated: the remaining changed files in this commit are not shown here.)
