
Into Generative AI

@Borda released this 22 Aug 19:49 · 663 commits to master since this release

TorchMetrics v1.1 adds five new metrics, bringing the total number of metrics up to 128! In particular, there are two exciting new metrics for evaluating your favorite generative models for images.

Perceptual Path Length

Introduced in the famous StyleGAN paper back in 2018, the perceptual path length metric quantifies how smoothly a generator interpolates between points in its latent space.
Why does the smoothness of the latent space of your generative model matter? Assume you find a point in your latent space that generates an image you like, and you would like to see whether a slightly different latent point produces an even better one. If your latent space is not smooth, this becomes very hard, because even small changes to the latent point can lead to large changes in the generated image.
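The core computation can be sketched as follows. This is a toy illustration, not the library implementation: the linear "generator" and squared-error distance stand in for a real generator network and a perceptual distance such as LPIPS, which the actual metric uses.

```python
import numpy as np

def perceptual_path_length(generate, dist_fn, num_samples=1024,
                           eps=1e-4, latent_dim=64, seed=0):
    """Sketch of PPL: sample pairs of latents, interpolate between them
    at t and t + eps, and measure how far apart the two generated images
    are perceptually, scaled by 1/eps^2."""
    rng = np.random.default_rng(seed)
    z0 = rng.standard_normal((num_samples, latent_dim))
    z1 = rng.standard_normal((num_samples, latent_dim))
    t = rng.random((num_samples, 1))
    za = z0 + t * (z1 - z0)          # interpolation at t
    zb = z0 + (t + eps) * (z1 - z0)  # slightly perturbed interpolation
    d = dist_fn(generate(za), generate(zb)) / eps ** 2
    return float(d.mean())

# Toy stand-ins: a fixed linear "generator" and squared error as the
# "perceptual" distance (assumptions for illustration only).
rng = np.random.default_rng(1)
W = rng.standard_normal((64, 192))
generate = lambda z: z @ W
sqdist = lambda a, b: ((a - b) ** 2).mean(axis=1)
ppl = perceptual_path_length(generate, sqdist)
```

A smooth generator yields small perceptual distances for the epsilon-sized latent steps, and hence a low PPL score.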

CLIP image quality assessment

CLIP image quality assessment (CLIPIQA) is a recently proposed metric from this paper. The metric builds on the OpenAI CLIP model, a multi-modal model for connecting text and images. The core idea behind the metric is that different properties of an image can be assessed by measuring how similar the CLIP embedding of the image is to the CLIP embeddings of a positive and a negative prompt for that property.
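The positive/negative prompt idea can be sketched with toy vectors standing in for real CLIP embeddings. The prompt pair, dimensions, and plain softmax weighting below are illustrative assumptions, not the exact CLIPIQA implementation:

```python
import numpy as np

def clipiqa_score(img_emb, pos_emb, neg_emb):
    """Sketch of the CLIPIQA idea: compare an image embedding against the
    embeddings of a positive/negative prompt pair (e.g. "Good photo." vs
    "Bad photo.") and softmax the two cosine similarities into a score
    between 0 and 1."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(img_emb, pos_emb), cos(img_emb, neg_emb)])
    probs = np.exp(sims) / np.exp(sims).sum()  # softmax over the two prompts
    return float(probs[0])  # probability mass on the positive prompt

# Toy 3-d embeddings; real CLIP embeddings are high-dimensional vectors
# produced by the image and text encoders.
img = np.array([1.0, 0.2, 0.0])
good = np.array([0.9, 0.1, 0.1])
bad = np.array([-0.8, 0.5, 0.2])
score = clipiqa_score(img, good, bad)  # close to 1: img aligns with "good"
```

Swapping in a different prompt pair (e.g. "Bright photo." / "Dark photo.") lets the same mechanism probe a different property of the image.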

VIF, Edit, and SA-SDR

  • VisualInformationFidelity has been added to the image package. First proposed in this paper, it can be used to automatically assess the quality of images in a perceptual manner.

  • EditDistance has been added to the text package. A classic text metric, it simply measures the number of characters that need to be substituted, inserted, or deleted to transform the predicted text into the reference text.

  • SourceAggregatedSignalDistortionRatio has been added to the audio package. Originally proposed in this paper, it improves on the classical signal-to-distortion ratio (SDR) metric (also found in TorchMetrics) by providing more stable gradients when training models for source separation.
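The edit-distance computation that EditDistance performs is the classic Levenshtein dynamic program. A standalone sketch (not the TorchMetrics implementation itself) looks like this:

```python
def edit_distance(pred: str, target: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    substitutions, insertions, and deletions turning pred into target."""
    m, n = len(pred), len(target)
    # dp[j] holds the distance between the current prefix of pred
    # and target[:j]; a single row is kept and updated in place.
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev = distance(pred[:i-1], target[:j-1])
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,      # delete a character from pred
                dp[j - 1] + 1,  # insert a character into pred
                prev + (pred[i - 1] != target[j - 1]),  # substitute (or match)
            )
            prev = cur
    return dp[n]

edit_distance("kitten", "sitting")  # → 3 (k→s, e→i, insert g)
```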

[1.1.0] - 2023-08-22

Added

  • Added source aggregated signal-to-distortion ratio (SA-SDR) metric (#1882)
  • Added VisualInformationFidelity to image package (#1830)
  • Added EditDistance to text package (#1906)
  • Added top_k argument to RetrievalMRR in retrieval package (#1961)
  • Added support for evaluating "segm" and "bbox" detection in MeanAveragePrecision at the same time (#1928)
  • Added PerceptualPathLength to image package (#1939)
  • Added support for multioutput evaluation in MeanSquaredError (#1937)
  • Added argument extended_summary to MeanAveragePrecision such that precision, recall, iou can be easily returned (#1983)
  • Added warning to ClipScore if long captions are detected and truncated (#2001)
  • Added CLIPImageQualityAssessment to multimodal package (#1931)
  • Added new property metric_state to all metrics for users to investigate currently stored tensors in memory (#2006)

Full Changelog: v1.0.0...v1.1.0


Contributors

@bojobo, @lucadiliello, @quancs, @SkafteNicki

If we forgot someone because their commit email does not match their GitHub account, let us know :]