Into Generative AI
In TorchMetrics v1.1, five new metrics have been added, bringing the total number of metrics up to 128! In particular, there are two exciting new metrics for evaluating your favorite generative models for images.
Perceptual Path Length
Introduced in the famous StyleGAN paper back in 2018, the perceptual path length metric quantifies how smoothly a generator manages to interpolate between points in its latent space.
Why does the smoothness of your generative model's latent space matter? Assume you find a point in the latent space that generates an image you like, and you want to see whether a slightly different latent point produces an even better image. If your latent space is not smooth, this becomes very hard, because even small changes to the latent point can lead to large changes in the generated image.
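Conceptually, the metric samples pairs of nearby latent points, generates an image from each, and averages the perceptual distance between the image pairs, scaled by the step size. The toy sketch below illustrates the idea; the linear `generator` is a hypothetical stand-in for a trained model, and plain squared error stands in for the LPIPS distance the real metric uses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained generator: latent (8,) -> "image" (64,).
W = rng.standard_normal((64, 8))

def generator(z):
    return W @ z

def toy_path_length(num_pairs=1000, eps=1e-4):
    """Average distance between images generated from latent points a
    small step eps apart, scaled by 1/eps**2 (squared error stands in
    for the perceptual LPIPS distance used by the real metric)."""
    dists = []
    for _ in range(num_pairs):
        z0 = rng.standard_normal(8)
        z1 = z0 + eps * rng.standard_normal(8)
        d = np.sum((generator(z0) - generator(z1)) ** 2) / eps**2
        dists.append(d)
    return float(np.mean(dists))

ppl = toy_path_length()
```

A smooth generator yields a small average path length; a generator whose output jumps around under tiny latent perturbations yields a large one.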
CLIP Image Quality Assessment
CLIP Image Quality Assessment (CLIPIQA) is a recently proposed metric from this paper. The metric builds on the OpenAI CLIP model, a multi-modal model that connects text and images. The core idea behind the metric is that different properties of an image can be assessed by measuring how similar the CLIP embedding of the image is to the CLIP embeddings of a positive and a negative prompt for that given property.
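As a sketch of the core idea: embed the image and a pair of antonym prompts (e.g. "Good photo." / "Bad photo."), then softmax the two cosine similarities so the score for the positive prompt lands in [0, 1]. The snippet below uses random stand-in vectors instead of real CLIP embeddings, and a fixed temperature in place of CLIP's learned logit scale:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def clipiqa_score(img_emb, pos_emb, neg_emb, temperature=0.01):
    """Probability that the image matches the positive prompt, computed
    as a softmax over the similarities to the positive and negative
    prompts (mirrors the idea of CLIPIQA, not the real implementation)."""
    logits = np.array(
        [cosine(img_emb, pos_emb), cosine(img_emb, neg_emb)]
    ) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])

rng = np.random.default_rng(0)
img, pos, neg = (rng.standard_normal(512) for _ in range(3))
score = clipiqa_score(img, pos, neg)
```

Different prompt pairs ("Sharp photo." / "Blurry photo.", "Colorful photo." / "Dull photo.", ...) probe different properties of the same image with no retraining.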
VIF, Edit, and SA-SDR
- `VisualInformationFidelity` has been added to the image package. First proposed in this paper, it can be used to automatically assess the quality of images in a perceptual manner.
- `EditDistance` has been added to the text package. It is a very classical text metric that simply measures the number of characters that need to be substituted, inserted, or deleted to transform the predicted text into the reference text.
- `SourceAggregatedSignalDistortionRatio` has been added to the audio package. The metric was originally proposed in this paper and is an improvement over the classical signal-to-distortion ratio (SDR) metric (also found in TorchMetrics) that provides more stable gradients when training models for meeting-style source separation.
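For illustration, the character edit distance described above (the Levenshtein distance) can be computed with a standard dynamic program over a single row; a minimal pure-Python sketch:

```python
def edit_distance(pred: str, target: str) -> int:
    """Minimum number of character substitutions, insertions, and
    deletions needed to turn pred into target (Levenshtein distance)."""
    # dp[j] holds the distance between the current prefix of pred
    # and target[:j]; start from the empty-pred row [0, 1, ..., n].
    dp = list(range(len(target) + 1))
    for i, p in enumerate(pred, start=1):
        prev_diag, dp[0] = dp[0], i
        for j, t in enumerate(target, start=1):
            cost = 0 if p == t else 1
            prev_diag, dp[j] = dp[j], min(
                dp[j] + 1,         # delete p
                dp[j - 1] + 1,     # insert t
                prev_diag + cost,  # substitute (free on a match)
            )
    return dp[-1]
```

For example, `edit_distance("kitten", "sitting")` gives 3 (two substitutions plus one insertion). The TorchMetrics class wraps the same computation with the usual `update`/`compute` interface for batched inputs.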
[1.1.0] - 2023-08-22
Added
- Added source aggregated signal-to-distortion ratio (SA-SDR) metric (#1882)
- Added `VisualInformationFidelity` to image package (#1830)
- Added `EditDistance` to text package (#1906)
- Added `top_k` argument to `RetrievalMRR` in retrieval package (#1961)
- Added support for evaluating `"segm"` and `"bbox"` detection in `MeanAveragePrecision` at the same time (#1928)
- Added `PerceptualPathLength` to image package (#1939)
- Added support for multioutput evaluation in `MeanSquaredError` (#1937)
- Added argument `extended_summary` to `MeanAveragePrecision` such that precision, recall, and IoU can be easily returned (#1983)
- Added warning to `ClipScore` if long captions are detected and truncated (#2001)
- Added `CLIPImageQualityAssessment` to multimodal package (#1931)
- Added new property `metric_state` to all metrics for users to investigate currently stored tensors in memory (#2006)
Full Changelog: v1.0.0...v1.1.0
New Contributors since v1.0.0
- @fansuregrin made their first contribution in #1892
- @salcc made their first contribution in #1934
- @IanMaquignaz made their first contribution in #1943
- @kn made their first contribution in #1955
- @Vivswan made their first contribution in #1982
- @njuaplusplus made their first contribution in #1986
Contributors
@bojobo, @lucadiliello, @quancs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]