Add BaryScore
#852
Comments
Interested!
Great, @ashutoshml! Let's just wait on #849 as we discussed, since there'll be many common dependencies :]
Hi @ashutoshml - I'd like to apologize for my delay here. I've been quite busy and unable to complete my part yet. Gonna try to make as much progress as possible this weekend.
Hello @ashutoshml, very sorry for the really long delay, the last months have been pretty turbulent for me. Let me know if you're still interested in contributing this metric to TorchMetrics.
Hi @stancld. I hope things are ok now.
Hi @ashutoshml, it's completely fine - time is really flying, so we're almost there. If you're still interested, just let me know. I have some WIP we can discuss and we can create an MR together :]
Hello @ashutoshml, any updates here? :]
Hi @stancld - I was occupied with some work. I will have a look at it this weekend.
Great, looking forward to it @ashutoshml 🚀
🚀 Feature
Add BaryScore
Sources:
Motivation
Recent NLG metrics are increasingly based on BERT (or related) embeddings. As such, I believe we should also start adding such metrics to TorchMetrics, with an extra dependency on transformers if a user wants to use any of these metrics. The BaryScore metric belongs to a family of untrained metrics (i.e. the model is not fine-tuned on any specific task), so it should be an easy one to begin with.
Abstract:
A new metric BaryScore to evaluate text generation based on deep contextualized embeddings (e.g., BERT, Roberta, ELMo) is introduced. This metric is motivated by a new framework relying on optimal transport tools, i.e., Wasserstein distance and barycenter. By modelling the layer output of deep contextualized embeddings as a probability distribution rather than by a vector embedding, this framework provides a natural way to aggregate the different outputs through the Wasserstein space topology. In addition, it provides theoretical grounds to our metric and offers an alternative to available solutions (e.g., MoverScore and BertScore). Numerical evaluation is performed on four different tasks: machine translation, summarization, data2text generation and image captioning. Our results show that BaryScore outperforms other BERT based metrics and exhibits more consistent behaviour in particular for text summarization.
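To make the core idea concrete, below is a minimal sketch, not the paper's reference implementation and not an existing TorchMetrics API: the token embeddings from a contextualized encoder are treated as the support of a discrete distribution, and candidate vs. reference are compared via an entropy-regularized (Sinkhorn) approximation of the Wasserstein distance. The model name, the uniform token weights, the cost normalization, and the Sinkhorn hyper-parameters are illustrative assumptions; the full BaryScore additionally aggregates several hidden layers through a Wasserstein barycenter, which this sketch omits.

```python
# Minimal sketch of the Wasserstein-distance idea behind BaryScore.
# NOT the official implementation; model name and hyper-parameters are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumption: any BERT-like encoder would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()


def token_embeddings(text: str, layer: int = -1) -> torch.Tensor:
    """Return one layer's hidden states as a (num_tokens, dim) matrix."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    return out.hidden_states[layer].squeeze(0)


def sinkhorn_wasserstein(x: torch.Tensor, y: torch.Tensor,
                         eps: float = 0.1, n_iter: int = 100) -> float:
    """Entropy-regularized Wasserstein distance between two uniform discrete
    distributions supported on the rows of x and y (scale-normalized costs)."""
    a = torch.full((x.shape[0],), 1.0 / x.shape[0])  # uniform token weights
    b = torch.full((y.shape[0],), 1.0 / y.shape[0])
    cost = torch.cdist(x, y, p=2)                    # pairwise Euclidean costs
    cost = cost / cost.max()                         # normalize for numerical stability
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(n_iter):                          # standard Sinkhorn updates
        v = b / (K.t() @ u)
        u = a / (K @ v)
    transport = u[:, None] * K * v[None, :]          # approximate transport plan
    return float((transport * cost).sum())


candidate = "the cat sat on the mat"
reference = "a cat was sitting on the mat"
score = sinkhorn_wasserstein(token_embeddings(candidate),
                             token_embeddings(reference))
print(f"approximate Wasserstein distance: {score:.4f}")  # lower = more similar
```

A sketch like this also illustrates why the metric would only need transformers as an extra (optional) dependency: everything else is plain torch.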