Summary:
This PR fixes the way brevity penalty (specifically the effective reference corpus length) is calculated in BLEU.
Previously, `len_reference` was calculated as `min([len(ref) for ref in references_tokenized])`. This is incorrect: according to the BLEU paper, the effective reference length is built from the "best match length", i.e. the reference length closest to the candidate's length, not the minimum reference length.
For more information, see [Wikipedia - brevity penalty](https://en.wikipedia.org/wiki/BLEU#Brevity_penalty) and the [NLTK implementation](https://www.nltk.org/_modules/nltk/translate/bleu_score.html#closest_ref_length).
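To illustrate the difference, here is a minimal standalone sketch, not the code from this PR. The helper names `closest_ref_length` and `brevity_penalty` and the sample sentences are hypothetical, and the tie-break toward the shorter reference is an assumption modeled on NLTK's behavior.

```python
import math


def closest_ref_length(references_tokenized, len_candidate):
    """Return the reference length closest to the candidate length.

    Ties are broken toward the shorter reference (an assumption that
    mirrors NLTK's tie-break; it is not taken from this PR's code).
    """
    ref_lens = (len(ref) for ref in references_tokenized)
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - len_candidate), ref_len))


def brevity_penalty(len_reference, len_candidate):
    """Standard BLEU brevity penalty: 1 if c > r, else exp(1 - r / c)."""
    if len_candidate == 0:
        return 0.0
    if len_candidate > len_reference:
        return 1.0
    return math.exp(1 - len_reference / len_candidate)


# Hypothetical example: one candidate (7 tokens) with two references (8 and 3 tokens).
candidate = "the cat sat on the mat today".split()
references_tokenized = [
    "the cat sat on the mat earlier today".split(),
    "cat on mat".split(),
]

# Old behavior: the minimum reference length hides the fact that the
# candidate is shorter than its best-matching reference.
old_len_reference = min(len(ref) for ref in references_tokenized)

# Fixed behavior: use the reference length closest to the candidate length.
new_len_reference = closest_ref_length(references_tokenized, len(candidate))

old_bp = brevity_penalty(old_len_reference, len(candidate))
new_bp = brevity_penalty(new_len_reference, len(candidate))
print(old_len_reference, new_len_reference, old_bp, new_bp)
```

With the minimum length, the candidate escapes the penalty entirely, while the best match length penalizes it for being one token shorter than its closest reference. For a corpus-level score, the per-sentence best match lengths would be summed before the penalty is applied.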
Pull Request resolved: #195
Test Plan: I added another unit test to `test_bleu.py` and compared the results of the calculations to the results of the `nltk.translate.bleu_score.corpus_bleu` function to make sure the implementation is correct.
Reviewed By: galrotem
Differential Revision: D56846091
Pulled By: JKSenthil
fbshipit-source-id: 2bf1cd0ba169535a118222e60f4264259248f1fd