
potential bug in self-bleu calculations #46

Open
hadyelsahar opened this issue Jul 21, 2020 · 1 comment

@hadyelsahar

According to the paper, Self-BLEU is computed by comparing each generation against all the other generations as references.

The current Self-BLEU implementation includes the selected hypothesis in the list of references. This risks inflating the Self-BLEU scores, since there will always be a direct match between the hypothesis and one of the references.

    def get_bleu(self):
        ngram = self.gram
        bleu = list()
        # get_reference() returns every generated sentence, so the current
        # hypothesis is itself among the references.
        reference = self.get_reference()
        weight = tuple((1. / ngram for _ in range(ngram)))
        with open(self.test_data) as test_data:
            for hypothesis in test_data:
                hypothesis = nltk.word_tokenize(hypothesis)
                # Scores the hypothesis against references that still contain it,
                # guaranteeing at least one exact n-gram match.
                bleu.append(nltk.translate.bleu_score.sentence_bleu(reference, hypothesis, weight,
                                                                    smoothing_function=SmoothingFunction().method1))
        return sum(bleu) / len(bleu)

Should we remove the target hypothesis from the set of references, or am I missing something here?
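A minimal sketch of the fix I have in mind (a standalone function for illustration, assuming the generations are already tokenized; `get_self_bleu` and its input format are hypothetical, not names from this repo):

    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    def get_self_bleu(sentences, ngram=3):
        # `sentences` is a list of tokenized generations (assumed input
        # format, not this repo's file-based interface).
        weight = tuple(1. / ngram for _ in range(ngram))
        scores = []
        for i, hypothesis in enumerate(sentences):
            # Leave the hypothesis out of the references so it cannot
            # match against itself.
            references = sentences[:i] + sentences[i + 1:]
            scores.append(sentence_bleu(references, hypothesis, weight,
                                        smoothing_function=SmoothingFunction().method1))
        return sum(scores) / len(scores)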

Thanks in advance for the help.

@yanghoonkim

I think we should use the bleu_parallel function in the Self-BLEU implementation.
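If `bleu_parallel` already excludes the hypothesis from the references, switching to it would indeed avoid the issue. For comparison, a parallelized version of the leave-one-out logic might look like this (a sketch using `multiprocessing`; the function names are hypothetical, not the repo's actual `bleu_parallel` code):

    import os
    from multiprocessing import Pool

    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    def _calc_bleu(references, hypothesis, weight):
        return sentence_bleu(references, hypothesis, weight,
                             smoothing_function=SmoothingFunction().method1)

    def get_self_bleu_parallel(sentences, ngram=3):
        # Same leave-one-out scoring as above, fanned out over a process pool.
        weight = tuple(1. / ngram for _ in range(ngram))
        with Pool(os.cpu_count()) as pool:
            results = [pool.apply_async(_calc_bleu,
                                        args=(sentences[:i] + sentences[i + 1:], s, weight))
                       for i, s in enumerate(sentences)]
            scores = [r.get() for r in results]
        return sum(scores) / len(scores)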
