
potential bug in self-bleu calculations #46

Open
hadyelsahar opened this issue Jul 21, 2020 · 1 comment

@hadyelsahar

According to the paper, Self-BLEU is computed by comparing each generation against all the other generations as references.

The current Self-BLEU implementation includes the selected hypothesis in the list of references. This risks inflating the Self-BLEU scores, since there will always be a direct match between the hypothesis and one of the references.

    def get_bleu(self):
        ngram = self.gram
        bleu = list()
        # get_reference() returns every generated sentence, so the current
        # hypothesis is itself among the references.
        reference = self.get_reference()
        weight = tuple((1. / ngram for _ in range(ngram)))
        with open(self.test_data) as test_data:
            for hypothesis in test_data:
                hypothesis = nltk.word_tokenize(hypothesis)
                # Scores the hypothesis against references that still contain it,
                # guaranteeing at least one exact n-gram match.
                bleu.append(nltk.translate.bleu_score.sentence_bleu(reference, hypothesis, weight,
                                                                    smoothing_function=SmoothingFunction().method1))
        return sum(bleu) / len(bleu)

Should we remove the target hypothesis from the set of references, or am I missing something here?
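A minimal sketch of the fix I have in mind (a standalone function for illustration, assuming the generations are already tokenized; `get_self_bleu` and its input format are hypothetical, not names from this repo):

    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    def get_self_bleu(sentences, ngram=3):
        # `sentences` is a list of tokenized generations (assumed input
        # format, not this repo's file-based interface).
        weight = tuple(1. / ngram for _ in range(ngram))
        scores = []
        for i, hypothesis in enumerate(sentences):
            # Leave the hypothesis out of the references so it cannot
            # match against itself.
            references = sentences[:i] + sentences[i + 1:]
            scores.append(sentence_bleu(references, hypothesis, weight,
                                        smoothing_function=SmoothingFunction().method1))
        return sum(scores) / len(scores)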

Thanks in advance for the help.

@yanghoonkim

I think we should use the bleu_parallel function in the Self-BLEU implementation.
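If `bleu_parallel` already excludes the hypothesis from the references, switching to it would indeed avoid the issue. For comparison, a parallelized version of the leave-one-out logic might look like this (a sketch using `multiprocessing`; the function names are hypothetical, not the repo's actual `bleu_parallel` code):

    import os
    from multiprocessing import Pool

    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    def _calc_bleu(references, hypothesis, weight):
        return sentence_bleu(references, hypothesis, weight,
                             smoothing_function=SmoothingFunction().method1)

    def get_self_bleu_parallel(sentences, ngram=3):
        # Same leave-one-out scoring as above, fanned out over a process pool.
        weight = tuple(1. / ngram for _ in range(ngram))
        with Pool(os.cpu_count()) as pool:
            results = [pool.apply_async(_calc_bleu,
                                        args=(sentences[:i] + sentences[i + 1:], s, weight))
                       for i, s in enumerate(sentences)]
            scores = [r.get() for r in results]
        return sum(scores) / len(scores)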
