my idea of metric on diversity. #6
Comments
If I understand correctly, you are referring to Self-BLEU. Actually, I opened issue #27 on Texygen about the Self-BLEU metric.
No, it is not Self-BLEU. The BLEU in your work is something like
it can be a metric of generated reality. My idea is to calculate
as a metric of generated diversity, where a high score means that all of the_whole_test_data can be found in the_whole_generated_data.
Thanks for the explanation; now I see your point. I guess what you have proposed is basically the same as BLEU, since the func
Approximately, the original one is to check whether if If both of them are high, it means
I have computed Self-BLEU in a way that ensures the test data and the reference data are the same. I guess that issue #27 on Texygen does not affect me, because I do not reuse the saved "references" in the SelfBleu class. For COCO, I saved 1,000 sentences and computed Self-BLEU-2 at each epoch. After pretraining, Self-BLEU-2 was around 0.76. After adversarial training for about 10 epochs (3130 iters), Self-BLEU-2 rose to about 0.85.
Hmm, this is interesting. Could you please share your code for calculating the Self-BLEU score? Thanks!
In your article, you use the whole test data as the reference and then calculate the BLEU of each generated sentence. The average of these scores can serve as a metric of generated reality, i.e. how realistic the generated sentences are.
Conversely, why not use the whole generated data (the same number of sentences as the test data) as the reference and then calculate the BLEU of each test sentence? The average of these scores can serve as a metric of generated diversity.
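To make the two directions concrete, here is a minimal sketch in plain Python. It is not the code from the article or from Texygen: `sentence_bleu` below is a simplified, unsmoothed BLEU written only for illustration, and the names `average_bleu`, `test_data`, and `generated_data` as well as the toy sentences are assumptions. It expects non-empty lists of tokenized sentences.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, hypothesis, max_n=2):
    """Simplified BLEU of one hypothesis against a list of references.

    Clipped n-gram precision, uniform weights, brevity penalty, no smoothing
    (a zero precision at any order gives a score of 0).
    """
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        total = sum(hyp_counts.values())
        if total == 0:
            return 0.0
        # Clip each n-gram by its largest count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference closest in length.
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - hyp_len), length))
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_n)


def average_bleu(references, hypotheses, max_n=2):
    """Mean sentence-level BLEU of every hypothesis against all references."""
    return sum(sentence_bleu(references, h, max_n) for h in hypotheses) / len(hypotheses)


# Toy data (hypothetical): the generator has collapsed onto one test sentence.
test_data = [["a", "cat", "sat"], ["a", "dog", "ran"]]
generated_data = [["a", "cat", "sat"], ["a", "cat", "sat"]]

# Reality: test data as reference, score each generated sentence.
forward = average_bleu(test_data, generated_data)
# Diversity (the proposal): generated data as reference, score each test sentence.
reverse = average_bleu(generated_data, test_data)
print(forward, reverse)
```

In this toy case the forward score is high because every generated sentence appears in the test data, while the reverse score drops because one test sentence is never produced. That gap is what the reverse direction is meant to expose.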