Some issues trying Rejection Rate #8

alvelvis opened this issue Jan 30, 2024 · 4 comments

Comments

@alvelvis

alvelvis commented Jan 30, 2024

Hi! Congratulations on the amazing work you've done.

I'm trying to use RGB to test my RAG pipeline, but I'm having the following issue with Rejection Rate: even if I feed the model only the negative examples, it still sometimes answers correctly, because it turns out some of the negative examples in fact contain the correct answer...

For example:

Question: How much did Elon Musk bought Twitter?
Correct answer (according to RGB): [44 billion]
My model answer: Elon Musk bought Twitter at his original offer price of $54.20 a share, with a total cost of roughly $44 billion.

The model was expected to refuse to answer this, since I only gave it negative examples, right? But in fact, a few of the negative documents contain this answer:

  • Oct 28, 2022 ... Elon Musk takes control of Twitter in $44bn deal · What next for Twitter under Elon Musk? · How the world's richest person bought Twitter · Who is ...

  • After building a stake in Twitter at the start of the year, Mr Musk made his $44bn offer in April, a price tag that looked too high almost as soon as it was agreed. He...

Is this expected behavior, or am I doing something wrong?

Thanks in advance.

@chen700564
Owner

Hi, this is a mistake in our dataset. We identify noise documents by checking whether they contain the answer. In this case, the document writes the answer as "$44bn", but the annotated golden answer is "44 billion", so the string match fails and our filter rule treats the document as a noise document.
Thank you for bringing this to our attention. We will fix bugs like this.
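
To make the failure mode above concrete, here is a minimal sketch of a plain substring check next to a normalized one. It is not RGB's actual filter code: the function names and the small abbreviation table are assumptions made for illustration only.

```python
import re

# Hypothetical abbreviation table (an assumption, not part of RGB).
_UNITS = {"bn": "billion", "mn": "million"}


def naive_contains(document: str, answer: str) -> bool:
    """Plain case-insensitive substring check."""
    return answer.lower() in document.lower()


def normalize(text: str) -> str:
    """Lowercase, drop '$', expand '44bn' / '44 bn' to '44 billion', squeeze spaces."""
    text = text.lower().replace("$", "")
    text = re.sub(
        r"(\d+(?:\.\d+)?)\s*(bn|mn)\b",
        lambda m: f"{m.group(1)} {_UNITS[m.group(2)]}",
        text,
    )
    return re.sub(r"\s+", " ", text).strip()


def normalized_contains(document: str, answer: str) -> bool:
    """Substring check after normalizing both sides."""
    return normalize(answer) in normalize(document)


doc = "Elon Musk takes control of Twitter in $44bn deal"
print(naive_contains(doc, "44 billion"))       # the plain check misses the answer
print(normalized_contains(doc, "44 billion"))  # the normalized check finds it
```

A normalized check like this only covers the variants listed in the table, so it would reduce this kind of mislabeling rather than eliminate it.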

@qianzhang2018

Thank you for your excellent work. Regarding the data, I agree with this report: I found a very large number of examples where the negative documents contain the correct answer, and I would like to fix the dataset.

@qianzhang2018

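For example, in the negative documents of the first case in zh.json: "India temple New Year's Day stampede leaves twelve dead     [Hong Kong China News Agency, dispatch of the 1st] A stampede occurred in the early hours of January 1 at a temple in Indian-controlled Kashmir, leaving twelve people dead and twenty injured.     Quarrel and shoving lead to tragedy     According to media reports on January 1, the stampede happened at around 2:45 that day at a temple near Jammu, the winter capital of Indian-controlled Kashmir. At the time, large numbers of devotees who had travelled from many places were preparing for New Year worship. Because many of them had not been given permission to enter, they crowded outside the temple and began shoving, which eventually caused the stampede." The correct answer can be found in this document; the correct answer is 12 deaths.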
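
In the zh.json example above, the document spells the number with Chinese numerals (十二人死亡) while the answer is quoted with Arabic digits (12人死亡), so a plain substring check would miss it. The sketch below is one possible way to catch that case, assuming the golden answer really is stored with digits. The helper names are hypothetical and only numerals below 100 are handled.

```python
import re

_DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
           "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

# Matches forms such as 十, 十二, 二十, 四十五, or a single digit character.
_NUMERAL = re.compile(r"[一二三四五六七八九]?十[一二三四五六七八九]?|[零一二三四五六七八九]")


def cn_to_int(numeral: str) -> int:
    """Convert a Chinese numeral below 100 (e.g. 十二, 二十, 四十五) to an int."""
    if "十" in numeral:
        tens, _, ones = numeral.partition("十")
        return (_DIGITS[tens] if tens else 1) * 10 + (_DIGITS[ones] if ones else 0)
    return _DIGITS[numeral]


def normalize_numerals(text: str) -> str:
    """Rewrite every Chinese numeral below 100 as Arabic digits."""
    return _NUMERAL.sub(lambda m: str(cn_to_int(m.group(0))), text)


doc = "已造成十二人死亡、二十人受傷。"
answer = "12人死亡"
print(answer in doc)                      # plain check on the raw text
print(answer in normalize_numerals(doc))  # check after numeral normalization
```

Note that this also rewrites 一 inside ordinary words (一座 becomes 1座), which is harmless for a containment check but makes the normalized text unsuitable for display.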

@alvelvis
Author

alvelvis commented Feb 1, 2024

@chen700564 thanks a lot for the answer.

I'm trying a few tweaks to test negative rejection in my RAG pipeline. The best approach I've found is to not feed the language model any of the documents related to the question, whether positive or negative. I only feed it documents from the benchmark that belong to other questions. This way I'd really expect the model not to hallucinate and instead say it doesn't know the correct answer, and this is the rejection rate I'm evaluating. There is still one flaw in this approach: some of the questions are similar, so even if I don't feed the model the documents from a given question, it may receive documents from a similar question that contain the correct answer. But I think I'll stick with this approach...

If there is any news on this, please let me know :)
Best
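
The workaround described in the last comment, with one extra guard against the "similar question" flaw, could be sketched roughly as follows. This is an illustration under assumptions, not code from the RGB repository: the record keys (`answer`, `positive`, `negative`) and the function names are hypothetical, and the guard is a plain substring check, so it shares the "$44bn" versus "44 billion" weakness discussed earlier in the thread.

```python
import random


def flatten_answers(answer):
    """Yield every answer string, whether `answer` is a string or a nested list."""
    if isinstance(answer, str):
        yield answer
    else:
        for item in answer:
            yield from flatten_answers(item)


def unrelated_context(records, target_index, k, seed=0):
    """Sample up to k documents that belong to other questions and that do
    not contain any golden answer of the target question."""
    target = records[target_index]
    answers = [a.lower() for a in flatten_answers(target["answer"])]
    pool = []
    for i, rec in enumerate(records):
        if i == target_index:
            continue  # skip the target question's own documents
        for doc in rec["positive"] + rec["negative"]:
            if not any(a in doc.lower() for a in answers):
                pool.append(doc)
    rng = random.Random(seed)  # fixed seed keeps the evaluation reproducible
    return rng.sample(pool, min(k, len(pool)))


# Toy records, invented for illustration only.
records = [
    {"answer": ["44 billion"],
     "positive": ["Musk bought Twitter for 44 billion dollars."],
     "negative": ["Musk made his $44bn offer in April."]},
    {"answer": ["44 billion"],
     "positive": ["Twitter was sold for roughly $44 billion."],
     "negative": ["Twitter was founded in 2006."]},
    {"answer": [["Argentina"]],
     "positive": ["Argentina won the 2022 World Cup."],
     "negative": ["The 2022 World Cup was held in Qatar."]},
]
context = unrelated_context(records, target_index=0, k=10)
print(context)
```

With a context built this way, any non-refusal answer counts against the rejection rate, provided the guard actually caught every leaked answer.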


3 participants