Check data for unhelpful responses #103
ShareGPT instruct data on 20B gets about a 0.46 mean reward score on the 100 ShareGPT eval prompts when using the "no sorry" version of the ShareGPT data: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

We already beat that with 20B on OASST without filtering, at 0.49 mean (but worse on median), with 512-context LoRA training. We saw that 2048-context training leads to poorer results:

And LLaMa 30B on OIG+OASST gets 0.55: but many of those LLaMa answers are "sorry" responses for the 100-prompt ShareGPT eval set, so LLaMa would score even higher with less unhelpful data.

For reference, GPT-3.5 does quite well:

Together Chat 20B does poorly (2048 context for LoRA though):

Just OIG does poorly for 20B non-chat (2048 context for LoRA though):

Only 1 epoch of OASST for 20B is not so great (2048 context for LoRA though):

Dolly is not so great (full fine-tune, so has 2048 fine-tune context problems instead):
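The means/medians above are reward-model grades over the eval responses. A minimal sketch of that kind of scoring, assuming the OpenAssistant deberta reward model and a simple JSON list of prompt/response pairs (both assumptions, not the exact eval script used here):

```python
# Sketch: grade (prompt, response) pairs with a deberta reward model and report
# mean/median. Model name and the eval file layout are assumptions.
import json
import statistics

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # assumed reward model
tokenizer = AutoTokenizer.from_pretrained(reward_name)
model = AutoModelForSequenceClassification.from_pretrained(reward_name).eval()

def grade(prompt: str, response: str) -> float:
    # The reward model scores a (question, answer) pair with a single logit.
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

# eval_pairs.json: list of {"prompt": ..., "response": ...} (hypothetical file)
with open("eval_pairs.json") as f:
    pairs = json.load(f)
scores = [grade(p["prompt"], p["response"]) for p in pairs]
print("mean:", statistics.mean(scores), "median:", statistics.median(scores))
```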
threshold=0.5:
threshold=0.0:
So a threshold on the deberta grade (reward score) isn't enough.
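For concreteness, the filtering implied by such a threshold is just something like the sketch below (the `grade_deberta` field name, the data layout, and the threshold values are assumptions for illustration):

```python
# Sketch: drop training examples whose reward-model grade falls below a threshold.
# The "grade_deberta" field name and the threshold values are assumptions.
def filter_by_grade(examples: list[dict], threshold: float = 0.5) -> list[dict]:
    return [ex for ex in examples if ex.get("grade_deberta", 0.0) >= threshold]

# e.g. compare filter_by_grade(data, 0.5) vs. filter_by_grade(data, 0.0),
# which keeps nearly everything.
```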
BLEU doesn't do a good job of matching patterns. Similarity search doesn't do a good job either. Asymmetric search-style query-answer matching is not appropriate. E.g.
gives 0.73 but:
gives 0.69. So barely higher similarity, even though I see no reason why the latter should score that high if one only considers substring matches.
gives
even:
gives a match of 0.6 just because of "story", "me", and "you", so it's unrelated to the expected intent.
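For context, the kind of symmetric sentence-similarity check being criticized here is roughly the following (the embedding model and example strings are assumptions, chosen only to illustrate how shared words like "story", "me", and "you" can inflate the score):

```python
# Sketch: cosine similarity between a prompt and an "unhelpful" phrase.
# Embedding model and example strings are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def similarity(a: str, b: str) -> float:
    emb = embedder.encode([a, b], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(similarity("Tell me a story about you and me.",
                 "I'm sorry, but I cannot tell you a story."))
```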
gives (for just bot responses) for `openassistant_oasst1_h2ogpt_graded.json`:

For `h2ogpt-oig-oasst1-instruct-cleaned-v2.json`:

If a higher reward model score threshold doesn't help, it would be cheating to just filter these exact matches out; that will leave in too many other non-explicit cases that I didn't hard-code.
Could perhaps use BLEU etc. to match responses against those example targets, as sketched below.
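A rough sketch of that idea, using NLTK's sentence-level BLEU against a small list of known unhelpful target phrases (the target phrases and the threshold are illustrative assumptions):

```python
# Sketch: flag responses that closely match known unhelpful target phrases via BLEU.
# Target phrases and the threshold are illustrative assumptions.
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

UNHELPFUL_TARGETS = [
    "i'm sorry, but as an ai language model i cannot do that.",
    "as an ai language model, i am not able to help with that.",
]

def looks_unhelpful(response: str, threshold: float = 0.4) -> bool:
    hypothesis = response.lower().split()
    smooth = SmoothingFunction().method1
    return any(
        sentence_bleu([target.split()], hypothesis, smoothing_function=smooth) >= threshold
        for target in UNHELPFUL_TARGETS
    )

# e.g. drop examples where looks_unhelpful(ex["output"]) is True ("output" field is assumed)
```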
Note that in some cases OASST has these as "toxic" Q/A pairs, so it's good the model didn't comply. But keeping those makes the model far less smart: it will randomly respond that way even for totally safe questions/prompts.
Some other AI-moralizing filters could be applied, though some of those are excessive: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered . The problem is not the need for AI alignment; the problem is that these models parrot such responses without any real conditions and just randomly become unhelpful, as proven by simply regenerating. A typo or a grammar mistake can be enough to typically get an unhelpful response back, which is a bad bias to have.