Check data for unhelpful responses #103
ShareGPT instruct data on 20B gets about a 0.46 mean reward score on the 100 ShareGPT eval prompts when using the "no sorry" version of the ShareGPT data: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

We already beat that with 20B on OASST without filtering, at 0.49 mean (but worse on median), with 512-context LoRA training. We saw that 2048-context training leads to poorer results:

And LLaMa 30B on OIG+OASST gets 0.55: but many of those LLaMa answers are "sorry" responses for the 100-prompt ShareGPT eval set, so LLaMa would score even higher with less unhelpful data.

For reference, GPT-3.5 does quite well:

Together Chat 20B does poorly (2048 context for LoRA though):

Just OIG does poorly for 20B non-chat (2048 context for LoRA though):

Only 1 epoch of OASST for 20B is not so great (2048 context for LoRA though):

Dolly is not so great (full fine-tune, so has 2048 fine-tune context problems instead):
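The means/medians above are reward-model grades over the eval responses. A minimal sketch of that kind of scoring, assuming the OpenAssistant deberta reward model and a simple JSON list of prompt/response pairs (both assumptions, not the exact eval script used here):

```python
# Sketch: grade (prompt, response) pairs with a deberta reward model and report
# mean/median. Model name and the eval file layout are assumptions.
import json
import statistics

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # assumed reward model
tokenizer = AutoTokenizer.from_pretrained(reward_name)
model = AutoModelForSequenceClassification.from_pretrained(reward_name).eval()

def grade(prompt: str, response: str) -> float:
    # The reward model scores a (question, answer) pair with a single logit.
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

# eval_pairs.json: list of {"prompt": ..., "response": ...} (hypothetical file)
with open("eval_pairs.json") as f:
    pairs = json.load(f)
scores = [grade(p["prompt"], p["response"]) for p in pairs]
print("mean:", statistics.mean(scores), "median:", statistics.median(scores))
```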
threshold=0.5:
threshold=0.0:
So a threshold on the deberta grade (reward score) isn't enough.
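For concreteness, the filtering implied by such a threshold is just something like the sketch below (the `grade_deberta` field name, the data layout, and the threshold values are assumptions for illustration):

```python
# Sketch: drop training examples whose reward-model grade falls below a threshold.
# The "grade_deberta" field name and the threshold values are assumptions.
def filter_by_grade(examples: list[dict], threshold: float = 0.5) -> list[dict]:
    return [ex for ex in examples if ex.get("grade_deberta", 0.0) >= threshold]

# e.g. compare filter_by_grade(data, 0.5) vs. filter_by_grade(data, 0.0),
# which keeps nearly everything.
```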
BLEU doesn't do a good job of matching patterns. Similarity search doesn't do a good job either. Asymmetric search-style query-answer matching is not appropriate. E.g.
gives 0.73 but:
gives 0.69. So barely higher similarity, even though I see no reason why the latter should score that high if one only considers substring matches.
gives
even:
gives a match of 0.6 just because of "story", "me", and "you", so it's unrelated to the expected intent.
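For context, the kind of symmetric sentence-similarity check being criticized here is roughly the following (the embedding model and example strings are assumptions, chosen only to illustrate how shared words like "story", "me", and "you" can inflate the score):

```python
# Sketch: cosine similarity between a prompt and an "unhelpful" phrase.
# Embedding model and example strings are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def similarity(a: str, b: str) -> float:
    emb = embedder.encode([a, b], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(similarity("Tell me a story about you and me.",
                 "I'm sorry, but I cannot tell you a story."))
```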
gives (for just bot responses) for `openassistant_oasst1_h2ogpt_graded.json`:

For `h2ogpt-oig-oasst1-instruct-cleaned-v2.json`:

If a higher reward model score threshold doesn't help, it would be cheating to just filter these exact matches out; that will leave in too many other non-explicit cases that I didn't hard-code.
Could perhaps use BLEU etc. to match responses against those example targets, as sketched below.
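A rough sketch of that idea, using NLTK's sentence-level BLEU against a small list of known unhelpful target phrases (the target phrases and the threshold are illustrative assumptions):

```python
# Sketch: flag responses that closely match known unhelpful target phrases via BLEU.
# Target phrases and the threshold are illustrative assumptions.
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

UNHELPFUL_TARGETS = [
    "i'm sorry, but as an ai language model i cannot do that.",
    "as an ai language model, i am not able to help with that.",
]

def looks_unhelpful(response: str, threshold: float = 0.4) -> bool:
    hypothesis = response.lower().split()
    smooth = SmoothingFunction().method1
    return any(
        sentence_bleu([target.split()], hypothesis, smoothing_function=smooth) >= threshold
        for target in UNHELPFUL_TARGETS
    )

# e.g. drop examples where looks_unhelpful(ex["output"]) is True ("output" field is assumed)
```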
Note that in some cases OASST has these as "toxic" Q/A pairs, so it's good the model didn't comply. But keeping those makes the model far less smart: it will randomly respond that way even for totally safe questions/prompts.
Some other AI-moralizing filters could be applied, though some of those are excessive: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered . The problem is not the need for AI alignment; the problem is that these models parrot such responses without any real conditions and just randomly become unhelpful, as proven by simply regenerating. A typo or a grammar mistake can be enough to typically get an unhelpful response back, which is a bad bias to have.