Check data for unhelpful responses #103

Merged · 7 commits into main on May 1, 2023

Conversation

@pseudotensor commented May 1, 2023

pytest -s -v tests/create_data.py::test_check_unhelpful

gives the following (for just bot responses) for openassistant_oasst1_h2ogpt_graded.json:

{   'As a large language model': 15,
    'As an artificial intelligence I do not have the capability': 2,
    'I am not capable of': 15,
    'I am sorry': 41,
    "I didn't understand your question": 1,
    "I'm sorry": 279,
    "I'm sorry, I cannot perform this task as I am an AI language model and do not have access": 3,
    "I'm sorry, but as an AI language model": 19,
    'You need to provide more context': 3,
    'as an AI language model': 50,
    'do not have access': 20,
    'nor am I capable': 1,
    'provide more context': 26,
    'sorry, but as an AI language model': 26}
total_bads_bots: 501

For h2ogpt-oig-oasst1-instruct-cleaned-v2.json:

{   'As a large language model': 25,
    'As an artificial intelligence I cannot': 3,
    'As an artificial intelligence I do not have the capability': 2,
    'I am not capable of': 27,
    'I am sorry': 125,
    'I apologize, but I cannot': 1,
    "I didn't quite understand your question": 5,
    "I didn't understand your question": 2,
    "I'm sorry": 518,
    "I'm sorry, I cannot perform this task as I am an AI language model and do not have access": 3,
    "I'm sorry, I didn't quite understand your question": 5,
    "I'm sorry, I didn't quite understand your question, could you please rephrase it?": 5,
    "I'm sorry, but as an AI language model": 22,
    'Sorry, but I am not ': 1,
    'Sorry, but I am not an actual Linux shell, nor am I capable of emulating one. I am an open source chat assistant and would be glad t': 1,
    'You need to provide more context': 3,
    'as an AI language model': 61,
    'do not have access': 80,
    'nor am I capable': 2,
    'not sure what you are asking': 3,
    'provide more context': 66,
    "sorry, I didn't quite understand your question": 5,
    'sorry, but as an AI language model': 29}
total_bads_bots: 994
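
For reference, the substring check amounts to something like the sketch below. This is not the actual test in tests/create_data.py; the phrase list is an illustrative subset of the counts above, and the JSON field name is an assumption:

```python
import json
from collections import Counter

# Illustrative subset of unhelpful phrases; the real test's list is longer.
UNHELPFUL = [
    "I'm sorry",
    "I am sorry",
    "as an AI language model",
    "As a large language model",
    "provide more context",
    "do not have access",
]

def count_unhelpful(path):
    counts = Counter()
    total_bads_bots = 0
    with open(path) as f:
        rows = json.load(f)
    for row in rows:
        response = row.get("output", "")  # assumed field name for the bot turn
        hits = [p for p in UNHELPFUL if p in response]
        counts.update(hits)
        if hits:
            total_bads_bots += 1
    return dict(counts), total_bads_bots

counts, total = count_unhelpful("openassistant_oasst1_h2ogpt_graded.json")
print(counts, "total_bads_bots:", total)
```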

If a higher reward-model score threshold doesn't help, it would still be cheating to just filter these exact matches out: that leaves in too many other non-explicit cases I didn't hard-code.

Perhaps BLEU or similar could be used to match responses against those example targets.
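
A rough sketch of what that BLEU matching could look like, using nltk (a follow-up comment below finds this doesn't work well; the target list here is an assumed subset):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Example unhelpful targets, tokenized; the real list would be longer.
unhelpful = [
    "I'm sorry, but as an AI language model I cannot do that".split(),
    "You need to provide more context".split(),
]

def bleu_vs_unhelpful(response):
    # BLEU of the response against all unhelpful references at once.
    smooth = SmoothingFunction().method1
    return sentence_bleu(unhelpful, response.split(), smoothing_function=smooth)

print(bleu_vs_unhelpful("I'm sorry, but as an AI language model I do not have access"))
```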

Note that in some cases OASST marks these as "toxic" Q/A pairs, so it's good the model refused there. But keeping too many of them makes the model noticeably less capable: it will randomly respond that way to totally safe questions/prompts.

Some other AI-moralizing filters could be applied, though some of those are excessive: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered. The problem is not the need for AI alignment; the problem is that these models parrot such responses without any real conditions and just randomly become unhelpful, as is easy to prove by regenerating. A mere typo or grammar mistake can be enough to trigger an unhelpful reply, which is a bad bias to have. A sketch of that style of filter follows.
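
For illustration, the linked dataset's approach amounts to a phrase blocklist, roughly like this (the phrases and field names here are assumptions, not the dataset's actual list):

```python
# Vicuna-unfiltered-style phrase blocklist; illustrative subset only.
MORALIZING = [
    "as an AI language model",
    "I'm sorry, but",
    "I cannot provide",
]

def keep(pair):
    """Keep a Q/A pair only if the bot response has no blocklisted phrase."""
    response = pair["output"]  # assumed field name
    return not any(p.lower() in response.lower() for p in MORALIZING)

data = [
    {"instruction": "Write a poem.", "output": "I'm sorry, but as an AI language model I cannot."},
    {"instruction": "Write a poem.", "output": "Roses are red, violets are blue..."},
]
print([keep(d) for d in data])  # -> [False, True]
```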

@pseudotensor changed the title from "Check data for unhelpful content" to "Check data for unhelpful responses" on May 1, 2023

pseudotensor commented May 1, 2023

ShareGPT instruct data on 20B gets about a 0.46 mean reward score on the 100-prompt ShareGPT eval set when using the "no sorry" version of the ShareGPT data: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

[reward score plot]

We already beat that with 20B on OASST without filtering, at a 0.49 mean (but worse on median), with 512-token context for LoRA training. We saw that 2048-context training leads to poorer results:

[reward score plot]

and LLaMA 30B on OIG+OASST gets 0.55:

[reward score plot]

but many of those LLaMA answers are "sorry" responses on the 100-prompt ShareGPT eval set, so LLaMA would score even higher with less unhelpful data.

For reference, GPT-3.5 does quite well:

[reward score plot]

Together Chat 20B does poorly (though 2048ctx for LoRA):

[reward score plot]

OIG alone does poorly for 20B non-chat (though 2048ctx for LoRA):

[reward score plot]

Only 1 epoch of OASST on 20B does not do so well (though 2048ctx for LoRA):

[reward score plot]

Dolly does not do so well (full fine-tune, which then has the 2048 fine-tune context problems):

[reward score plot]

@pseudotensor marked this pull request as ready for review on May 1, 2023 22:33
@pseudotensor merged commit 6775a2b into main on May 1, 2023
@pseudotensor commented:

threshold=0.5

{   'As a large language model': 7,
    'Can you please explain': 1,
    'I am not capable of': 7,
    'I am sorry': 34,
    "I didn't understand your question": 1,
    "I'm sorry": 138,
    "I'm sorry, I cannot perform this task as I am an AI language model and do not have access": 3,
    "I'm sorry, but as an AI language model": 3,
    'as an AI language model': 23,
    'do not have access': 15,
    'etc etc': 1,
    'etc.': 1148,
    'nor am I capable': 1,
    'provide more context': 17,
    'sorry, but as an AI language model': 10}

threshold=0.0:

{   'As a large language model': 25,
    'As an artificial intelligence I cannot': 3,
    'As an artificial intelligence I do not have the capability': 2,
    'Can you please explain': 1,
    'I am not capable of': 27,
    'I am sorry': 91,
    'I apologize, but I cannot': 1,
    "I didn't quite understand your question": 5,
    "I didn't understand your question": 1,
    "I'm sorry": 485,
    "I'm sorry, I cannot perform this task as I am an AI language model and do not have access": 3,
    "I'm sorry, I didn't quite understand your question": 5,
    "I'm sorry, I didn't quite understand your question, could you please rephrase it?": 5,
    "I'm sorry, but as an AI language model": 22,
    'Sorry, but I am not ': 1,
    'Sorry, but I am not an actual Linux shell, nor am I capable of emulating one. I am an open source chat assistant and would be glad t': 1,
    'You need to provide more context': 3,
    'as an AI language model': 61,
    'do not have access': 59,
    'etc etc': 1,
    'etc.': 1443,
    'nor am I capable': 2,
    'not sure what you are asking': 3,
    'provide more context': 66,
    "sorry, I didn't quite understand your question": 5,
    'sorry, but as an AI language model': 29}
threshold: 0 total_bads_bots: 2350 total_bots: 48307 total_humans: 48307
threshold: 0.1 total_bads_bots: 1993 total_bots: 39508 total_humans: 39508
threshold: 0.2 total_bads_bots: 1811 total_bots: 35293 total_humans: 35293
threshold: 0.3 total_bads_bots: 1627 total_bots: 32437 total_humans: 32437
threshold: 0.4 total_bads_bots: 1535 total_bots: 30026 total_humans: 30026
threshold: 0.5 total_bads_bots: 1409 total_bots: 27637 total_humans: 27637

So thresholding on the DeBERTa grade (reward score) isn't enough.
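
For reference, the reward-score filtering tried above could look roughly like this sketch. The exact reward model is an assumption (the PR only says "deberta grade"); here I use the OpenAssistant DeBERTa-v3 reward model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed reward model choice.
name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def reward_score(question, answer):
    # Score a (question, answer) pair; higher means a better answer.
    inputs = tokenizer(question, answer, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

# Keep only pairs whose reward score clears the threshold.
threshold = 0.5
pairs = [("Tell me a story.", "I'm sorry, I cannot perform this task.")]
kept = [(q, a) for q, a in pairs if reward_score(q, a) >= threshold]
```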

pseudotensor commented May 3, 2023

BLEU doesn't do a good job of matching patterns.

Similarity search doesn't do a good job either.

Asymmetric search-style (query-to-answer) matching is not appropriate here either.
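
For context, the snippets below assume roughly the following setup; this is a minimal sketch, and the embedding model choice and shortened `unhelpful` list are my assumptions:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Unhelpful example targets to embed (subset shown; the real list is longer).
unhelpful = [
    "I'm sorry, but I don't know how to tell a story. Can you please explain what you mean by",
    "I'm sorry, but as an AI language model",
]
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
sentence_embeddings = model.encode(unhelpful)
```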

E.g.

cosine_similarity(model.encode(["I'm sorry, but I don't have the ability to create stories. However, if you would like me to write a story for you, I can provide you with some suggestions and ideas."]), sentence_embeddings)

gives 0.73 but:

cosine_similarity(model.encode(["If you would like me to write a story for you, I can provide you with some suggestions and ideas."]), sentence_embeddings)

gives 0.69.

So the similarity is barely higher, even though the second string is literally just the first with the unhelpful "I'm sorry, but I don't have the ability" part removed. Dropping the unhelpful part barely lowers the score, so this can't discriminate; a literal substring match buys almost no extra similarity.

unhelpful[np.argmax(cosine_similarity(model.encode(["I'm sorry, but I don't have the ability to create stories. However, if you would like me to write a story for you, I can provide you with some suggestions and ideas."]), sentence_embeddings)[0, :])]

gives

"I'm sorry, but I don't know how to tell a story. Can you please explain what you mean by"

and even:

unhelpful[np.argmax(cosine_similarity(model.encode(["me to write a story for you."]), sentence_embeddings)[0, :])]

gives a match with a 0.6 score just because of "story", "me", and "you", so the match is unrelated to the expected intent.
