
Conversation

@younesbelkada
Contributor

What does this PR do?

This PR fixes the doctest transformers.models.blenderbot.modeling_blenderbot.BlenderbotForConditionalGeneration.forward. The failing job is here: https://github.com/huggingface/transformers/actions/runs/4002271138/jobs/6869333719

Updating the expected prediction to the value the model actually generates seems to be the correct fix. I am unsure whether this doctest was run before, so I cannot compare against a previously passing result for now.

One thing I suspect is that we may get different results across different PyTorch versions, but I am not sure.
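For context, here is a sketch of the shape of the doc example the doctest exercises, reconstructed from the reproduction script further down (not the verbatim docstring); the fix updates the expected generation to the string the fast-tokenizer path actually produces:

```python
>>> from transformers import AutoTokenizer, BlenderbotForConditionalGeneration

>>> mname = "facebook/blenderbot-400M-distill"
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = AutoTokenizer.from_pretrained(mname)

>>> NEXT_UTTERANCE = (
...     "My friends are cool but they eat too many carbs.</s> <s>That's unfortunate. "
...     "Are they trying to lose weight or are they just trying to be healthier?</s> "
...     "<s> I'm not sure."
... )
>>> inputs = tokenizer([NEXT_UTTERANCE], return_tensors="pt")
>>> next_reply_ids = model.generate(**inputs)
>>> print(tokenizer.batch_decode(next_reply_ids, skip_special_tokens=True)[0])
 I see. Well, it's good that they're trying to change their eating habits.
```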

cc @ydshieh

- add correct expected value
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jan 25, 2023

The documentation is not available anymore as the PR was closed or merged.

Collaborator

@sgugger sgugger left a comment


Thanks for the fix!

@younesbelkada
Contributor Author

younesbelkada commented Jan 25, 2023

Actually, as discussed with @ydshieh offline, there seems to be a discrepancy between BlenderbotTokenizer & BlenderbotTokenizerFast.
The PR #21225 changed the docstring to use AutoTokenizer instead of BlenderbotTokenizer. This led to loading BlenderbotTokenizerFast. You can reproduce the discrepancy with the script below:

```python
from transformers import (
    BlenderbotForConditionalGeneration,
    BlenderbotTokenizer,
    BlenderbotTokenizerFast,
)

mname = "facebook/blenderbot-400M-distill"
model = BlenderbotForConditionalGeneration.from_pretrained(mname)

tokenizer = BlenderbotTokenizer.from_pretrained(mname)  # slow, Python-based
tokenizer_fast = BlenderbotTokenizerFast.from_pretrained(mname)  # fast, Rust-based


def generate(tokenizer):
    UTTERANCE = "My friends are cool but they eat too many carbs."
    inputs = tokenizer([UTTERANCE], return_tensors="pt")  # overwritten below; kept to mirror the doc example

    NEXT_UTTERANCE = (
        "My friends are cool but they eat too many carbs.</s> <s>That's unfortunate. "
        "Are they trying to lose weight or are they just trying to be healthier?</s> "
        "<s> I'm not sure."
    )
    inputs = tokenizer([NEXT_UTTERANCE], return_tensors="pt")
    next_reply_ids = model.generate(**inputs)
    print("Bot: ", tokenizer.batch_decode(next_reply_ids, skip_special_tokens=True)[0])


generate(tokenizer)
# Bot:   That's too bad. Have you tried encouraging them to change their eating habits?
generate(tokenizer_fast)
# Bot:   I see. Well, it's good that they're trying to change their eating habits.
```

I am not sure whether this is a known bug or intended behavior. Maybe the change I proposed is not the correct fix here.

@ydshieh
Collaborator

ydshieh commented Jan 25, 2023

Thanks for digging deeper @younesbelkada! Could you check what the difference is between the fast and slow tokenizers for the checkpoint used in this doc example, and compare the inputs = tokenizer([UTTERANCE], return_tensors="pt") outputs between the two tokenizers? Something like the sketch below would do it.
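(A minimal sketch of that comparison, assuming the same checkpoint as in the script above; only the method is claimed here, not the printed ids:)

```python
from transformers import BlenderbotTokenizer, BlenderbotTokenizerFast

mname = "facebook/blenderbot-400M-distill"
slow = BlenderbotTokenizer.from_pretrained(mname)
fast = BlenderbotTokenizerFast.from_pretrained(mname)

UTTERANCE = "My friends are cool but they eat too many carbs."
ids_slow = slow([UTTERANCE], return_tensors="pt").input_ids[0].tolist()
ids_fast = fast([UTTERANCE], return_tensors="pt").input_ids[0].tolist()

print("slow:", ids_slow)
print("fast:", ids_fast)
print("identical:", ids_slow == ids_fast)

# A token-by-token view helps spot exactly where the two diverge.
print("slow tokens:", slow.convert_ids_to_tokens(ids_slow))
print("fast tokens:", fast.convert_ids_to_tokens(ids_fast))
```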

Another similar issue (but not related to this one): #21254

@sgugger
Collaborator

sgugger commented Jan 25, 2023

I think it's good as we want to default to fast tokenizers (which is the reason we switched to AutoTokenizer), so the fix is the right one in my opinion.
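(For reference, a quick sketch of that default: AutoTokenizer.from_pretrained uses use_fast=True unless told otherwise, so it resolves to the fast class when the checkpoint has one:)

```python
from transformers import AutoTokenizer, BlenderbotTokenizer, BlenderbotTokenizerFast

mname = "facebook/blenderbot-400M-distill"

# use_fast=True is the default, so the fast class is returned.
tok = AutoTokenizer.from_pretrained(mname)
print(type(tok).__name__)  # BlenderbotTokenizerFast
assert isinstance(tok, BlenderbotTokenizerFast)

# Opting out explicitly yields the slow, Python-based tokenizer.
tok_slow = AutoTokenizer.from_pretrained(mname, use_fast=False)
print(type(tok_slow).__name__)  # BlenderbotTokenizer
assert isinstance(tok_slow, BlenderbotTokenizer)
```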

@ydshieh
Collaborator

ydshieh commented Jan 25, 2023

I agree, but I am just wondering whether we should find out what is going wrong and potentially fix the inconsistency between these two tokenizers (or whatever in our codebase causes it).

The fix is good for me, and you can merge @younesbelkada !

@younesbelkada
Contributor Author

younesbelkada commented Jan 25, 2023

Thanks everyone!
I will open an issue to describe the bug
EDIT: #21305

@younesbelkada younesbelkada merged commit 015443f into huggingface:main Jan 25, 2023
