
Add Doc Test for BERT #16523

Merged
ydshieh merged 14 commits into huggingface:main from vumichien:doc-test-sprint
Apr 11, 2022

Conversation

@vumichien
Contributor

What does this PR do?

Add doc tests for BERT, as part of issue #16292.
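
For context, the kind of runnable sample this sprint adds to the docstrings looks roughly like the sketch below (illustrative only; the checkpoint and expected value here mirror the bert-base-uncased masked-LM result discussed later in this thread):

>>> from transformers import BertTokenizer, BertForMaskedLM
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = BertForMaskedLM.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> # locate the [MASK] token and decode the highest-scoring prediction
>>> mask_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
>>> tokenizer.decode(logits[0, mask_index].argmax(axis=-1))
'paris'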

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@patrickvonplaten, @ydshieh
Documentation: @sgugger

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 31, 2022

The documentation is not available anymore as the PR was closed or merged.

@ydshieh ydshieh self-requested a review March 31, 2022 16:18
@ydshieh
Collaborator

ydshieh commented Apr 1, 2022

Hi, @vumichien,

Thank you for this PR. I have only had a quick look so far, and overall it looks good 😊.

I saw that you use different checkpoints for the PyTorch & TensorFlow BERT models in some cases. I understand why: some checkpoints only exist in a PyTorch version.

After an internal discussion, although we think it is totally fine, we would prefer to use the same checkpoints for the PyTorch & TensorFlow models. I will therefore try to convert some of the PyTorch checkpoints you used in this PR and upload them to the Hugging Face Hub. Once that is done, we can change the checkpoints in this PR and update the expected values (which should just be a matter of copying the values from the PyTorch side).

I will keep you updated 😊!

@vumichien
Contributor Author

vumichien commented Apr 1, 2022

@ydshieh Thank you very much. I am looking forward to your update.
Besides that, could you please take a look at modeling_mobilebert.py? When I run make fixup, it asks me to run make fix-copies. If I follow that, the model checkpoint, expected_output, expected_loss, etc. will be copied from bert to mobilebert, which leads to unwanted results.
I think it is due to these # Copied from comments:

# Copied from transformers.models.bert.modeling_bert.BertForQuestionAnswering with Bert->MobileBert all-casing
class MobileBertForQuestionAnswering(MobileBertPreTrainedModel):
    _keys_to_ignore_on_load_unexpected = [r"pooler"]

# Copied from transformers.models.bert.modeling_bert.BertForMultipleChoice with Bert->MobileBert all-casing
class MobileBertForMultipleChoice(MobileBertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

# Copied from transformers.models.bert.modeling_bert.BertForTokenClassification with Bert->MobileBert all-casing
class MobileBertForTokenClassification(MobileBertPreTrainedModel):
    _keys_to_ignore_on_load_unexpected = [r"pooler"]

So what should I do to avoid this problem?

@ydshieh
Collaborator

ydshieh commented Apr 1, 2022

Copied from transformers.models.bert.modeling_bert.BertForQuestionAnswering with Bert->MobileBert all-casing

Hi, this is quite tricky, and I need to discuss it with the team.
The # Copied from comment is used to keep track of the origin of some code blocks, and it's much better to keep as many of them as possible. Thank you for pointing out this issue!
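
For readers unfamiliar with the mechanism, the sketch below illustrates roughly what the with Bert->MobileBert all-casing directive does (simplified; the real check lives in utils/check_copies.py and also re-formats the code before diffing):

def apply_all_casing(source: str, old: str = "Bert", new: str = "MobileBert") -> str:
    """Re-derive the copied block by renaming the model in every casing variant."""
    for o, n in ((old, new), (old.lower(), new.lower()), (old.upper(), new.upper())):
        source = source.replace(o, n)
    return source

# `make fix-copies` overwrites the target block with this derived version,
# which is why hand-edited values in the mobilebert file get clobbered.
print(apply_all_casing('checkpoint="bert-base-uncased",'))
# -> checkpoint="mobilebert-base-uncased",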

@ydshieh
Collaborator

ydshieh commented Apr 1, 2022

@vumichien

I uploaded the TensorFlow checkpoints here:

For TFBertForSequenceClassification:

"ydshieh/bert-base-uncased-yelp-polarity"

For TFBertForQuestionAnswering:

"ydshieh/bert-base-cased-squad2"

Whenever you have time, could you change the corresponding places in modeling_tf_bert.py to use these checkpoints?
Ping me if you encounter any difficulty.

Thanks!

@vumichien
Contributor Author

@ydshieh Could you please check your checkpoint ydshieh/bert-base-cased-squad2 for TFBertForQuestionAnswering again? Every time I run it, the output is different (maybe the head weights are still randomly initialized).

@ydshieh
Collaborator

ydshieh commented Apr 4, 2022

@ydshieh Could you please check your checkpoint ydshieh/bert-base-cased-squad2 for TFBertForQuestionAnswering again? Every time I run it, the output is different (maybe the head weights are still randomly initialized).

OK, I will take a look today.

@vumichien

I checked it, and it turns out that I loaded the PyTorch (QA) checkpoint into a TF model for another task type!
Sorry about this, I will upload the corrected TF checkpoint today, and keep you updated!

Uploaded the correct TF checkpoint 🙂 and tested it myself. The result is now always the same (and the same as the PT result).
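
As an aside, a quick way to check for this kind of problem is to load the checkpoint twice and compare the logits (a hypothetical snippet, not part of the PR; from_pretrained also logs a warning listing any newly initialized weights):

import numpy as np
from transformers import BertTokenizer, TFBertForQuestionAnswering

checkpoint = "ydshieh/bert-base-cased-squad2"
tokenizer = BertTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("Who was Jim Henson?", "Jim Henson was a nice puppet", return_tensors="tf")

# Two independent loads: identical logits mean the QA head weights come from
# the checkpoint; differing logits mean they were randomly initialized.
logits = [
    TFBertForQuestionAnswering.from_pretrained(checkpoint)(**inputs).start_logits.numpy()
    for _ in range(2)
]
print("deterministic:", np.allclose(logits[0], logits[1]))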

@ydshieh
Collaborator

ydshieh commented Apr 4, 2022

Hi, @vumichien ,

Regarding the fix-copies issue you mentioned in a previous comment: after some discussion, we think the best approach is:

  • In Bert model file:

    • at the beginning part of the model file, set something like
      • EXPECTED_OUTPUT_FOR_TOKEN_CLASSIFICATION = ...
      • EXPECTED_LOSS_FOR_TOKEN_CLASSIFICATION = ...
    • In add_code_sample_docstrings for BertForTokenClassification, use
      • expected_output=EXPECTED_OUTPUT_FOR_TOKEN_CLASSIFICATION
      • expected_loss=EXPECTED_LOSS_FOR_TOKEN_CLASSIFICATION
  • In MobileBert model file:

    • at the beginning part of the model file, set something like
      • EXPECTED_OUTPUT_FOR_TOKEN_CLASSIFICATION = ... (set to empty string if you don't want to work on doctest for MobileBert)
      • EXPECTED_LOSS_FOR_TOKEN_CLASSIFICATION = ... (set to empty string if you don't want to work on doctest for MobileBert)
    • In add_code_sample_docstrings for MobileBertForTokenClassification, use
      • expected_output=EXPECTED_OUTPUT_FOR_TOKEN_CLASSIFICATION
      • expected_loss=EXPECTED_LOSS_FOR_TOKEN_CLASSIFICATION

And if you don't provide the values for MobileBert, just don't add it to documentation_tests.txt.

This should avoid the issue coming from the copies. Let us know if you encounter any difficulty, thanks!
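
A condensed sketch of this layout, assuming the constant names above (the imports and surrounding code are simplified, and the import path may differ across versions):

# modeling_bert.py (sketch)
from transformers.utils.doc import add_code_sample_docstrings
from transformers.modeling_outputs import TokenClassifierOutput

_CHECKPOINT_FOR_DOC = "bert-base-uncased"
_CONFIG_FOR_DOC = "BertConfig"

# Module-level values keep the class bodies identical between bert and
# mobilebert, so `make fix-copies` has nothing to flag.
EXPECTED_OUTPUT_FOR_TOKEN_CLASSIFICATION = "['O', 'I-ORG', 'I-ORG', ...]"
EXPECTED_LOSS_FOR_TOKEN_CLASSIFICATION = 0.01


class BertForTokenClassification:
    @add_code_sample_docstrings(
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=TokenClassifierOutput,
        config_class=_CONFIG_FOR_DOC,
        expected_output=EXPECTED_OUTPUT_FOR_TOKEN_CLASSIFICATION,
        expected_loss=EXPECTED_LOSS_FOR_TOKEN_CLASSIFICATION,
    )
    def forward(self, input_ids=None, labels=None):
        ...

# modeling_mobilebert.py defines the same two names with its own values
# (or empty strings to skip the doctest), keeping the copied class body
# byte-for-byte identical.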

@vumichien
Contributor Author

Thank you very much @ydshieh. I will update the doc tests following your instructions.

@vumichien
Contributor Author

vumichien commented Apr 5, 2022

Hi @ydshieh

  • I have followed your instructions (define-expected-output) but the problem isn't solved. Please help me check whether I did something wrong.
  • In this PR, I think it's better if I do both bert and mobilebert at the same time, but it still requires fix-copies.
  • I am still waiting for your update on "The loss is always 0.0 (or nan) because the shape of input_ids is [1, 14]" with the QuestionAnswering task 🙂. It leads to problems with MobileBertForQuestionAnswering as well.

@ydshieh
Collaborator

ydshieh commented Apr 5, 2022

Hi @vumichien, let me check both the fix-copies and the loss-0.0 issues.

@ydshieh
Collaborator

ydshieh commented Apr 5, 2022

@vumichien

Regarding fix-copies, I probably should have been more thorough in my previous comment.
One reason the current PR version doesn't work:

In BertForSequenceClassification: you have

checkpoint="textattack/bert-base-uncased-yelp-polarity",

however, in MobileBertForSequenceClassification, you have

checkpoint="lordtt13/emo-mobilebert",

And this difference will be detected by make fix-copies.

In general, these situations could be solved using the same method:

  • set _CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION = ... at the beginning of the model file
  • set checkpoint=_CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION in xxxForSequenceClassification model.

The same approach applies to the other tasks, e.g. xxxForQuestionAnswering; see the sketch below.
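
In other words, a minimal sketch using the two checkpoints quoted above:

# modeling_bert.py
_CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION = "textattack/bert-base-uncased-yelp-polarity"

# modeling_mobilebert.py
_CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION = "lordtt13/emo-mobilebert"

# Inside the copied class, both files then carry the identical line
#     checkpoint=_CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION,
# so the checkpoints can differ while `make fix-copies` stays green.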

Could you try it and let me know if you still have problems with this, please?

In the meantime, I am going to check the loss 0.0.

@ydshieh
Collaborator

ydshieh commented Apr 5, 2022

@patrickvonplaten

I think we need to update PT_QUESTION_ANSWERING_SAMPLE.

The indices (hard-coded) below

>>> target_start_index, target_end_index = torch.tensor([14]), torch.tensor([15])

actually depend on the tokenizer. See the code snippet below comparing Albert vs. Roberta.
Let me know your thoughts on this, thanks!

Code Snippet

from transformers import AlbertTokenizer, RobertaTokenizer

albert_checkpoint = "twmkn9/albert-base-v2-squad2"
roberta_checkpoint = "deepset/roberta-base-squad2"

albert_tokenizer = AlbertTokenizer.from_pretrained(albert_checkpoint)
roberta_tokenizer = RobertaTokenizer.from_pretrained(roberta_checkpoint)

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

albert_input_ids = albert_tokenizer(question, text, return_tensors="pt").input_ids.numpy().tolist()[0]
roberta_input_ids = roberta_tokenizer(question, text, return_tensors="pt").input_ids.numpy().tolist()[0]

albert_decoded_tokens = albert_tokenizer.convert_ids_to_tokens(albert_input_ids)
roberta_decoded_tokens = roberta_tokenizer.convert_ids_to_tokens(roberta_input_ids)

# Albert
print(f"Albert: tokens = {albert_decoded_tokens}")
print(f"Albert: num_tokens = {len(albert_decoded_tokens)}")
print(f"Albert: position of `▁nice`: {albert_decoded_tokens.index('▁nice')}\n")

# Roberta
print(f"Roberta: tokens = {roberta_decoded_tokens}")
print(f"Roberta: num_tokens = {len(roberta_decoded_tokens)}")
print(f"Roberta: position of `Ġnice`: {roberta_decoded_tokens.index('Ġnice')}")

Outputs

Albert: tokens = ['[CLS]', '▁who', '▁was', '▁jim', '▁henson', '?', '[SEP]', '▁jim', '▁henson', '▁was', '▁a', '▁nice', '▁puppet', '[SEP]']
Albert: num_tokens = 14
Albert: position of `▁nice`: 11

Roberta: tokens = ['<s>', 'Who', 'Ġwas', 'ĠJim', 'ĠH', 'enson', '?', '</s>', '</s>', 'Jim', 'ĠH', 'enson', 'Ġwas', 'Ġa', 'Ġnice', 'Ġpuppet', '</s>']
Roberta: num_tokens = 17
Roberta: position of `Ġnice`: 14

@vumichien
Contributor Author

vumichien commented Apr 5, 2022

@ydshieh Thank you very much for the clear explanation. Now I understand why we should do it this way to get around the fix-copies problem 🙏

@ydshieh
Collaborator

ydshieh commented Apr 6, 2022

Regarding the issue mentioned here, one solution is to pass target_start_index and target_end_index to the sample defined in doc.py (we would need to update the sample to accept these).

I don't think there is a very good heuristic to determine these targets in the sample directly. Even with these two tokenizers, we already have '▁nice' vs. 'Ġnice'.

Let's wait for Patrick's response.
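
In the meantime, a sketch of what that doc.py change could look like (illustrative only, not necessarily what was eventually implemented):

# Illustrative sketch: the QA sample takes the targets as template fields
# instead of hard-coding torch.tensor([14]) and torch.tensor([15]).
PT_QUESTION_ANSWERING_SAMPLE = """
>>> target_start_index = torch.tensor([{qa_target_start_index}])
>>> target_end_index = torch.tensor([{qa_target_end_index}])
>>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
>>> round(outputs.loss.item(), 2)
{expected_loss}
"""


def render_qa_sample(qa_target_start_index=14, qa_target_end_index=15, expected_loss=0.0):
    # The docstring decorator would format the template with per-model values.
    return PT_QUESTION_ANSWERING_SAMPLE.format(
        qa_target_start_index=qa_target_start_index,
        qa_target_end_index=qa_target_end_index,
        expected_loss=expected_loss,
    )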

@patrickvonplaten
Contributor

Hey @ydshieh and @vumichien,

IMO the best thing we can and should do here is to let the user pass the label idx.

@ydshieh ydshieh mentioned this pull request Apr 7, 2022
@ydshieh
Collaborator

ydshieh commented Apr 8, 2022

@vumichien You can ignore the failed run_tests_hub job. I will check the PR later.

@vumichien
Contributor Author

@ydshieh Thank you very much

Collaborator

@ydshieh ydshieh left a comment

Hi, @vumichien

Very high quality PR :-)
I left a few comments and marked a few places to be addressed.
I think we are very close to merging this PR 💯

I haven't checked the part regarding mobilebert, but if my comments on bert also apply to mobilebert, please address them there too :-)

Thanks a lot.

checkpoint=_CHECKPOINT_FOR_DOC,
output_type=CausalLMOutputWithCrossAttentions,
config_class=_CONFIG_FOR_DOC,
)
Collaborator

super!

the model is configured as a decoder.
encoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
Collaborator

More than super! Thanks for the fix

Comment on lines +1219 to +1223
>>> input_ids = tokenizer("Hello, my dog is cute", add_special_tokens=True, return_tensors="tf")
>>> # Batch size 1

>>> outputs = model(input_ids)
>>> prediction_scores, seq_relationship_scores = outputs[:2]
>>> prediction_logits, seq_relationship_logits = outputs[:2]
Collaborator

Very nice :-)

Contributor Author

thank you :D

Comment on lines 1307 to 1308
expected_output="'P a r i s'",
expected_loss=0.81,
Collaborator

I think it's better to change _CHECKPOINT_FOR_DOC from "bert-base-cased" to "bert-base-uncased", so it matches the one defined and used in the PyTorch Bert file.

And the result should then be the same as in PyTorch Bert: paris and 0.88.

Contributor Author

I have changed _CHECKPOINT_FOR_DOC as you suggested.

Comment on lines 1859 to 1863
checkpoint=_CHECKPOINT_FOR_TOKEN_CLASS,
output_type=TFTokenClassifierOutput,
config_class=_CONFIG_FOR_DOC,
expected_output="['O', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'I-LOC', 'I-LOC']",
expected_loss=0.01,
Collaborator

Probably it's not necessary here, but since you already defined

# TokenClassification docstring
_CHECKPOINT_FOR_TOKEN_CLASS = "dbmdz/bert-large-cased-finetuned-conll03-english"
_TOKEN_CLASS_EXPECTED_OUTPUT = (
    "['O', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'I-LOC', " "'I-LOC'] "
)
_TOKEN_CLASS_EXPECTED_LOSS = 0.01

let's just reuse those variables :-)
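
i.e. the suggested change amounts to replacing lines 1859-1863 with (sketch):

checkpoint=_CHECKPOINT_FOR_TOKEN_CLASS,
output_type=TFTokenClassifierOutput,
config_class=_CONFIG_FOR_DOC,
expected_output=_TOKEN_CLASS_EXPECTED_OUTPUT,
expected_loss=_TOKEN_CLASS_EXPECTED_LOSS,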

Contributor Author

OMG, I forgot to use them. Thank you for pointing them out ^^!

Comment on lines 1954 to 1955
expected_output="'a nice puppet'",
expected_loss=7.41,
Collaborator

We can reuse the variables you defined at the beginning

Contributor Author

Yeah, I totally agree

Collaborator

@sgugger sgugger left a comment

Very nice PR indeed, thanks a lot!

Collaborator

@ydshieh ydshieh left a comment

Hi @vumichien, this is ready to merge. I suggested some final changes (you can commit them directly, in a batch):

  • Use _CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION instead of _CHECKPOINT_FOR_SEQ_CLASS, etc. (just a style preference, not a big deal).

  • p a r i s -> paris: there is a bug in TF_MASKED_LM_SAMPLE, which is fixed in #16698. (No need to rebase or merge it into this PR.)

@sgugger Could you give a final approval when you have some time, thanks.
(Just saw you approved!)

Thanks a lot for this PR, @vumichien. Let's merge it once the suggestions are committed!

vumichien and others added 4 commits April 11, 2022 21:41
@ydshieh
Collaborator

ydshieh commented Apr 11, 2022

All good :-) I'll merge it now. Thank you again ❤️ @vumichien

@ydshieh ydshieh merged commit 2831826 into huggingface:main Apr 11, 2022
@vumichien
Contributor Author

@ydshieh You, too. Thanks a lot for helping with this PR 🙏

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* Add doctest BERT

* make fixup

* fix typo

* change checkpoints

* make fixup

* define doctest output value, update doctest for mobilebert

* solve fix-copies

* update QA target start index and end index

* change checkpoint for docs and reuse defined variable

* Update src/transformers/models/bert/modeling_tf_bert.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* make fixup

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>