Conversation
|
The documentation is not available anymore as the PR was closed or merged. |
|
Hi @vumichien, thank you for this PR. I have only had a quick look so far, and overall it is good 😊. I saw that you use different checkpoints for the PyTorch & TensorFlow BERT models in some cases. After an internal discussion, although we think it is totally fine, we would prefer to use the same checkpoints for the PyTorch & TensorFlow models. Therefore, I will try to convert some of the PyTorch checkpoints you used in this PR and upload them to the Hugging Face Hub. Once this is done, we can change the checkpoints in this PR and update the expected values (which should just be a matter of copying the values from the PyTorch side). I will keep you updated 😊! |
|
@ydshieh Thank you very much. I am looking forward to your update.
transformers/src/transformers/models/mobilebert/modeling_mobilebert.py Lines 1315 to 1318 in 9de70f2
transformers/src/transformers/models/mobilebert/modeling_mobilebert.py Lines 1419 to 1422 in 9de70f2
transformers/src/transformers/models/mobilebert/modeling_mobilebert.py Lines 1517 to 1520 in 9de70f2
So what should I do to avoid this problem? |
Hi, this is very tricky, and I need to discuss it with the team. |
|
I uploaded the TensorFlow checkpoint here. Whenever you have time, could you update the corresponding places? Thanks! |
|
@ydshieh Could you please check your checkpoint again? |
I checked it, and it turns out that I had loaded the PyTorch (QA) checkpoint into a TF model for another task type! I uploaded the correct TF checkpoint 🙂 and tested it myself. The result is now always the same (and the same as the PT result). |
|
Hi @vumichien, regarding the fix-copies issue you mentioned in a previous comment: after some discussion, we think the best way is:
And if you don't provide the values, this should avoid the issue coming from fix-copies. |
|
Thank you very much, @ydshieh. I will update the doc tests following your instructions. |
|
Hi @ydshieh
|
|
Hi @vumichien, let me check both the fix-copies & loss 0.0 things. |
|
Regarding the checkpoints: in one place we have `checkpoint="textattack/bert-base-uncased-yelp-polarity"`; in another, however, we have `checkpoint="lordtt13/emo-mobilebert"`, and this difference will be detected by the consistency check. In general, these situations could be solved using the same method:
The same approach applies to other models. Could you try it and let me know if you still have problems with this, please? In the meantime, I am going to check the loss 0.0 issue. |
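For context, the kind of consistency check at play here boils down to comparing a source block with its copy after applying declared substitutions. The toy sketch below is hypothetical (it is not the actual transformers `fix-copies` script), using the two checkpoints mentioned above:

```python
def check_copy(source: str, copy: str, replacements: dict) -> bool:
    """Return True if `copy` matches `source` after applying the declared
    find/replace pairs (a toy stand-in for the real consistency check)."""
    expected = source
    for old, new in replacements.items():
        expected = expected.replace(old, new)
    return expected == copy


bert_line = 'checkpoint="textattack/bert-base-uncased-yelp-polarity",'
mobilebert_line = 'checkpoint="lordtt13/emo-mobilebert",'

# Without a declared replacement, the differing checkpoints are flagged ...
print(check_copy(bert_line, mobilebert_line, {}))  # False
# ... but with a declared replacement, the copy is considered consistent.
print(check_copy(
    bert_line,
    mobilebert_line,
    {"textattack/bert-base-uncased-yelp-polarity": "lordtt13/emo-mobilebert"},
))  # True
```

The point is only that a declared substitution lets intentionally different values pass the check.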
|
I think we need to update this. The hard-coded indices below

>>> target_start_index, target_end_index = torch.tensor([14]), torch.tensor([15])

actually depend on the tokenizer. See the code snippet below:

Code snippet:
from transformers import AlbertTokenizer, AlbertForQuestionAnswering, RobertaTokenizer
import torch
albert_checkpoint = "twmkn9/albert-base-v2-squad2"
roberta_checkpoint = "deepset/roberta-base-squad2"
albert_tokenizer = AlbertTokenizer.from_pretrained(f"{albert_checkpoint}")
roberta_tokenizer = RobertaTokenizer.from_pretrained(f"{roberta_checkpoint}")
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
albert_input_ids = albert_tokenizer(question, text, return_tensors="pt").input_ids.numpy().tolist()[0]
roberta_input_ids = roberta_tokenizer(question, text, return_tensors="pt").input_ids.numpy().tolist()[0]
albert_decoded_tokens = albert_tokenizer.convert_ids_to_tokens(albert_input_ids)
roberta_decoded_tokens = roberta_tokenizer.convert_ids_to_tokens(roberta_input_ids)
# Albert
print(f"Albert: tokens = {albert_decoded_tokens}")
print(f"Albert: num_tokens = {len(albert_decoded_tokens)}")
print(f"Albert: position of `▁nice`: {albert_decoded_tokens.index('▁nice')}\n")
# Roberta
print(f"Roberta: tokens = {roberta_decoded_tokens}")
print(f"Roberta: num_tokens = {len(roberta_decoded_tokens)}")
print(f"Roberta: position of `Ġnice`: {roberta_decoded_tokens.index('Ġnice')}")

Outputs: |
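Rather than hard-coding the indices, one could locate the answer span in the decoded tokens programmatically. The helper below is a hypothetical sketch (the token list is illustrative, not real tokenizer output, which would also include special tokens):

```python
def find_answer_span(decoded_tokens, answer_tokens):
    """Return (start, end) token indices of the first contiguous occurrence
    of answer_tokens inside decoded_tokens, or None if absent."""
    n, m = len(decoded_tokens), len(answer_tokens)
    for start in range(n - m + 1):
        if decoded_tokens[start:start + m] == answer_tokens:
            return start, start + m - 1
    return None


# Illustrative SentencePiece-style tokens (with the "▁" word marker);
# a byte-level BPE tokenizer would use "Ġ" instead -- the helper is agnostic.
tokens = ["▁who", "▁was", "▁jim", "▁henson", "?",
          "▁jim", "▁henson", "▁was", "▁a", "▁nice", "▁puppet"]
print(find_answer_span(tokens, ["▁a", "▁nice", "▁puppet"]))  # (8, 10)
```

The same call against a Roberta-style token list would return different indices, which is exactly why the hard-coded values break across models.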
|
@ydshieh Thank you very much for your clear explanation. Now I understand why we should do it this way to get around the problem. |
|
Regarding the issue mentioned here, one solution is to pass the target indices in. I don't think there is a super good heuristic to determine these targets from the sample directly; even with these 2 tokenizers, we already get different indices. Let's wait for Patrick's response. |
|
Hey @ydshieh and @vumichien, IMO the best thing we can and should do here is to let the user pass the label idx. |
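One shape this could take (entirely hypothetical names, not the actual transformers API) is a docstring decorator that accepts the target indices from the caller instead of baking them in:

```python
QA_SAMPLE_TEMPLATE = """
    Example:

    >>> target_start_index = torch.tensor([{qa_target_start_index}])
    >>> target_end_index = torch.tensor([{qa_target_end_index}])
"""


def add_qa_sample(qa_target_start_index=14, qa_target_end_index=15):
    """Append a QA doc sample with caller-supplied label indices
    (hypothetical sketch of the idea discussed above)."""
    def decorator(fn):
        fn.__doc__ = (fn.__doc__ or "") + QA_SAMPLE_TEMPLATE.format(
            qa_target_start_index=qa_target_start_index,
            qa_target_end_index=qa_target_end_index,
        )
        return fn
    return decorator


@add_qa_sample(qa_target_start_index=12, qa_target_end_index=13)
def forward():
    """Run a QA forward pass."""


print("torch.tensor([12])" in forward.__doc__)  # True
```

Each model file could then declare indices matching its own tokenizer, and the doctest sample would stay correct without per-model heuristics.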
|
@vumichien You can ignore the failed test |
|
@ydshieh Thank you very much |
ydshieh
left a comment
Hi, @vumichien
Very high quality PR :-)
I left a few comments, and a few places to be addressed.
I think we are very close to merging this PR 💯
I haven't checked the mobilebert part, but if you find my comments on bert also apply to mobilebert, please address them there too whenever possible :-)
Thanks a lot.
checkpoint=_CHECKPOINT_FOR_DOC,
output_type=CausalLMOutputWithCrossAttentions,
config_class=_CONFIG_FOR_DOC,
)
the model is configured as a decoder.
encoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
    Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
    the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
More than super! Thanks for the fix
>>> input_ids = tokenizer("Hello, my dog is cute", add_special_tokens=True, return_tensors="tf")
>>> # Batch size 1
>>> outputs = model(input_ids)
>>> prediction_scores, seq_relationship_scores = outputs[:2]
>>> prediction_logits, seq_relationship_logits = outputs[:2]
expected_output="'P a r i s'",
expected_loss=0.81,
I think it's better to change _CHECKPOINT_FOR_DOC from "bert-base-cased" to "bert-base-uncased", so it matches the one defined and used in PyTorch Bert.
And the result should be the same in Bert: paris and 0.88.
I have changed the _CHECKPOINT_FOR_DOC as you suggested.
checkpoint=_CHECKPOINT_FOR_TOKEN_CLASS,
output_type=TFTokenClassifierOutput,
config_class=_CONFIG_FOR_DOC,
expected_output="['O', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'I-LOC', 'I-LOC']",
expected_loss=0.01,
Probably it's not necessary here, but since you already defined
# TokenClassification docstring
_CHECKPOINT_FOR_TOKEN_CLASS = "dbmdz/bert-large-cased-finetuned-conll03-english"
_TOKEN_CLASS_EXPECTED_OUTPUT = (
"['O', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'I-LOC', " "'I-LOC'] "
)
_TOKEN_CLASS_EXPECTED_LOSS = 0.01
let's just reuse those variables :-)
OMG, I forgot to use them. Thank you for pointing them out ^^!
expected_output="'a nice puppet'",
expected_loss=7.41,
We can reuse the variables you defined at the beginning
Yeah, I totally agree
sgugger
left a comment
Very nice PR indeed, thanks a lot!
Hi @vumichien, ready to merge: I suggested some final changes (you can commit them directly, in a single batch):

- Using `_CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION` instead of `_CHECKPOINT_FOR_SEQ_CLASS`, etc. (just a style preference, not a big deal)
- `p a r i s` -> `paris`: there is a bug in `TF_MASKED_LM_SAMPLE`, which is fixed in #16698. (No need to rebase or merge it into this PR.)
@sgugger Could you give a final approval when you have some time? Thanks.
(just saw you approved!)
Thanks a lot for this PR, @vumichien . Let's merge it once the suggestions are committed!
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
|
All good :-) I merge it now, thank you again ❤️ @vumichien |
|
@ydshieh You, too. Thanks a lot for helping with this PR 🙏 |
* Add doctest BERT
* make fixup
* fix typo
* change checkpoints
* make fixup
* define doctest output value, update doctest for mobilebert
* solve fix-copies
* update QA target start index and end index
* change checkpoint for docs and reuse defined variable
* Update src/transformers/models/bert/modeling_tf_bert.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* make fixup

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
What does this PR do?
Add doc tests for BERT, a part of issue #16292
Who can review?
@patrickvonplaten, @ydshieh
Documentation: @sgugger