Add a DistilBertMaskedLM task model #724
Conversation
mattdangerw left a comment
Thank you! Left some initial comments, also looks like there are some test failures that need investigation.
> This preprocessing layer will prepare inputs for a masked language modeling
> task. It is primarily intended for use with the
> `keras_nlp.models.DistilBertMaskedLM` task model. Preprocessing will occur in
fix line length
> masked.
> mask_selection_length: The maximum number of masked tokens supported
>     by the layer.
> mask_token_rate: float, defaults to 0.8. `mask_token_rate` must be
This is preexisting, I know, but let's remove these defaults for this arg and below (we don't do this for other args in this list).
> multiple steps.
> - Tokenize any number of input segments using the `tokenizer`.
> - Pack the inputs together with the appropriate `"<s>"`, `"</s>"` and
These are not the special tokens used by DistilBert, this will need updating.
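For reference, here is a minimal sketch of the packing behavior with DistilBERT's actual BERT-style special tokens (`"[CLS]"`, `"[SEP]"`, `"[PAD]"`, `"[MASK]"`) instead of the RoBERTa-style `"<s>"`/`"</s>"` tokens. The toy vocabulary, token ids, and sequence length are illustrative assumptions, not the final docstring:

```python
# Hedged sketch: DistilBERT packs inputs with "[CLS]"/"[SEP]"/"[PAD]",
# not "<s>"/"</s>"/"<pad>". Toy vocabulary and ids are illustrative.
import keras_nlp
import tensorflow as tf

vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "airplane", "at", "airport"]
tokenizer = keras_nlp.models.DistilBertTokenizer(vocabulary=vocab)
packer = keras_nlp.layers.MultiSegmentPacker(
    start_value=vocab.index("[CLS]"),
    end_value=vocab.index("[SEP]"),
    pad_value=vocab.index("[PAD]"),
    sequence_length=8,
)
token_ids, segment_ids = packer(tokenizer(tf.constant(["airplane at airport"])))
# token_ids -> [[2, 5, 6, 7, 3, 0, 0, 0]], i.e. [CLS] airplane at airport [SEP] + pads
```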
> # Alternatively, you can create a preprocessor from your own vocabulary.
> # The usage is exactly the same as above.
> vocab = {"<s>": 0, "<pad>": 1, "</s>": 2, "<mask>": 3}
this is not the right format for the vocabulary for this model, will need updating.
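As a hedged sketch of what the corrected example might look like: DistilBERT's tokenizer is WordPiece-based, so the vocabulary is a list of tokens (ids implied by position) or a path to a vocabulary file, rather than a token-to-id dict, and the special tokens are the BERT-style ones. The toy vocabulary below is illustrative:

```python
# Illustrative only: a WordPiece-style vocabulary for DistilBERT, passed as a
# list of tokens rather than a dict, with BERT-style special tokens included.
import keras_nlp

vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
vocab += ["the", "quick", "brown", "fox", "jumped", "."]
tokenizer = keras_nlp.models.DistilBertTokenizer(vocabulary=vocab)
preprocessor = keras_nlp.models.DistilBertMaskedLMPreprocessor(tokenizer=tokenizer)
```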
> Disclaimer: Pre-trained models are provided on an "as is" basis, without
> warranties or conditions of any kind. The underlying model is provided by a
> third party and subject to a separate license, available
> [here](https://github.com/facebookresearch/fairseq).
Update this to the right disclaimer for the model; DistilBERT comes from Hugging Face.
Thanks @mattdangerw for the review, I'll make the changes soon.
@ADITYADAS1999 thanks! Looks like there are still some test failures to fix in the preprocessor layer you are adding, judging from your screenshot! You can also see the failures in the automatic testing running on this PR.
Yes, I'm working on this...
This reverts commit 555ba33.
Hey @mattdangerw, all the tests have been checked; can you review for any necessary changes?
@mattdangerw I just left some comments as peer review. I hope peer review is allowed at KerasNLP! I would request you to take a look as well before @ADITYADAS1999 addresses them.
Hey @shivance, is there an issue in the PR? Can you explain a little bit?
@shivance you are welcome to leave comments if you spot things, but I don't actually see anything from you here. Did you mean to post something to this PR?
| ("keras_format", "keras_v3", "model.keras"), | ||
| ) | ||
| def test_saved_model(self, save_format, filename): | ||
| input_data = tf.constant([" airplane at airport"]) |
While the tests still pass, ideally we should use sentences with words from the vocabulary.
Here, since the words are not in the vocabulary, upon tokenization they should all become the [UNK] token.
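A hedged sketch of that suggestion (the toy vocabulary and constructor arguments are assumptions based on the surrounding diff): build the test vocabulary so the test sentence tokenizes to real ids instead of all `[UNK]`:

```python
# Sketch only: include the test words in the vocabulary so tokenization
# does not collapse everything to "[UNK]".
import keras_nlp
import tensorflow as tf

vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
vocab += ["airplane", "at", "airport"]
tokenizer = keras_nlp.models.DistilBertTokenizer(vocabulary=vocab)
preprocessor = keras_nlp.models.DistilBertMaskedLMPreprocessor(
    tokenizer=tokenizer,
    sequence_length=8,
    mask_selection_length=2,
)
input_data = tf.constant(["airplane at airport"])
x, y, sample_weight = preprocessor(input_data)
```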
> labels = [[3, 5]] * 2
> # Randomly initialize a DistilBERT encoder
> backbone = keras_nlp.models.DistilBertBackbone(
Just a reminder, make sure that the examples actually run correctly. Thanks!
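For context, here is a rough sketch of the kind of docstring example this comment refers to: a randomly initialized DistilBERT backbone wrapped in the MaskedLM task and trained directly on pre-tokenized features. The shapes, hyperparameters, and explicit compile settings below are assumptions for illustration, not the merged docstring:

```python
# Sketch only: randomly initialized DistilBERT backbone + MaskedLM task trained
# on pre-tokenized inputs (preprocessor=None). All values are illustrative.
import keras_nlp
import tensorflow as tf
from tensorflow import keras

features = {
    "token_ids": tf.constant([[1, 2, 3, 4, 5, 6]] * 2),
    "padding_mask": tf.constant([[1, 1, 1, 1, 1, 1]] * 2),
    "mask_positions": tf.constant([[2, 4]] * 2),
}
labels = tf.constant([[3, 5]] * 2)

backbone = keras_nlp.models.DistilBertBackbone(
    vocabulary_size=100,
    num_layers=2,
    num_heads=2,
    hidden_dim=32,
    intermediate_dim=64,
    max_sequence_length=12,
)
masked_lm = keras_nlp.models.DistilBertMaskedLM(
    backbone=backbone,
    preprocessor=None,
)
# Explicit compile, assuming the masked LM head outputs logits.
masked_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
masked_lm.fit(x=features, y=labels, batch_size=2)
```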
@ADITYADAS1999 @mattdangerw Could you check now if my comments are visible?
Yes @shivance, it's visible for me. @mattdangerw, is it visible for you?
Yes, all visible now! With GitHub I often "Start a review" but forget to finish; if you don't finish the review your comments will not post.
Yes, yes @mattdangerw, that's what happened 😅. Reviewed a PR for the first time. Also, could you review my comments and see if they are valid?
mattdangerw left a comment
Thank you! This looks good!
There is one comment re the saved model test from @shivance that is worth doing.
I'm running a quick training job to make sure things look ok, can merge when that is done.
Thanks, pulling this in now! Here's the trial run -> https://gist.github.com/mattdangerw/b16c257973762a0b4ab9a34f6a932cc1
Thank you @mattdangerw 🎉
Hey @mattdangerw @jbischof, I want to participate in GSoC this year. Can you guide or advise me on project ideas and proposal docs for KerasNLP? 👨‍💻

This PR adds:
- `DistilBertMaskedLMPreprocessor` preprocessor layer and tests.
- `DistilBertMaskedLM` task model and tests.