
Conversation

@kanpuriyanawab
Collaborator

@kanpuriyanawab kanpuriyanawab commented Feb 5, 2023

Closes #718
Fixes #733

Docstrings are yet to be added; I will add them once the open questions about the implementation are resolved.

@kanpuriyanawab kanpuriyanawab marked this pull request as draft February 5, 2023 17:05
@kanpuriyanawab
Collaborator Author

kanpuriyanawab commented Feb 5, 2023

@kanpuriyanawab kanpuriyanawab changed the title Adding an AlbertMaskedLM task model Adding an AlbertMaskedLM task model and preprocessor Feb 5, 2023
@kanpuriyanawab kanpuriyanawab marked this pull request as ready for review February 7, 2023 18:09
@kanpuriyanawab
Collaborator Author

Ready for review

Member

@mattdangerw mattdangerw left a comment


Thank you! This looks great!

Please go ahead and add docs. It also looks like there are some test failures (this model uses different tokenization and padding than RoBERTa, so the preprocessing tests will need some updates).

@kanpuriyanawab
Collaborator Author

kanpuriyanawab commented Feb 8, 2023

CI is green, ready for review @mattdangerw @abheesht17 @jbischof

@kanpuriyanawab kanpuriyanawab requested review from mattdangerw and removed request for abheesht17 February 8, 2023 19:07
Collaborator

@abheesht17 abheesht17 left a comment


Quick review of the doc-strings.

Member

@mattdangerw mattdangerw left a comment


Thanks! Just a few quick comments

@kanpuriyanawab
Collaborator Author

@mattdangerw CI is green in terms of tests.
Not sure why the cloud build fails in the accelerator test.

@mattdangerw
Member

@shivance thanks! Will take another pass soon. It looks like the accelerator test failure is just a timeout, so it's probably unrelated to the code changes here.

@kanpuriyanawab
Collaborator Author

kanpuriyanawab commented Feb 10, 2023

Good to go @mattdangerw

Member

@mattdangerw mattdangerw left a comment


Just a couple minor comments. I can fix these up as I merge.

sep_token = "[SEP]"
pad_token = "<pad>"
mask_token = "<mask>"
for token in [cls_token, sep_token, pad_token]:
Member


Oh sorry, I missed one thing! We should update our tokenizer to check for the mask token here, which will then mean we have to update a lot of the tests for albert_tokenizer and albert_preprocessor so they actually include the mask token.
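For illustration, the kind of vocabulary check being asked for could look like the sketch below. The `check_special_tokens` helper and the toy vocabulary are hypothetical, not the actual KerasNLP tokenizer implementation:

```python
# Hypothetical sketch of a special-token vocabulary check; the names below
# are illustrative, not KerasNLP code.
def check_special_tokens(vocabulary, special_tokens):
    """Raise if any required special token is missing from the vocabulary."""
    missing = [t for t in special_tokens if t not in vocabulary]
    if missing:
        raise ValueError(f"Vocabulary is missing special tokens: {missing}")

# A toy vocabulary that includes the mask token, so the check passes:
vocab = {"[CLS]": 0, "[SEP]": 1, "<pad>": 2, "<mask>": 3, "air": 4}
check_special_tokens(vocab, ["[CLS]", "[SEP]", "<pad>", "<mask>"])
```

With such a check in place, any test fixture vocabulary that omits `<mask>` would fail at construction time, which is why the albert_tokenizer and albert_preprocessor tests would need updating.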

Member


This PR shows how to add the mask in when training a sentencepiece model. #732

)

def test_preprocess_strings(self):
input_data = " airplane at airport"
Member


One other follow-up as you are adding the check for the mask token: we should also make sure the data used to train the sentencepiece model above matches the data you are passing here.

When all is working, the token id output should look something like [bos, mask, mask, mask, eos, pad, pad, pad] (replacing those symbols with the proper indices from the vocab).
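As a concrete sketch of that expected output shape (the id values for BOS/EOS/PAD/MASK and the `pad_to_length` helper below are made-up illustrations, not KerasNLP code):

```python
# Illustrative only: hypothetical vocabulary indices for the special tokens.
BOS, EOS, PAD, MASK = 1, 2, 0, 4

def pad_to_length(ids, length, pad_id=PAD):
    """Right-pad a token id list to a fixed sequence length."""
    return ids + [pad_id] * (length - len(ids))

# Three input words, all selected for masking, in a sequence of length 8:
masked_ids = pad_to_length([BOS, MASK, MASK, MASK, EOS], 8)
print(masked_ids)  # [1, 4, 4, 4, 2, 0, 0, 0]
```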

@kanpuriyanawab
Collaborator Author

Still WIP

@mattdangerw
Member

Still WIP

No rush! Ping when this is ready for 👀 again!

@kanpuriyanawab
Collaborator Author

The PR is ready for review; waiting on CI, although the tests pass locally.

@kanpuriyanawab
Collaborator Author

kanpuriyanawab commented Feb 15, 2023

CI is green @mattdangerw @jbischof Ready for review !

Member

@mattdangerw mattdangerw left a comment


This looks great! Thank you!

Going to try out a quick training run with it, and if all looks good, will merge this in.

# See the License for the specific language governing permissions and
# limitations under the License.
-"""Tests for BERT classification model."""
+"""Tests for ALBERT classification model."""
Member


Oops good catch :)

@kanpuriyanawab
Collaborator Author

This looks great! Thank you!

Going to try out a quick training run with it, and if all looks good, will merge this in.

@mattdangerw Would you mind sharing the training script once done?

@mattdangerw
Member

@shivance sure! https://colab.research.google.com/gist/mattdangerw/c73a58e20132fd1117161a0f00b23b4b/albert-mlm.ipynb

Things look in the right ballpark to me: 43% guess-the-word accuracy after a single epoch on IMDb. So I'm going to pull this in. Thanks again!
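For readers unfamiliar with the metric, "guess the word" accuracy here is just the fraction of masked positions where the model's predicted token id matches the true id. A toy illustration (the id values below are made up, not from the training run):

```python
# Toy illustration of masked-token prediction accuracy: the fraction of
# masked positions where the predicted id equals the true id.
true_ids = [12, 7, 99, 7, 31]
predicted_ids = [12, 8, 99, 7, 5]  # 3 of 5 positions correct

correct = sum(t == p for t, p in zip(true_ids, predicted_ids))
accuracy = correct / len(true_ids)
print(accuracy)  # 0.6
```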

@mattdangerw mattdangerw merged commit 30cb703 into keras-team:master Feb 17, 2023
@kanpuriyanawab
Collaborator Author

kanpuriyanawab commented Feb 24, 2023

@mattdangerw @jbischof @chenmoneygithub I went through the GSoC ideas list
(screenshot: GSoC ideas list)

I found this one especially intriguing. I think this will add a lot of value to KerasNLP and the Keras ecosystem in general. I referred to the Hugging Face tasks; their pipeline syntax is really useful.
Which of these tasks is KerasNLP planning to support?

(screenshot: Hugging Face tasks)

Apart from supporting these tasks, I think accompanying tutorials would add a lot of value to the GSoC project. I'm already working on #754, and I also opened #1253 in keras-io; these are tutorials for Token Classification and Sentence Similarity respectively.

Which of these tasks are on the priority list, and how many of them would you suggest I include in my proposal (keeping the GSoC timeline in mind)?

@kanpuriyanawab kanpuriyanawab deleted the alberta_lm branch March 10, 2023 06:49
@kanpuriyanawab kanpuriyanawab changed the title Adding an AlbertMaskedLM task model and preprocessor Adding an AlbertMaskedLM task + Fix Projection layer dimension in MaskedLMHead Mar 16, 2023
Successfully merging this pull request may close these issues:

- Mistake in projection layer dimension of MaskedLMHead
- Add an AlbertMaskedLM task model
