Keras NLP Autogenerate presets table. #1219
Conversation
mattdangerw left a comment
Thank you very much! This will be a big win for the library. Left some initial comments.
scripts/autogen.py (outdated)
    n = Decimal(n)
    return n.to_integral() if n == n.to_integral() else round(n.normalize(), decimal)

def numerize(n, decimal=2):
I think we could make this a lot simpler. We only need "K", "M" and "B" as suffixes practically. What about this?
def print_param_count(count):
if count >= 1e9:
return f"{int(count / 1e9)}B"
if count >= 1e6:
return f"{int(count / 1e6)}M"
if count >= 1e3:
return f"{int(count / 1e3)}K"
return f"{count}"
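As a quick sanity check on the suggestion above (the function is self-contained, so it can be exercised directly), note that `int()` truncates rather than rounds, so 1,900,000 renders as "1M", not "2M":

```python
def print_param_count(count):
    # Truncating integer division keeps the table terse: 4_385_920 -> "4M".
    if count >= 1e9:
        return f"{int(count / 1e9)}B"
    if count >= 1e6:
        return f"{int(count / 1e6)}M"
    if count >= 1e3:
        return f"{int(count / 1e3)}K"
    return f"{count}"

print(print_param_count(4_385_920))      # 4M
print(print_param_count(1_900_000))      # 1M (truncated, not rounded)
print(print_param_count(2_500_000_000))  # 2B
print(print_param_count(355))            # 355
```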
I have simplified the param count as suggested here.
scripts/autogen.py (outdated)
table += "-------|--------|-------|------\n"

presets = [bert_presets, distil_bert_presets, roberta_presets, xlm_roberta_presets]
links = ["[BERT](bert)", "[DistilBert](distil_bert)", "[RoBERTa](roberta)", "[XLM-RoBERTa](xlm_roberta)"]
Ah sorry I asked to take this out on the other PR, but I see now why it would be useful to have. What if we add this to the metadata, but not in a "markdown form." So basically...
"metadata": {
"description": ...,
"params": ...,
"official_name": "XLM-RoBERTa",
"path": "xlm_roberta"
},
Then we could render that metadata here... f"[{official_name}]({path})". That way, all the "markdown stuff" stays in this repo. And all the "model metadata" stays in KerasNLP.
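A minimal sketch of that rendering split, assuming a preset dict shaped like the metadata above (the key names come from the comment; the values here are made up for illustration):

```python
# Hypothetical preset metadata in the shape sketched above.
preset = {
    "metadata": {
        "description": "Base size of XLM-RoBERTa.",
        "params": 277450752,  # illustrative value, not the real count
        "official_name": "XLM-RoBERTa",
        "path": "xlm_roberta",
    }
}

meta = preset["metadata"]
# All the "markdown stuff" stays on the keras-io side:
link = f"[{meta['official_name']}]({meta['path']})"
row = f"| {link} | {meta['params']} | {meta['description']} |"
print(row)
```

This way KerasNLP ships only plain strings, and keras-io decides how to render them.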
Since not all Backbone models have a path in their metadata, I have rendered only those that contain a path, in my PR #1222.
scripts/autogen.py (outdated)
if "{{backbone_presets_table}}" in template:
    # Import KerasNLP and do some stuff.

from keras_nlp.models.bert import bert_presets
We should avoid needing to keep a manually curated list like this. We can inspect our library to find the backbone and classifier presets we care about, but we want to avoid needing to update this when a model changes. This would need some adapting, but might help to get started...
# Print all backbone presets.
for name, symbol in keras_nlp.models.__dict__.items():
    if "Backbone" not in name:
        continue
    for preset in symbol.presets:
        print(preset)

# Print all classifier presets.
for name, symbol in keras_nlp.models.__dict__.items():
    if "Classifier" not in name:
        continue
    for preset in symbol.presets:
        # Check if not a backbone preset.
        if preset not in symbol.backbone_cls.presets:
            print(preset)
+1, it should be autogenerated from a single source of truth
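The introspection pattern above can be exercised without installing KerasNLP by standing in stub classes for keras_nlp.models; only the attribute names (presets, backbone_cls) are taken from the snippet above, the class and preset names here are made up:

```python
import types

# Stub classes standing in for keras_nlp.models symbols. In the real library,
# Backbone classes expose a `presets` dict and task classes additionally
# expose `backbone_cls`.
class BertBackbone:
    presets = {"bert_base_en": {}, "bert_large_en": {}}

class BertClassifier:
    backbone_cls = BertBackbone
    # Task presets include all backbone presets plus fine-tuned ones.
    presets = {"bert_base_en": {}, "bert_large_en": {}, "bert_tiny_en_sst2": {}}

models = types.SimpleNamespace(BertBackbone=BertBackbone, BertClassifier=BertClassifier)

backbone_presets, classifier_presets = [], []
for name, symbol in vars(models).items():
    if "Backbone" in name:
        backbone_presets.extend(symbol.presets)
    elif "Classifier" in name:
        # Keep only presets that are not already backbone presets.
        classifier_presets.extend(
            p for p in symbol.presets if p not in symbol.backbone_cls.presets
        )

print(backbone_presets)    # backbone presets only
print(classifier_presets)  # fine-tuned classifier presets only
```

Because the table is derived from whatever the module exposes, adding a new model to KerasNLP updates the docs without touching keras-io.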
requirements.txt (outdated)
tensorflow_datasets
keras-tuner
keras-cv
keras-cv==0.3.4
Please revert this line
scripts/autogen.py (outdated)
    "missing {{toc}} tag." % (template_path,)
)
template = template.replace("{{toc}}", toc)
if "keras_nlp/" in path_stack:
It's always going to be at a specific position, right? You can refer to it by index.
Yes, we can change

- if "keras_nlp/" in path_stack:
+ if "keras_nlp/models" in path_stack:

since it is required in that position only.
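To illustrate the difference between the two checks (with a hypothetical path_stack value; the real contents depend on autogen.py's traversal):

```python
# Hypothetical directory stack during template traversal.
path_stack = ["api/", "keras_nlp/", "models/"]

# Substring-style membership: matches wherever "keras_nlp/" appears in the stack.
anywhere = "keras_nlp/" in path_stack

# Fixed-position check: only matches when it sits at the expected index.
at_index = len(path_stack) > 1 and path_stack[1] == "keras_nlp/"

print(anywhere, at_index)
```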
scripts/autogen.py (outdated)
def render_keras_nlp_tags(template):
    from decimal import Decimal

    def round_num(n, decimal=2):
Could there be a simpler way?
It has been simplified in PR #1222; I have integrated the changes suggested by @mattdangerw.
scripts/autogen.py (outdated)

def render_keras_nlp_tags(template):
For the sake of clean factoring, I recommend moving this to its own separate file.
Solves #657 of keras-nlp.
@mattdangerw I have written code to generate the preset tables in keras-io, and it works fine on my local system using the Docker container. Once my PR #690 is merged, it can autogenerate the presets table on the main website.
To run locally, replace in requirements.txt and in DockerFile, then run.
Here is a screenshot of my application on localhost
Will be waiting for your review.