Keras NLP Autogenerate presets table. #1219
Conversation
mattdangerw left a comment
Thank you very much! This will be a big win for the library. Left some initial comments.
scripts/autogen.py (outdated)
    n = Decimal(n)
    return n.to_integral() if n == n.to_integral() else round(n.normalize(), decimal)

def numerize(n, decimal=2):
I think we could make this a lot simpler. We only need "K", "M" and "B" as suffixes practically. What about this?
def print_param_count(count):
if count >= 1e9:
return f"{int(count / 1e9)}B"
if count >= 1e6:
return f"{int(count / 1e6)}M"
if count >= 1e3:
return f"{int(count / 1e3)}K"
return f"{count}"
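As a quick sanity check on the suggestion above (the function is self-contained, so it can be exercised directly), note that `int()` truncates rather than rounds, so 1,900,000 renders as "1M", not "2M":

```python
def print_param_count(count):
    # Truncating integer division keeps the table terse: 4_385_920 -> "4M".
    if count >= 1e9:
        return f"{int(count / 1e9)}B"
    if count >= 1e6:
        return f"{int(count / 1e6)}M"
    if count >= 1e3:
        return f"{int(count / 1e3)}K"
    return f"{count}"

print(print_param_count(4_385_920))      # 4M
print(print_param_count(1_900_000))      # 1M (truncated, not rounded)
print(print_param_count(2_500_000_000))  # 2B
print(print_param_count(355))            # 355
```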
I have simplified the param count as suggested here.
scripts/autogen.py (outdated)
table += "-------|--------|-------|------\n"

presets = [bert_presets, distil_bert_presets, roberta_presets, xlm_roberta_presets]
links = ["[BERT](bert)", "[DistilBert](distil_bert)", "[RoBERTa](roberta)", "[XLM-RoBERTa](xlm_roberta)"]
Ah sorry I asked to take this out on the other PR, but I see now why it would be useful to have. What if we add this to the metadata, but not in a "markdown form." So basically...
"metadata": {
"description": ...,
"params": ...,
"official_name": "XLM-RoBERTa",
"path": "xlm_roberta"
},
Then we could render that metadata here... f"[{official_name}]({path})". That way, all the "markdown stuff" stays in this repo. And all the "model metadata" stays in KerasNLP.
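A minimal sketch of that rendering split, assuming a preset dict shaped like the metadata above (the key names come from the comment; the values here are made up for illustration):

```python
# Hypothetical preset metadata in the shape sketched above.
preset = {
    "metadata": {
        "description": "Base size of XLM-RoBERTa.",
        "params": 277450752,  # illustrative value, not the real count
        "official_name": "XLM-RoBERTa",
        "path": "xlm_roberta",
    }
}

meta = preset["metadata"]
# All the "markdown stuff" stays on the keras-io side:
link = f"[{meta['official_name']}]({meta['path']})"
row = f"| {link} | {meta['params']} | {meta['description']} |"
print(row)
```

This way KerasNLP ships only plain strings, and keras-io decides how to render them.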
Since not all Backbone models have a path in their metadata, I have rendered only those that contain a path, in my PR #1222.
scripts/autogen.py (outdated)
if "{{backbone_presets_table}}" in template:
    # Import KerasNLP and do some stuff.

from keras_nlp.models.bert import bert_presets
We should avoid needing to keep a manually curated list like this. We can inspect our library to find the backbone and classifier presets we care about, but we want to avoid needing to update this when a model changes. This would need some adapting, but might help to get started...
# Print all backbone presets.
for name, symbol in keras_nlp.models.__dict__.items():
    if "Backbone" not in name:
        continue
    for preset in symbol.presets:
        print(preset)

# Print all classifier presets.
for name, symbol in keras_nlp.models.__dict__.items():
    if "Classifier" not in name:
        continue
    for preset in symbol.presets:
        # Check if not a backbone preset.
        if preset not in symbol.backbone_cls.presets:
            print(preset)
+1, it should be autogenerated from a single source of truth
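The introspection pattern above can be exercised without installing KerasNLP by standing in stub classes for keras_nlp.models; only the attribute names (presets, backbone_cls) are taken from the snippet above, the class and preset names here are made up:

```python
import types

# Stub classes standing in for keras_nlp.models symbols. In the real library,
# Backbone classes expose a `presets` dict and task classes additionally
# expose `backbone_cls`.
class BertBackbone:
    presets = {"bert_base_en": {}, "bert_large_en": {}}

class BertClassifier:
    backbone_cls = BertBackbone
    # Task presets include all backbone presets plus fine-tuned ones.
    presets = {"bert_base_en": {}, "bert_large_en": {}, "bert_tiny_en_sst2": {}}

models = types.SimpleNamespace(BertBackbone=BertBackbone, BertClassifier=BertClassifier)

backbone_presets, classifier_presets = [], []
for name, symbol in vars(models).items():
    if "Backbone" in name:
        backbone_presets.extend(symbol.presets)
    elif "Classifier" in name:
        # Keep only presets that are not already backbone presets.
        classifier_presets.extend(
            p for p in symbol.presets if p not in symbol.backbone_cls.presets
        )

print(backbone_presets)    # backbone presets only
print(classifier_presets)  # fine-tuned classifier presets only
```

Because the table is derived from whatever the module exposes, adding a new model to KerasNLP updates the docs without touching keras-io.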
requirements.txt (outdated)
tensorflow_datasets
keras-tuner
keras-cv
keras-cv==0.3.4
Please revert this line
scripts/autogen.py (outdated)
    "missing {{toc}} tag." % (template_path,)
)
template = template.replace("{{toc}}", toc)
if "keras_nlp/" in path_stack:
It's always going to be at a specific position, right? You can refer to it by index.
Yes, we can change

- if "keras_nlp/" in path_stack:
+ if "keras_nlp/models" in path_stack:

since it is required in that position only.
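To illustrate the difference between the two checks (with a hypothetical path_stack value; the real contents depend on autogen.py's traversal):

```python
# Hypothetical directory stack during template traversal.
path_stack = ["api/", "keras_nlp/", "models/"]

# Substring-style membership: matches wherever "keras_nlp/" appears in the stack.
anywhere = "keras_nlp/" in path_stack

# Fixed-position check: only matches when it sits at the expected index.
at_index = len(path_stack) > 1 and path_stack[1] == "keras_nlp/"

print(anywhere, at_index)
```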
scripts/autogen.py (outdated)
def render_keras_nlp_tags(template):
    from decimal import Decimal

    def round_num(n, decimal=2):
Could there be a simpler way?
It has been simplified in PR #1222; I have integrated the changes suggested by @mattdangerw.
scripts/autogen.py (outdated)

def render_keras_nlp_tags(template):
For the sake of clean factoring, I recommend moving this to its own separate file.
Solves #657 of keras-nlp.
@mattdangerw I have written code to generate the preset tables in keras-io, and it works fine on my local system using the Docker container. Once my PR #690 is merged, it can autogenerate the presets table on the main website.
To run locally, replace in requirements.txt and in DockerFile, then run.
Here is a screenshot of my application on localhost
Will be waiting for your review.