Conversation

@Cyber-Machine (Contributor) commented Feb 2, 2023

Solves issue #657 of keras-nlp.

@mattdangerw I have written code to generate the preset tables in keras-io, and it works on my local system using the Docker container. Once my PR #690 is merged, it can auto-generate the presets table on the main website.

To run locally, replace in requirements.txt

- keras-tuner
+ keras-tuner==1.1.3
- keras-cv==0.3.4
+ keras-cv==0.4.0
- keras-nlp

and in the Dockerfile replace

- RUN pip install git+https://github.com/keras-team/keras-nlp.git tensorflow --upgrade
+ RUN pip install git+https://github.com/Cyber-Machine/keras-nlp.git@preset_table tensorflow --upgrade

and run

docker build -t keras-io . && docker run --rm -p 8000:8000 keras-io

Here are screenshots of my application on localhost:

[two screenshots of the rendered preset tables]

Will be waiting for your review.

@mattdangerw self-requested a review February 3, 2023 01:42
@mattdangerw (Member) left a comment:

Thank you very much! This will be a big win for the library. Left some initial comments.

    n = Decimal(n)
    return n.to_integral() if n == n.to_integral() else round(n.normalize(), decimal)

def numerize(n, decimal=2):
@mattdangerw (Member) commented Feb 3, 2023

I think we could make this a lot simpler. Practically, we only need "K", "M", and "B" as suffixes. What about this?

def print_param_count(count):
    if count >= 1e9:
        return f"{int(count / 1e9)}B"
    if count >= 1e6:
        return f"{int(count / 1e6)}M"
    if count >= 1e3:
        return f"{int(count / 1e3)}K"
    return f"{count}"
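For instance, a few illustrative calls (the param counts here are hypothetical, outputs follow the function above):

print_param_count(109482240)  # -> "109M"
print_param_count(4385920)    # -> "4M"
print_param_count(512)        # -> "512"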

@Cyber-Machine (Contributor, Author) replied:

I have simplified the param count as suggested here.

table += "-------|--------|-------|------\n"

presets = [bert_presets, distil_bert_presets, roberta_presets, xlm_roberta_presets]
links = ["[BERT](bert)", "[DistilBert](distil_bert)", "[RoBERTa](roberta)", "[XLM-RoBERTa](xlm_roberta)"]
@mattdangerw (Member) commented:

Ah, sorry, I asked to take this out on the other PR, but I see now why it would be useful to have. What if we add this to the metadata, but not in "markdown form"? So basically...

"metadata": {
    "description": ...,
    "params": ...,
    "official_name": "XLM-RoBERTa",
    "path": "xlm_roberta"
},

Then we could render that metadata here... f"[{official_name}]({path})". That way, all the "markdown stuff" stays in this repo. And all the "model metadata" stays in KerasNLP.
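A minimal sketch of that rendering on the keras-io side (the helper name is hypothetical, and the metadata shape is the proposal above, not the current KerasNLP API):

def render_preset_link(metadata):
    # Build the markdown link from plain metadata fields,
    # so all markdown formatting stays in this repo.
    return f"[{metadata['official_name']}]({metadata['path']})"

render_preset_link({"official_name": "XLM-RoBERTa", "path": "xlm_roberta"})
# -> "[XLM-RoBERTa](xlm_roberta)"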

@Cyber-Machine (Contributor, Author) commented Feb 4, 2023

Since not every Backbone model has a path in its metadata, I have rendered only those that contain a path in my PR #1222.
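A sketch of the guard this implies (names assumed from the snippets in this thread; the actual code is in PR #1222):

rows = []
for name, preset in symbol.presets.items():
    metadata = preset["metadata"]
    # Only render presets whose metadata includes a model page path.
    if "path" in metadata:
        link = f"[{metadata['official_name']}]({metadata['path']})"
        rows.append(f"{name} | {link} | {metadata['params']}")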

if "{{backbone_presets_table}}" in template:
# Import KerasNLP and do some stuff.

from keras_nlp.models.bert import bert_presets
@mattdangerw (Member) commented Feb 3, 2023

We should avoid keeping a manually curated list like this. We can inspect our library to find the backbone and classifier presets we care about, so that we don't need to update this when a model changes. This would need some adapting, but it might help to get started...

import keras_nlp

# Print all backbone presets.
for name, symbol in keras_nlp.models.__dict__.items():
    if "Backbone" not in name:
        continue
    for preset in symbol.presets:
        print(preset)

# Print all classifier presets.
for name, symbol in keras_nlp.models.__dict__.items():
    if "Classifier" not in name:
        continue
    for preset in symbol.presets:
        # Skip presets that are also backbone presets.
        if preset not in symbol.backbone_cls.presets:
            print(preset)

A contributor commented:

+1, it should be autogenerated from a single source of truth

requirements.txt (outdated)

tensorflow_datasets
keras-tuner
- keras-cv
+ keras-cv==0.3.4
A contributor commented:

Please revert this line

"missing {{toc}} tag." % (template_path,)
)
template = template.replace("{{toc}}", toc)
if "keras_nlp/" in path_stack:
A contributor commented:

It's always going to be at a specific position, right? You can refer to it by index.

@Cyber-Machine (Contributor, Author) replied:

Yes, we can change

-  if "keras_nlp/" in path_stack:
+ if "keras_nlp/models" in path_stack:

since it is only needed at that position.

def render_keras_nlp_tags(template):
    from decimal import Decimal

    def round_num(n, decimal=2):
A contributor commented:

Could there be a simpler way?

@Cyber-Machine (Contributor, Author) replied:

It is simplified in PR #1222; I have integrated the changes suggested by @mattdangerw.




def render_keras_nlp_tags(template):
A contributor commented:

For the sake of clean factoring, I recommend moving this to its own separate file.

if "{{backbone_presets_table}}" in template:
# Import KerasNLP and do some stuff.

from keras_nlp.models.bert import bert_presets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, it should be autogenerated from a single source of truth
