Write a light-weight benchmarking script to quickly evaluate our models #634

@chenmoneygithub

Description

The code should go into keras_nlp/benchmarks.

We can use the IMDB sentiment analysis task; guidance for it can be found here.

One challenging point is that we want this script to evaluate all of our Classifier models without writing custom code. Since every Classifier has a matching Preprocessor, and the two follow the unified naming format {model_name}Classifier/{model_name}Preprocessor (e.g., BertClassifier/BertPreprocessor), the code can be made reusable with a --model flag, as in the sketch below.
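
As a minimal sketch (not a final implementation), that lookup could be done with getattr, assuming the classes are exposed under keras_nlp.models. A simple capitalize() covers single-word names like "bert"; multi-word names such as DistilBert would need an explicit mapping:

    # Minimal sketch: resolve the Classifier class from a model name.
    # Assumes keras_nlp.models exposes {ModelName}Classifier per the
    # convention above; capitalize() handles "bert" -> "Bert", but a
    # name map would be needed for e.g. "distil_bert" -> "DistilBert".
    import keras_nlp

    def resolve_classifier_cls(model_name):
        return getattr(keras_nlp.models, f"{model_name.capitalize()}Classifier")

    # e.g. resolve_classifier_cls("bert") is keras_nlp.models.BertClassifier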

Here are the requirements in more detail (a rough script skeleton follows this list):

  • example file name: keras_nlp/benchmarks/sentiment_analysis.py
  • example running command:
    python keras_nlp/benchmarks/sentiment_analysis.py \
        --model="bert" \
        --preset="bert_small_en_uncased" \
        --learning_rate=5e-5 \
        --num_epochs=5 \
        --batch_size=32
    
    The --model flag specifies the model name, and --preset specifies the preset under test. --preset may be None, while --model is required. The other flags are common training flags.
  • output: print out a few metrics, including
    • validation accuracy/F1 for each epoch.
    • testing accuracy/F1 after training is done.
    • total elapsed time (in seconds).
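
Putting this together, here is a rough, hedged skeleton of sentiment_analysis.py. The flag names match the example command above; the TFDS imdb_reviews pipeline, the optimizer/loss choices, and the reuse of the test split as validation data are illustrative assumptions, and handling --preset=None plus per-epoch F1 (which would need a custom metric or callback) are left out for brevity:

    # Rough sketch of keras_nlp/benchmarks/sentiment_analysis.py.
    # Assumptions: classifiers are built via from_preset() and output
    # logits, and IMDB is loaded through TFDS; --preset=None handling
    # and per-epoch F1 are omitted.
    import time

    import keras_nlp
    import tensorflow_datasets as tfds
    from absl import app, flags
    from tensorflow import keras

    FLAGS = flags.FLAGS
    flags.DEFINE_string("model", None, "Model name, e.g. 'bert'. Required.")
    flags.DEFINE_string("preset", None, "Preset under test, e.g. 'bert_small_en_uncased'.")
    flags.DEFINE_float("learning_rate", 5e-5, "Optimizer learning rate.")
    flags.DEFINE_integer("num_epochs", 5, "Number of training epochs.")
    flags.DEFINE_integer("batch_size", 32, "Training batch size.")


    def main(_):
        # Resolve the Classifier class from the model name (see the
        # lookup sketch above).
        classifier_cls = getattr(
            keras_nlp.models, f"{FLAGS.model.capitalize()}Classifier"
        )

        # Illustrative IMDB pipeline; the guidance linked in the issue
        # may prescribe a different one. The test split doubles as
        # validation data here for brevity.
        imdb_train, imdb_test = tfds.load(
            "imdb_reviews",
            split=["train", "test"],
            as_supervised=True,
            batch_size=FLAGS.batch_size,
        )

        classifier = classifier_cls.from_preset(FLAGS.preset, num_classes=2)
        classifier.compile(
            optimizer=keras.optimizers.Adam(FLAGS.learning_rate),
            loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=["accuracy"],
        )

        start = time.time()
        # fit() prints validation accuracy each epoch; per-epoch F1
        # would need a custom metric or callback (not shown).
        classifier.fit(
            imdb_train,
            validation_data=imdb_test,
            epochs=FLAGS.num_epochs,
        )
        elapsed = time.time() - start

        _, test_accuracy = classifier.evaluate(imdb_test)
        print(f"Testing accuracy: {test_accuracy:.4f}")
        print(f"Total elapsed time: {elapsed:.2f}s")


    if __name__ == "__main__":
        flags.mark_flag_as_required("model")
        app.run(main)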

Labels: stat:contributions welcome