Write a light-weight benchmarking script to quickly evaluate our models #634

@chenmoneygithub

Description

The code should go into keras_nlp/benchmarks.

We can use the IMDB sentiment analysis task; guidance for it can be found here.

One challenging point is that we want this script to evaluate all of our Classifier models without writing custom code. Since every Classifier has a matching Preprocessor, and the two follow the unified naming format {model_name}Classifier/{model_name}Preprocessor (e.g., BertClassifier/BertPreprocessor), the code can be made reusable with a --model flag, as in the sketch below.
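
As a minimal sketch (not a final implementation), that lookup could be done with getattr, assuming the classes are exposed under keras_nlp.models. A simple capitalize() covers single-word names like "bert"; multi-word names such as DistilBert would need an explicit mapping:

    # Minimal sketch: resolve the Classifier class from a model name.
    # Assumes keras_nlp.models exposes {ModelName}Classifier per the
    # convention above; capitalize() handles "bert" -> "Bert", but a
    # name map would be needed for e.g. "distil_bert" -> "DistilBert".
    import keras_nlp

    def resolve_classifier_cls(model_name):
        return getattr(keras_nlp.models, f"{model_name.capitalize()}Classifier")

    # e.g. resolve_classifier_cls("bert") is keras_nlp.models.BertClassifier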

Here are the requirements in more detail (a rough script skeleton follows this list):

  • example file name: keras_nlp/benchmarks/sentiment_analysis.py
  • example running command:
    python keras_nlp/benchmarks/sentiment_analysis.py \
        --model="bert" \
        --preset="bert_small_en_uncased" \
        --learning_rate=5e-5 \
        --num_epochs=5 \
        --batch_size=32
    
    The --model flag specifies the model name, and --preset specifies the preset under test. --preset may be None, while --model is required. The other flags are common training flags.
  • output: print out a few metrics, including
    • validation accuracy/F1 for each epoch.
    • testing accuracy/F1 after training is done.
    • total elapsed time (in seconds).
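
Putting this together, here is a rough, hedged skeleton of sentiment_analysis.py. The flag names match the example command above; the TFDS imdb_reviews pipeline, the optimizer/loss choices, and the reuse of the test split as validation data are illustrative assumptions, and handling --preset=None plus per-epoch F1 (which would need a custom metric or callback) are left out for brevity:

    # Rough sketch of keras_nlp/benchmarks/sentiment_analysis.py.
    # Assumptions: classifiers are built via from_preset() and output
    # logits, and IMDB is loaded through TFDS; --preset=None handling
    # and per-epoch F1 are omitted.
    import time

    import keras_nlp
    import tensorflow_datasets as tfds
    from absl import app, flags
    from tensorflow import keras

    FLAGS = flags.FLAGS
    flags.DEFINE_string("model", None, "Model name, e.g. 'bert'. Required.")
    flags.DEFINE_string("preset", None, "Preset under test, e.g. 'bert_small_en_uncased'.")
    flags.DEFINE_float("learning_rate", 5e-5, "Optimizer learning rate.")
    flags.DEFINE_integer("num_epochs", 5, "Number of training epochs.")
    flags.DEFINE_integer("batch_size", 32, "Training batch size.")


    def main(_):
        # Resolve the Classifier class from the model name (see the
        # lookup sketch above).
        classifier_cls = getattr(
            keras_nlp.models, f"{FLAGS.model.capitalize()}Classifier"
        )

        # Illustrative IMDB pipeline; the guidance linked in the issue
        # may prescribe a different one. The test split doubles as
        # validation data here for brevity.
        imdb_train, imdb_test = tfds.load(
            "imdb_reviews",
            split=["train", "test"],
            as_supervised=True,
            batch_size=FLAGS.batch_size,
        )

        classifier = classifier_cls.from_preset(FLAGS.preset, num_classes=2)
        classifier.compile(
            optimizer=keras.optimizers.Adam(FLAGS.learning_rate),
            loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=["accuracy"],
        )

        start = time.time()
        # fit() prints validation accuracy each epoch; per-epoch F1
        # would need a custom metric or callback (not shown).
        classifier.fit(
            imdb_train,
            validation_data=imdb_test,
            epochs=FLAGS.num_epochs,
        )
        elapsed = time.time() - start

        _, test_accuracy = classifier.evaluate(imdb_test)
        print(f"Testing accuracy: {test_accuracy:.4f}")
        print(f"Total elapsed time: {elapsed:.2f}s")


    if __name__ == "__main__":
        flags.mark_flag_as_required("model")
        app.run(main)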

Labels: stat:contributions welcome