
Conversation

@sgugger
Collaborator

@sgugger sgugger commented Oct 19, 2020

What does this PR do?

This PR cleans up the run_glue.py script to use the Datasets library. Along the way, it adds a few fixes in Trainer. The script supports all GLUE tasks as well as custom user tasks (passed along with a training and validation file in CSV or JSON format). It has been tested on the following setups:

  • single GPU
  • multi-GPU with DataParallel
  • multi-GPU with DistributedDataParallel
  • TPU

The README has been updated to reflect the changes. There is just one breaking change from before: data_dir is no longer an accepted argument, since Datasets takes care of downloading the data files.
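
As an illustration, loading a GLUE task through the Datasets library is as simple as the sketch below (the task name and the printed fields are only an example, not the exact code of the script):

from datasets import load_dataset

# The library downloads and caches the data itself, which is why the old
# data_dir argument is no longer needed.
datasets = load_dataset("glue", "mrpc")

print(datasets)              # a DatasetDict with train / validation / test splits
print(datasets["train"][0])  # one example, with its sentence columns and label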

if self.control.should_evaluate:
    metrics = self.evaluate()
    self._report_to_hp_search(trial, epoch, metrics)
    self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, metrics)
Collaborator Author


Moving this to the end of evaluate, otherwise that event is not fired when we call trainer.evaluate() independently.

Member

@LysandreJik LysandreJik left a comment


This is a fantastic change, it makes this example's code so much easier to read imo.

Comment on lines -34 to -36
glue_compute_metrics,
glue_output_modes,
glue_tasks_num_labels,
Member


This is very nice

Comment on lines 141 to 143
if is_main_process(training_args.local_rank):
    logging.set_verbosity_info()
logger.info(f"Training/evaluation parameters {training_args}")
Member


Have you confirmed this actually works? It seems to me that you're setting the default verbosity level of the root logger (so the loggers of transformers and every file contained in it), but the logger of the current file isn't a child of this logger (it's in examples/, not in src/transformers/), so it doesn't look like it'll be impacted by that change.

I would argue you would still need to change the current logger's default verbosity to info if you want to see the line logger.info(f"Training/evaluation parameters {training_args}") being printed.

Collaborator Author


It does work and I can see all the info being printed on my screen.

Collaborator Author


Ok, this should be working now.
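
For reference, a rough sketch of how an example script can wire up both loggers; this is an assumption about the shape of the fix, not necessarily the exact code that landed:

import logging

from transformers.trainer_utils import is_main_process
from transformers.utils import logging as hf_logging

logger = logging.getLogger(__name__)

def setup_logging(local_rank: int) -> None:
    # The example script lives outside the transformers package, so its own logger
    # is not touched by transformers' verbosity helpers; configure it explicitly.
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
        level=logging.INFO if is_main_process(local_rank) else logging.WARN,
    )
    # Raise the transformers library loggers to INFO on the main process only.
    if is_main_process(local_rank):
        hf_logging.set_verbosity_info()

setup_logging(local_rank=-1)  # -1 here means no distributed training
logger.info("Training/evaluation parameters would be logged from here")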

Member

@thomwolf thomwolf left a comment


Looks really cool!

A few user experience comments :)

"sentence2_key": sentence2_key,
"max_length": data_args.max_seq_length,
}
datasets = datasets.map(preprocess_function, batched=True, fn_kwargs=encode_kwargs)
Member


I'd prefer to have preprocess_function as a closure, written right here with the arguments, instead of defining it above with kwargs.

It spares the reader a scroll up and down to see what's happening but I understand this is a matter of personal taste.
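
For illustration, the closure variant could look roughly like this, assuming tokenizer, sentence1_key/sentence2_key and data_args are already defined in the enclosing scope of the script (a sketch, not exact code):

# Closure variant: preprocess_function captures the tokenizer and column names
# from the enclosing scope, so no fn_kwargs is needed in map().
def preprocess_function(examples):
    args = (
        (examples[sentence1_key],)
        if sentence2_key is None
        else (examples[sentence1_key], examples[sentence2_key])
    )
    return tokenizer(*args, max_length=data_args.max_seq_length, truncation=True)

datasets = datasets.map(preprocess_function, batched=True)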

Collaborator Author


We lose the preprocessing caching for some reason when doing that.

# Get the metric function
metric = load_metric("glue", data_args.task_name)

def compute_metrics(p: EvalPrediction):
Member


Add a comment for the reader

Collaborator Author


Done.
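
As an illustration of the function being commented here, a compute_metrics for a classification task could look roughly like this (a sketch that assumes the metric object loaded just above; not necessarily the exact code of the script):

import numpy as np
from transformers import EvalPrediction

def compute_metrics(p: EvalPrediction):
    # Trainer hands over raw predictions (logits); turn them into class ids
    # before passing them to the GLUE metric loaded above.
    preds = np.argmax(p.predictions, axis=1)
    return metric.compute(predictions=preds, references=p.label_ids)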

return glue_compute_metrics(task_name, preds, p.label_ids)

return compute_metrics_fn
datasets = load_dataset("glue", data_args.task_name)
Member


Add a detailed multi-line comment explaining how the user can also easily load their own dataset from JSON or CSV files (with mock examples), and linking to the relevant page of the datasets library.

Collaborator Author


Done. I kept the examples basic since there is a link to the datasets documentation.
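
For reference, loading a custom dataset from CSV or JSON files with the Datasets library looks roughly like this (file names are placeholders):

from datasets import load_dataset

# CSV files with a header row; each split gets its own file.
datasets = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

# JSON lines files work the same way:
# datasets = load_dataset("json", data_files={"train": "train.json", "validation": "dev.json"})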

test_dataset = datasets["test_matched" if data_args.task_name == "mnli" else "test"]

# Get the metric function
metric = load_metric("glue", data_args.task_name)
Member


Here we should also think about a user who would like to train on their own classification CSV dataset.

I think we should probably have a few "f1", "accuracy" metrics in datasets for such use cases. What do you think @LysandreJik @lhoestq @sgugger ?

Collaborator Author


I think all basic metrics provided by scikit-learn should be available in datasets, yes.
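
Assuming such generic metrics are (or become) available in datasets under those names, usage would look like this small sketch:

from datasets import load_metric

accuracy = load_metric("accuracy")
f1 = load_metric("f1")

preds = [0, 1, 1, 0]
labels = [0, 1, 0, 0]
print(accuracy.compute(predictions=preds, references=labels))
print(f1.compute(predictions=preds, references=labels))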


train_dataset = datasets["train"]
eval_dataset = datasets["validation_matched" if data_args.task_name == "mnli" else "validation"]
test_dataset = datasets["test_matched" if data_args.task_name == "mnli" else "test"]
Member


logger.info() a few dataset samples?

Collaborator Author


Done
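
For illustration, logging a few random training samples can be as simple as the sketch below (it assumes train_dataset and logger from the surrounding script):

import random

# Log a handful of random samples so the user can sanity-check the preprocessing.
for index in random.sample(range(len(train_dataset)), 3):
    logger.info(f"Sample {index} of the training set: {train_dataset[index]}.")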

Member

@julien-c julien-c left a comment


nice job

@julien-c
Member

Should we start thinking about automating the creation of the metadata block for the resulting model card?

here for instance we'd already have this info:

---
datasets:
- mrpc
metrics:
- f1
finetuned_from: bert-base-cased
---

@sgugger
Collaborator Author

sgugger commented Oct 21, 2020

We could think of something like that and add a blank model card to be completed by the user in the final checkpoint. We could also include the results of the last evaluation if there is one.
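
A hypothetical helper along those lines could simply write the metadata block and leave the body blank for the user to complete; everything below is an assumption for illustration, not an existing transformers feature:

import os

def write_model_card_stub(output_dir, dataset, metrics, finetuned_from):
    # Hypothetical helper: writes a model card stub whose metadata block looks
    # like the one above, leaving the rest for the user to fill in.
    lines = ["---", "datasets:", f"- {dataset}", "metrics:"]
    lines += [f"- {m}" for m in metrics]
    lines += [f"finetuned_from: {finetuned_from}", "---", ""]
    with open(os.path.join(output_dir, "README.md"), "w") as f:
        f.write("\n".join(lines))

write_model_card_stub(".", "mrpc", ["f1"], "bert-base-cased")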

@sgugger sgugger changed the title from "[WIP] New run glue script" to "New run glue script" on Oct 21, 2020
Member

@LysandreJik LysandreJik left a comment


I think this is great. LGTM!

Member

@thomwolf thomwolf left a comment


Looks great, I added a few proposals to make it a bit simpler to read (imo).

Comment on lines +177 to 178
# Set seed before initializing model.
set_seed(training_args.seed)
Member


Note that we also have a set_seed method in the datasets library.

@sgugger sgugger merged commit 2e5052d into master Oct 22, 2020
@sgugger sgugger deleted the new_run_glue branch October 22, 2020 15:42
@LysandreJik LysandreJik mentioned this pull request Oct 28, 2020