
Conversation

@stas00 (Contributor) commented Dec 27, 2021

This PR adds a benchmarking tool for HF Trainer args - e.g. comparing --fp16 vs --bf16 performance. It can do that across multiple dimensions at once, and it automatically prints tables, including relative performance, that are ready for instant pasting into Issues, e.g.:

| Variation | Train samples per second | Diff % | Train loss |
|---|---|---|---|
| --per_device_train_batch_size 1 | 7.77 | 0 | 1.90 |
| --per_device_train_batch_size 2 | 15.51 | 100 | 2.01 |
| --per_device_train_batch_size 4 | 29.66 | 282 | 2.09 |
| --per_device_train_batch_size 8 | 61.16 | 687 | 2.16 |
| --per_device_train_batch_size 16 | 115.84 | 1392 | 2.25 |
| --per_device_train_batch_size 32 | 224.96 | 2797 | 2.38 |
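
For reference, the Diff % column appears to be computed relative to the first row as the baseline. A rough check against the table above (illustrative only; the small differences in the last rows come from using the table's rounded throughput values):

```python
# Rough check of the "Diff %" column: percentage change of throughput
# relative to the first (baseline) row. Values are the rounded numbers
# from the table above, so the last rows differ slightly from the
# script's own (unrounded) computation.
baseline = 7.77
for throughput in (7.77, 15.51, 29.66, 61.16, 115.84, 224.96):
    print(round(100 * (throughput - baseline) / baseline))
# -> 0, 100, 282, 687, 1391, 2795
```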

This is produced by:


CUDA_VISIBLE_DEVICES=0 python ./scripts/benchmark/trainer-benchmark.py \
--base-cmd \
' examples/pytorch/translation/run_translation.py --model_name_or_path t5-base \
--output_dir output_dir --do_train --label_smoothing 0.1 --logging_strategy no \
--save_strategy no --max_source_length 512 \
--max_target_length 512 --num_train_epochs 1 --overwrite_output_dir \
--source_lang en --target_lang ro --dataset_name wmt16 --dataset_config "ro-en" \
--source_prefix "translate English to Romanian: " --warmup_steps 50 \
--max_train_samples 5000 --dataloader_num_workers 2 --bf16' \
--target-metric-key train_samples_per_second --repeat-times 1 --variations \
'--per_device_train_batch_size 1|--per_device_train_batch_size 2|--per_device_train_batch_size 4|--per_device_train_batch_size 8|--per_device_train_batch_size 16|--per_device_train_batch_size 32' \
--report-metric-keys train_loss

To add more dimensions simply add another --variations arg, e.g.:

--variations '|--fp16|--bf16' '--tf32 0|--tf32 1' 

will lead to a Cartesian product of the args, with an outcome of:

| Variation | Train samples per second | Diff % | Train loss |
|---|---|---|---|
| --tf32 0 | 272.59 | 0 | 2.49 |
| --tf32 1 | 581.61 | 113 | 2.49 |
| --fp16 --tf32 0 | 643.07 | 136 | 2.49 |
| --fp16 --tf32 1 | 635.24 | 133 | 2.49 |
| --bf16 --tf32 0 | 616.23 | 126 | 2.50 |
| --bf16 --tf32 1 | 612.59 | 125 | 2.50 |

See the doc at the beginning of the script for details. It has lots of cool features, like automatically reporting the hardware/software setup, preformatting everything for copy-n-paste into Issues/docs, and also printing an easy-to-read console-only version of the report at the end for when you're debugging things.

More practical examples and reports from its use are here: #15026 and #14608.

It should be relatively easy to adapt this tool for use with accelerate or any other command line tool, as long as there is a defined way to get at the results; each tool would need its own sub-class if we decide to extend it. I think @siddk suggested he might look into extending it.

But first let's see whether you like it and whether I placed it in a good location. The intention is for it to be an internal tool, so I hope the use of map and suchlike code will be tolerated.

It's possible that we could use it for some basic regression testing. But it'd need to be further extended to support storing results and detecting regressions.

@LysandreJik, @patrickvonplaten, @sgugger, @patil-suraj

@stas00 stas00 mentioned this pull request Dec 29, 2021
@stas00 stas00 marked this pull request as ready for review January 15, 2022 04:31
@stas00 stas00 changed the title [WIP] [benchmark tool] trainer-benchmark.py [benchmark tool] trainer-benchmark.py Jan 15, 2022
@sgugger (Collaborator) left a comment

Thanks a lot for working on this. Even if it's for internal use, we can make a slight effort to make the code more readable so anyone can contribute to it easily :-)
It's going to be very useful to have it!

@huggingface huggingface deleted a comment from github-actions bot Feb 11, 2022
@stas00 stas00 added the Performance and WIP labels Feb 11, 2022
sleep(0)
return dict(
{k: random.uniform(0, 100) for k in metric_keys},
**{target_metric_key: random.choice([nan, 10.31, 100.2, 55.6666, 222.22222222])},
Contributor commented:

I don't really understand where those numbers are coming from. Are they random?

Contributor Author (@stas00) commented:

This is for debugging the output formatting - i.e. I'm still perfecting the look-n-feel of the result reports. If you want to tweak it, these return immediately, and with a few re-runs it'll give different outputs to validate that different numbers get formatted well.

Contributor Author (@stas00) commented:

I added a note explaining what it is for.

"--report-metric-keys",
default="",
type=str,
help="Report metric keys - other metric keys from output_dir/all_results.json to report, e.g., train_loss. Use a single argument e.g., 'train_loss train_samples",
Contributor commented:

Do you think it could be possible to also provide a link to where additional metric keys could be found?

Contributor Author (@stas00) commented:

Do we have that information? It'd be different in different scripts, no?

That's why I documented output_dir/all_results.json - once you run it, you will see all the available keys there for the given type of script.
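
For the example scripts, a quick way to see which metric keys are available after a run (illustrative; the exact keys depend on the script and the flags used):

```python
import json

# output_dir/all_results.json holds every metric the example script reported;
# its keys are what --report-metric-keys / --target-metric-key can select.
with open("output_dir/all_results.json") as f:
    results = json.load(f)

print(sorted(results))
# e.g. ['epoch', 'train_loss', 'train_runtime', 'train_samples',
#       'train_samples_per_second', ...]  (varies per script and flags)
```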

help="Base cmd",
)
parser.add_argument(
"--variations",
Contributor commented:

do the variations just correspond to the precision? Maybe call it precision_variations then?

Contributor Author @stas00 commented Feb 15, 2022:

No.

Variations are just that - variations - they can be anything. --variations "A|B|C" "D|E|F" will do a Cartesian product to produce A D, A E, ..., C F and run each variation. If it's just --variations "A|B|C" it will simply run A, then B, then C.

I was just using precisions as an example, but you could compare, say, --variations "--bs 4|--bs 6|--bs 8" (we don't have --bs, but you get the idea).

I'm open to suggestions if the name is not intuitive.
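
A minimal sketch of how such a Cartesian product of variation groups could be built (illustrative only, not necessarily the script's exact implementation):

```python
from itertools import product

# Each --variations argument is one dimension; "|" separates its choices.
variation_groups = ["A|B|C", "D|E|F"]
dims = [group.split("|") for group in variation_groups]

# The runs to benchmark are the Cartesian product of the dimensions.
runs = [" ".join(parts) for parts in product(*dims)]
print(runs)
# ['A D', 'A E', 'A F', 'B D', 'B E', 'B F', 'C D', 'C E', 'C F']
```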

Contributor commented:

I see! Sorry, I misread it then. But then variations can only correspond to args of TrainingArguments, no? So maybe we can say training_args_variations?

Contributor Author (@stas00) commented:

The idea is that this tool is going to be expanded to support other programs which don't necessarily have training args - after all, it doesn't care which args are varied. @siddk is planning to expand it to support accelerate and, down the road, others, hence the generic name.

So the mnemonic is "variations to compare", whatever they are.
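
A hypothetical shape for such an extension (class and method names here are purely illustrative; the PR itself does not define this API):

```python
import json


class BenchmarkTarget:
    """One kind of target program the benchmark harness knows how to drive."""

    def get_metrics(self, output_dir):
        # Default: HF Trainer example scripts write output_dir/all_results.json.
        with open(f"{output_dir}/all_results.json") as f:
            return json.load(f)


class AccelerateTarget(BenchmarkTarget):
    def get_metrics(self, output_dir):
        # An accelerate-based script would parse whatever results file it writes.
        raise NotImplementedError
```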

@patrickvonplaten (Contributor) left a comment

Very cool! Thanks for making this available to everybody. I didn't understand all the command line code in detail, but it looks like a very useful script to have.

HuggingFaceDocBuilderDev commented Apr 5, 2022

The documentation is not available anymore as the PR was closed or merged.

@stas00 stas00 merged commit 23fc4cb into huggingface:main Apr 5, 2022
@stas00 stas00 deleted the trainer-benchmark branch April 5, 2022 17:27
