
Conversation

@stas00 (Contributor) commented Dec 27, 2021

This PR adds a benchmarking tool for HF Trainer args - e.g. comparing --fp16 vs --bf16 performance. It can do that across multiple dimensions at once, and it automatically prints tables, including relative performance, that are ready for instant pasting into Issues, e.g.:

| Variation | Train samples per second | Diff % | Train loss |
|---|---|---|---|
| --per_device_train_batch_size 1 | 7.77 | 0 | 1.90 |
| --per_device_train_batch_size 2 | 15.51 | 100 | 2.01 |
| --per_device_train_batch_size 4 | 29.66 | 282 | 2.09 |
| --per_device_train_batch_size 8 | 61.16 | 687 | 2.16 |
| --per_device_train_batch_size 16 | 115.84 | 1392 | 2.25 |
| --per_device_train_batch_size 32 | 224.96 | 2797 | 2.38 |
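
For reference, the Diff % column appears to be computed relative to the first row as the baseline. A rough check against the table above (illustrative only; the small differences in the last rows come from using the table's rounded throughput values):

```python
# Rough check of the "Diff %" column: percentage change of throughput
# relative to the first (baseline) row. Values are the rounded numbers
# from the table above, so the last rows differ slightly from the
# script's own (unrounded) computation.
baseline = 7.77
for throughput in (7.77, 15.51, 29.66, 61.16, 115.84, 224.96):
    print(round(100 * (throughput - baseline) / baseline))
# -> 0, 100, 282, 687, 1391, 2795
```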

This is produced by:


CUDA_VISIBLE_DEVICES=0 python ./scripts/benchmark/trainer-benchmark.py \
--base-cmd \
' examples/pytorch/translation/run_translation.py --model_name_or_path t5-base \
--output_dir output_dir --do_train --label_smoothing 0.1 --logging_strategy no \
--save_strategy no --max_source_length 512 \
--max_target_length 512 --num_train_epochs 1 --overwrite_output_dir \
--source_lang en --target_lang ro --dataset_name wmt16 --dataset_config "ro-en" \
--source_prefix "translate English to Romanian: " --warmup_steps 50 \
--max_train_samples 5000 --dataloader_num_workers 2 --bf16' \
--target-metric-key train_samples_per_second --repeat-times 1 --variations \
'--per_device_train_batch_size 1|--per_device_train_batch_size 2|--per_device_train_batch_size 4|--per_device_train_batch_size 8|--per_device_train_batch_size 16|--per_device_train_batch_size 32' \
--report-metric-keys train_loss

To add more dimensions simply add another --variations arg, e.g.:

--variations '|--fp16|--bf16' '--tf32 0|--tf32 1' 

will lead to a Cartesian product of the args, with an outcome of:

| Variation | Train samples per second | Diff % | Train loss |
|---|---|---|---|
| --tf32 0 | 272.59 | 0 | 2.49 |
| --tf32 1 | 581.61 | 113 | 2.49 |
| --fp16 --tf32 0 | 643.07 | 136 | 2.49 |
| --fp16 --tf32 1 | 635.24 | 133 | 2.49 |
| --bf16 --tf32 0 | 616.23 | 126 | 2.50 |
| --bf16 --tf32 1 | 612.59 | 125 | 2.50 |

See the doc at the beginning of the script for details. It has lots of cool features, like automatically reporting the hardware/software setup, preformatting everything for copy-n-paste into Issues/docs, and also printing an easy-to-read console-only version of the report at the end for when you're debugging things.

More practical examples and reports from its use are here: #15026 and #14608.

It should be relatively easy to adapt this tool for use with accelerate or any other command line tool, as long as there is a defined way to get at the results; each tool would need its own sub-class if we decide to extend it. I think @siddk suggested he might look into extending it.

But first let's see whether you like it and whether I placed it in a good location. The intention is for it to be an internal tool, so I hope the use of map and suchlike code will be tolerated.

It's possible that we could use it for some basic regression testing. But it'd need to be further extended to support storing results and detecting regressions.

@LysandreJik, @patrickvonplaten, @sgugger, @patil-suraj

@stas00 stas00 mentioned this pull request Dec 29, 2021
@stas00 stas00 marked this pull request as ready for review January 15, 2022 04:31
@stas00 stas00 changed the title [WIP] [benchmark tool] trainer-benchmark.py [benchmark tool] trainer-benchmark.py Jan 15, 2022
@sgugger (Collaborator) left a comment

Thanks a lot for working on this. Even if it's for internal use, we can make a slight effort to make the code more readable so anyone can contribute to it easily :-)
It's going to be very useful to have it!

@huggingface huggingface deleted a comment from github-actions bot Feb 11, 2022
@stas00 stas00 added the Performance and WIP labels Feb 11, 2022
sleep(0)
return dict(
{k: random.uniform(0, 100) for k in metric_keys},
**{target_metric_key: random.choice([nan, 10.31, 100.2, 55.6666, 222.22222222])},
Contributor commented:

I don't really understand where those numbers are coming from. Are they random?

Contributor Author (@stas00) commented:

This is for debugging the output formatting - i.e. I'm still perfecting the look-n-feel of the result reports. If you want to tweak it, these return immediately, and with a few re-runs it'll give different outputs to validate that different numbers get formatted well.

Contributor Author (@stas00) commented:

I added a note explaining what it is for.

"--report-metric-keys",
default="",
type=str,
help="Report metric keys - other metric keys from output_dir/all_results.json to report, e.g., train_loss. Use a single argument e.g., 'train_loss train_samples",
Contributor commented:

Do you think it could be possible to also provide a link to where additional metric keys could be found?

Contributor Author (@stas00) commented:

Do we have that information? It'd be different in different scripts, no?

That's why I documented output_dir/all_results.json - once you run it, you will see all the available keys there for the given type of script.
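
For the example scripts, a quick way to see which metric keys are available after a run (illustrative; the exact keys depend on the script and the flags used):

```python
import json

# output_dir/all_results.json holds every metric the example script reported;
# its keys are what --report-metric-keys / --target-metric-key can select.
with open("output_dir/all_results.json") as f:
    results = json.load(f)

print(sorted(results))
# e.g. ['epoch', 'train_loss', 'train_runtime', 'train_samples',
#       'train_samples_per_second', ...]  (varies per script and flags)
```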

help="Base cmd",
)
parser.add_argument(
"--variations",
Contributor commented:

do the variations just correspond to the precision? Maybe call it precision_variations then?

Contributor Author @stas00 commented Feb 15, 2022:

No.

Variations are just that - variations - they can be anything. --variations "A|B|C" "D|E|F" will do a Cartesian product to produce A D, A E, ..., C F and run each variation. If it's just --variations "A|B|C" it will simply run A, then B, then C.

I was just using precisions as an example, but you could compare, say, --variations "--bs 4|--bs 6|--bs 8" (we don't have --bs, but you get the idea).

I'm open to suggestions if the name is not intuitive.
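
A minimal sketch of how such a Cartesian product of variation groups could be built (illustrative only, not necessarily the script's exact implementation):

```python
from itertools import product

# Each --variations argument is one dimension; "|" separates its choices.
variation_groups = ["A|B|C", "D|E|F"]
dims = [group.split("|") for group in variation_groups]

# The runs to benchmark are the Cartesian product of the dimensions.
runs = [" ".join(parts) for parts in product(*dims)]
print(runs)
# ['A D', 'A E', 'A F', 'B D', 'B E', 'B F', 'C D', 'C E', 'C F']
```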

Contributor commented:

I see! Sorry, I misread it then. But then variations can only correspond to args of TrainingArguments, no? So maybe we can say training_args_variations?

Contributor Author (@stas00) commented:

The idea is that this tool is going to be expanded to support other programs which don't necessarily have training args - after all, it doesn't care which args are varied. @siddk is planning to expand it to support accelerate and, down the road, others, hence the generic name.

So the mnemonic is "variations to compare", whatever they are.
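
A hypothetical shape for such an extension (class and method names here are purely illustrative; the PR itself does not define this API):

```python
import json


class BenchmarkTarget:
    """One kind of target program the benchmark harness knows how to drive."""

    def get_metrics(self, output_dir):
        # Default: HF Trainer example scripts write output_dir/all_results.json.
        with open(f"{output_dir}/all_results.json") as f:
            return json.load(f)


class AccelerateTarget(BenchmarkTarget):
    def get_metrics(self, output_dir):
        # An accelerate-based script would parse whatever results file it writes.
        raise NotImplementedError
```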

@patrickvonplaten (Contributor) left a comment

Very cool! Thanks for making this available to everybody. I didn't understand all the command line code in detail, but it looks like a very useful script to have.

HuggingFaceDocBuilderDev commented Apr 5, 2022

The documentation is not available anymore as the PR was closed or merged.

@stas00 stas00 merged commit 23fc4cb into huggingface:main Apr 5, 2022
@stas00 stas00 deleted the trainer-benchmark branch April 5, 2022 17:27
