[benchmark tool] trainer-benchmark.py #14934
Conversation
sgugger left a comment
Thanks a lot for working on this. Even if it's for internal use, we can make a slight effort to make the code more readable so anyone can contribute to it easily :-)
It's going to be very useful to have!
```python
sleep(0)
return dict(
    {k: random.uniform(0, 100) for k in metric_keys},
    **{target_metric_key: random.choice([nan, 10.31, 100.2, 55.6666, 222.22222222])},
```
I don't really understand where those numbers are coming from. Are they random?
This is for debugging the output formatting - i.e. I'm still perfecting the look-n-feel of the result reports. If you want to tweak it, these return immediately, and a few re-runs will give different outputs to validate that different kinds of numbers get formatted well.
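For anyone following along, a self-contained sketch of that mock-metrics idea might look like the following; the function name and the example keys are my assumptions, not the script's actual code, and it adds the `random` and `math.nan` imports that the excerpt above omits:

```python
# Sketch of the mock-metrics idea: return instantly with randomized values
# so the report formatting can be exercised against varied numbers,
# including nan. Function and key names are illustrative assumptions.
import random
from math import nan

def mock_run_metrics(metric_keys, target_metric_key):
    return dict(
        {k: random.uniform(0, 100) for k in metric_keys},
        **{target_metric_key: random.choice([nan, 10.31, 100.2, 55.6666, 222.22222222])},
    )

print(mock_run_metrics(["train_loss", "train_samples"], "train_samples_per_second"))
```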
I added a note explaining what it is for.
| "--report-metric-keys", | ||
| default="", | ||
| type=str, | ||
| help="Report metric keys - other metric keys from output_dir/all_results.json to report, e.g., train_loss. Use a single argument e.g., 'train_loss train_samples", |
Do you think it could be possible to also provide a link to where additional metric keys could be found?
Do we have that information? It'd be different in different scripts, no?
That's why I documented output_dir/all_results.json - once you run it, you will see all the available keys there for the given type of script.
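Building on that suggestion, one quick way to discover the available keys after a run could look like this sketch; the output path follows the help text above, and the printed keys are only examples:

```python
# Sketch: list the metric keys available in a finished run's
# output_dir/all_results.json. The path and example keys are assumptions
# based on the help text above.
import json

with open("output_dir/all_results.json") as f:
    all_results = json.load(f)

print(sorted(all_results))  # e.g. ['train_loss', 'train_samples', ...]
```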
| help="Base cmd", | ||
| ) | ||
| parser.add_argument( | ||
| "--variations", |
Do the variations just correspond to the precision? Maybe call it precision_variations then?
No.
Variations are just that - variations - they can be anything. --variations "A|B|C" "D|E|F" will do a Cartesian product to produce A D, A E, ..., C F and run each variation. If it's just --variations "A|B|C" it will just run A, then B, then C.
I was just using precisions as an example, but you can compare, say, --variations "--bs 4|--bs 6|--bs 8" (we don't have --bs, but you get the idea).
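To make the expansion concrete, here is a minimal sketch of how such '|'-separated variation groups could expand into the Cartesian product of runs; this is an illustration based on itertools, not the script's actual implementation:

```python
# Minimal sketch of expanding '|'-separated variation groups into the
# Cartesian product of runs described above. Illustration only - not the
# script's actual internals.
import itertools

def expand_variations(groups):
    alternatives = [g.split("|") for g in groups]
    return [" ".join(combo) for combo in itertools.product(*alternatives)]

# Two groups of 3 alternatives -> 3 * 3 = 9 runs.
for variation in expand_variations(["A|B|C", "D|E|F"]):
    print(variation)  # A D, A E, A F, B D, ..., C F
```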
I'm open to suggestions if the name is not intuitive.
I see! Sorry, I misread it then. But then variations can only correspond to args of TrainingArguments, no? So maybe we can say training_args_variations?
The idea is that this tool is going to be expanded to support other programs which don't necessarily have training args; after all, it doesn't care which args are varied. @siddk is planning to expand it to support accelerate and, down the road, others, hence the generic name.
So the mnemonic is "variations to compare", whatever they are.
patrickvonplaten left a comment
Very cool! Thanks for making this available to everybody. I didn't understand all the command line code in detail, but it looks like a very useful script to have.
This PR adds a benchmarking tool for HF Trainer args - e.g. it can compare --fp16 vs --bf16 performance - but it can do that across multiple dimensions, and it prints automatic tables that include relative performance and are suitable for instant pasting into Issues, e.g.:
[results table: samples per second / % / loss]
This is produced by:
To add more dimensions simply add another --variations arg; that will lead to a Cartesian product of each arg, with an outcome of:
[results table: samples per second / % / loss]
See the doc at the beginning of the script for details. It has lots of cool features, like automatically reporting the hardware/software setup, preformatting everything for copy-n-paste into Issues/docs, and printing an easy-to-read console-only 2nd version at the end for when you're debugging things.
And more practical examples and reports for its use are here: #15026 and #14608
It should be relatively easy to adapt this tool to be used with accelerate or with any other command line tool, as long as there is a defined way to get to the results; each tool will need to have its own sub-class if we decide to extend it. I think @siddk suggested he might look into extending it.

But let's see first that you like it and that I placed it in a good location. The intention is for it to be an internal tool, so I hope any map and such code will be tolerated.

It's possible that we could use it for some basic regression testing, but it'd need to be further extended to support storing results and detecting regressions.
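To illustrate the sub-classing direction mentioned above, here is a purely hypothetical sketch; none of these class or method names exist in the PR, and the actual script may be structured quite differently:

```python
# Purely hypothetical sketch of per-tool sub-classes: each tool defines how
# to build a run command for a variation and where to read its results.
# All names here are illustrative assumptions, not the PR's actual code.
import json
import os

class ToolBenchmark:
    def build_cmd(self, base_cmd: str, variation: str) -> str:
        return f"{base_cmd} {variation}"

    def read_results(self, output_dir: str) -> dict:
        raise NotImplementedError

class TrainerBenchmark(ToolBenchmark):
    def read_results(self, output_dir: str) -> dict:
        # HF Trainer example scripts write their metrics to all_results.json.
        with open(os.path.join(output_dir, "all_results.json")) as f:
            return json.load(f)
```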
@LysandreJik, @patrickvonplaten, @sgugger, @patil-suraj