Reformat benchmark metrics #42
Conversation
@JoeZijunZhou Is the key `request_inthroughput` a typo? Should it be renamed?
Yes, please change it.
LGTM! Shall we also update the README in /benchmarks to give an example of running the benchmark with eval? Also, please resolve the Python checks before merging: https://github.com/google/JetStream/actions/runs/8742895378/job/23992198244?pr=42. Thanks!
@@ -735,6 +748,12 @@ def main(args: argparse.Namespace):
        default="/tmp/request-outputs.json",
        help="File path to store request outputs",
    )
    parser.add_argument(
        "--run-eval",
Can you add this change to the README?
The pylint check failed; could you fix the check error?
Done.
Agreed, we need to update our documentation.
This is a much-needed change. It makes it easy to run perf and eval in one command.
    metrics_json = {**metrics_json, **benchmark_result}
    if args.run_eval:
      eval_json = eval_accuracy(output)
      metrics_json = {**metrics_json, **eval_json}
It may be worth keeping metrics separated into "perf" and "eval" categories. What is the interface to XLML storage -- are you flattening all metrics into one table?
The metrics JSON is a single-level JSON. During JSON Lines processing, it relies on the assumption that each key is a string and each value is a float (not another JSON object).
It populates BigQuery using normalized tables. I'll show you what I mean via chat.
Also, see example in this test https://github.com/GoogleCloudPlatform/ml-auto-solutions/blob/master/xlml/utils/metric_test.py#L155
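A minimal sketch of that flattening assumption (the metric names here are made up for illustration; the real ones come from benchmark_serving.py):

```python
import json

# Illustrative metric names only. The assumption being relied on is that
# every key is a string and every value is a float, so perf and eval
# metrics can live in one single-level dict.
perf_metrics = {"throughput_requests_per_s": 12.5, "mean_latency_ms": 83.1}
eval_metrics = {"rouge1": 42.7, "exact_match": 0.31}

metrics_json = {**perf_metrics, **eval_metrics}

# JSON Lines processing: one flat record per line, which maps cleanly onto
# normalized BigQuery tables downstream.
with open("metrics.jsonl", "a") as f:
  f.write(json.dumps(metrics_json) + "\n")

assert all(
    isinstance(k, str) and isinstance(v, float) for k, v in metrics_json.items()
)
```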
As part of the effort to automate benchmarks in XLML, the benchmark_serving.py script will be used to benchmark MaxText JetStream models. The benchmark results, however, need to be in a slightly different format.
This PR refactors the output, casts the values, and returns them for later use.
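A hedged sketch of that reformatting step, with a made-up `BenchmarkResult` shape and field names; only the flatten-and-cast idea is taken from the description above:

```python
from dataclasses import asdict, dataclass


@dataclass
class BenchmarkResult:
  # Hypothetical fields; the real result object in benchmark_serving.py
  # may differ.
  completed: int
  total_time_s: float
  throughput: float


def reformat_metrics(result: BenchmarkResult) -> dict:
  # Flatten to a single-level dict and cast every value to float so the
  # downstream JSON Lines / XLML processing only ever sees string -> float.
  return {key: float(value) for key, value in asdict(result).items()}


benchmark_result = BenchmarkResult(completed=100, total_time_s=12.3, throughput=8.1)
metrics_json = reformat_metrics(benchmark_result)
print(metrics_json)  # {'completed': 100.0, 'total_time_s': 12.3, 'throughput': 8.1}
```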