Reformat benchmark metrics #42
Conversation
@JoeZijunZhou Is the key `request_inthroughput` a typo? Should it be renamed?
Yes, please change it.
LGTM! Shall we also update the README in /benchmarks to give an example of running the benchmark with eval? Also, please resolve the Python checks before merging: https://github.com/google/JetStream/actions/runs/8742895378/job/23992198244?pr=42. Thanks!
@@ -735,6 +748,12 @@ def main(args: argparse.Namespace):
        default="/tmp/request-outputs.json",
        help="File path to store request outputs",
    )
    parser.add_argument(
        "--run-eval",
Can you add this change to the README?
The pylint check failed; could you fix the check error?
Done.
Agreed, we need to update our documentation.
This is a much-needed change. It makes it easy to run perf and eval in one command.
    metrics_json = {**metrics_json, **benchmark_result}
    if args.run_eval:
      eval_json = eval_accuracy(output)
      metrics_json = {**metrics_json, **eval_json}
It may be worth keeping metrics separated into "perf" and "eval" categories. What is the interface to XLML storage -- are you flattening all metrics into one table?
The metrics JSON is a single-level JSON. During JSON Lines processing, it relies on the assumption that each key is a string and each value is a float (not another JSON object).
It populates BigQuery using normalized tables. I'll show you what I mean via chat.
Also, see example in this test https://github.com/GoogleCloudPlatform/ml-auto-solutions/blob/master/xlml/utils/metric_test.py#L155
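A minimal sketch of that flattening assumption (the metric names here are made up for illustration; the real ones come from benchmark_serving.py):

```python
import json

# Illustrative metric names only. The assumption being relied on is that
# every key is a string and every value is a float, so perf and eval
# metrics can live in one single-level dict.
perf_metrics = {"throughput_requests_per_s": 12.5, "mean_latency_ms": 83.1}
eval_metrics = {"rouge1": 42.7, "exact_match": 0.31}

metrics_json = {**perf_metrics, **eval_metrics}

# JSON Lines processing: one flat record per line, which maps cleanly onto
# normalized BigQuery tables downstream.
with open("metrics.jsonl", "a") as f:
  f.write(json.dumps(metrics_json) + "\n")

assert all(
    isinstance(k, str) and isinstance(v, float) for k, v in metrics_json.items()
)
```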
As part of the effort to automate benchmarks in XLML, the benchmark_serving.py script will be used to benchmark MaxText JetStream models. The benchmark results, however, need to be in a slightly different format.
This PR refactors the output, casts the values, and returns them for later use.
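A hedged sketch of that reformatting step, with a made-up `BenchmarkResult` shape and field names; only the flatten-and-cast idea is taken from the description above:

```python
from dataclasses import asdict, dataclass


@dataclass
class BenchmarkResult:
  # Hypothetical fields; the real result object in benchmark_serving.py
  # may differ.
  completed: int
  total_time_s: float
  throughput: float


def reformat_metrics(result: BenchmarkResult) -> dict:
  # Flatten to a single-level dict and cast every value to float so the
  # downstream JSON Lines / XLML processing only ever sees string -> float.
  return {key: float(value) for key, value in asdict(result).items()}


benchmark_result = BenchmarkResult(completed=100, total_time_s=12.3, throughput=8.1)
metrics_json = reformat_metrics(benchmark_result)
print(metrics_json)  # {'completed': 100.0, 'total_time_s': 12.3, 'throughput': 8.1}
```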