
Refactored benchmark tests #196

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

@shimizust (Collaborator) commented Sep 3, 2024

Summary

  • Refactored kernel benchmarking tests to save data to a single CSV (benchmark/data/all_benchmark_data.csv) that contains more complete information on how the benchmarking test was set up:
    - kernel_name (e.g. swiglu)
    - kernel_provider (e.g. liger or huggingface)
    - kernel_operation_mode (e.g. full, forward, backward)
    - metric_name (e.g. speed, memory)
    - metric_unit (e.g. ms, MB)
    - x_name
    - x_label (e.g. hidden size)
    - x_value
    - y_value_50 (median)
    - y_value_20 (20th percentile)
    - y_value_80 (80th percentile)
    - extra_benchmark_config_str (e.g. {"B": 32, "T": 512, "D": 768, "dtype": "torch.float32"})
    - gpu_name (e.g. NVIDIA A100-SXM4-80GB)
    - timestamp
    - liger_version (e.g. 0.2.1)
  • Removed existing image files to avoid cluttering the git history, per previous discussion (Added H100 benchmarks #54)
  • Added sample jupyter notebook that can be used to filter and plot the benchmark data
  • Added makefile command to run all benchmarks: make run-benchmarks
  • Don't overwrite existing data by default when the same benchmark is rerun. Two runs are considered the same benchmark if they match on kernel_name, kernel_provider, kernel_operation_mode, metric_name, x_label, gpu_name, and extra_benchmark_config_str
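The "same benchmark" rule above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the field names come from the summary, but the helper functions and in-memory row representation (one dict per CSV row) are hypothetical.

```python
# Hypothetical sketch of the overwrite rule described above. Rows that agree
# on all seven of these fields belong to the same benchmark (they may still
# differ in x_value, since one benchmark sweeps several x values).
DEDUP_FIELDS = (
    "kernel_name",
    "kernel_provider",
    "kernel_operation_mode",
    "metric_name",
    "x_label",
    "gpu_name",
    "extra_benchmark_config_str",
)


def dedup_key(row):
    """Identity key of a benchmark run, built from the seven match fields."""
    return tuple(row[f] for f in DEDUP_FIELDS)


def merge_rows(existing, new, overwrite=False):
    """Combine rows from a previous CSV with freshly collected rows.

    By default, benchmarks that already have data are left untouched and the
    new rows for them are dropped; with overwrite=True the old rows for those
    benchmarks are replaced by the new ones.
    """
    new_keys = {dedup_key(r) for r in new}
    if overwrite:
        kept = [r for r in existing if dedup_key(r) not in new_keys]
        return kept + new
    existing_keys = {dedup_key(r) for r in existing}
    return existing + [r for r in new if dedup_key(r) not in existing_keys]
```

Keying on the whole benchmark group (rather than individual x values) means a rerun either leaves a benchmark's data fully intact or replaces it wholesale, so the CSV never mixes rows from two different runs of the same configuration.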

Testing Done

  • Ran all benchmarks and spot-checked values with existing benchmark data


  • Hardware Type: A100
  • Ran make test to ensure correctness
  • Ran make checkstyle to ensure code style
  • Ran make test-convergence to ensure convergence

@ByronHsu (Collaborator) left a comment


can we add a readme in benchmark/ to elaborate how to run these?

add'l stuff

Not working

Somewhat working

Working benchmarking script

Fixed script for overwriting values

Updated swiglu

Updated benchmark_rope

Updated benchmark_rms_norm

Updated flce

Updated geglu

Updated embedding

Updated layernorm

working notebook

Cleared outputs

Reran for latest liger version, switched to quantiles

Fixed checkstyle

Added instructions

fixed typo
@shimizust (Collaborator, Author)

> can we add a readme in benchmark/ to elaborate how to run these?

Added instructions to contributing.md

@ByronHsu (Collaborator) left a comment


This is awesome! A follow-up task could be making the decimal precision of the speed benchmark configurable. The Triton implementation is hard-coded. cc @austin362667

@ByronHsu ByronHsu merged commit b5672a1 into linkedin:main Sep 4, 2024
2 checks passed
@shimizust (Collaborator, Author)

Taking the raw results from triton.testing.do_bench here, we already get the unrounded values. For example, you can see in the CSV:

cross_entropy,liger,forward,speed,ms,V,vocab size,8192,0.8101439476013184,0.7565760016441345,0.9144319891929626,"{""B"": 8, ""T"": 2048}",NVIDIA A100-SXM4-80GB,2024-09-03 15:31:39,0.2.1
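The row above can be parsed back into the schema fields from the PR summary using only the standard library. This is a sketch: the column order is taken from the summary and is an assumption about the actual CSV layout. Note that the csv module handles the doubled-quote escaping in the JSON config column.

```python
import csv
import io
import json

# Column names as listed in the PR summary (assumed to match the CSV order).
COLUMNS = [
    "kernel_name", "kernel_provider", "kernel_operation_mode",
    "metric_name", "metric_unit", "x_name", "x_label", "x_value",
    "y_value_50", "y_value_20", "y_value_80",
    "extra_benchmark_config_str", "gpu_name", "timestamp", "liger_version",
]

# The example row from the comment above, verbatim.
row_text = ('cross_entropy,liger,forward,speed,ms,V,vocab size,8192,'
            '0.8101439476013184,0.7565760016441345,0.9144319891929626,'
            '"{""B"": 8, ""T"": 2048}",NVIDIA A100-SXM4-80GB,'
            '2024-09-03 15:31:39,0.2.1')

# csv.reader unescapes the doubled quotes, leaving a valid JSON string.
record = dict(zip(COLUMNS, next(csv.reader(io.StringIO(row_text)))))
config = json.loads(record["extra_benchmark_config_str"])
print(record["kernel_name"], record["y_value_50"], config["B"])
# → cross_entropy 0.8101439476013184 8
```

The unrounded quantile values (y_value_50/20/80) survive the round trip, which is the point being made here: no precision is lost on the way into the CSV.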

cc @ByronHsu @austin362667

@ByronHsu (Collaborator) commented Sep 4, 2024

Ah, I see! You are writing the CSV on your own.
