
Refactored benchmark tests #196

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

@shimizust (Collaborator) commented Sep 3, 2024

Summary

  • Refactored kernel benchmarking tests to save data to a single CSV (benchmark/data/all_benchmark_data.csv) that contains more complete information on how the benchmarking test was set up:
    - kernel_name (e.g. swiglu)
    - kernel_provider (e.g. liger or huggingface)
    - kernel_operation_mode (e.g. full, forward, backward)
    - metric_name (e.g. speed, memory)
    - metric_unit (e.g. ms, MB)
    - x_name
    - x_label (e.g. hidden size)
    - x_value
    - y_value_50 (median)
    - y_value_20 (20th percentile)
    - y_value_80 (80th percentile)
    - extra_benchmark_config_str (e.g. {"B": 32, "T": 512, "D": 768, "dtype": "torch.float32"})
    - gpu_name (e.g. NVIDIA A100-SXM4-80GB)
    - timestamp
    - liger_version (e.g. 0.2.1)
  • Removed existing image files to avoid cluttering the git history, per previous discussion (Added H100 benchmarks #54)
  • Added sample jupyter notebook that can be used to filter and plot the benchmark data
  • Added makefile command to run all benchmarks: make run-benchmarks
  • Don't overwrite existing data by default when the same benchmark is rerun. Two runs are considered the same benchmark if they match on kernel_name, kernel_provider, kernel_operation_mode, metric_name, x_label, gpu_name, and extra_benchmark_config_str
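The "same benchmark" rule above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the field names come from the summary, but the helper functions and in-memory row representation (one dict per CSV row) are hypothetical.

```python
# Hypothetical sketch of the overwrite rule described above. Rows that agree
# on all seven of these fields belong to the same benchmark (they may still
# differ in x_value, since one benchmark sweeps several x values).
DEDUP_FIELDS = (
    "kernel_name",
    "kernel_provider",
    "kernel_operation_mode",
    "metric_name",
    "x_label",
    "gpu_name",
    "extra_benchmark_config_str",
)


def dedup_key(row):
    """Identity key of a benchmark run, built from the seven match fields."""
    return tuple(row[f] for f in DEDUP_FIELDS)


def merge_rows(existing, new, overwrite=False):
    """Combine rows from a previous CSV with freshly collected rows.

    By default, benchmarks that already have data are left untouched and the
    new rows for them are dropped; with overwrite=True the old rows for those
    benchmarks are replaced by the new ones.
    """
    new_keys = {dedup_key(r) for r in new}
    if overwrite:
        kept = [r for r in existing if dedup_key(r) not in new_keys]
        return kept + new
    existing_keys = {dedup_key(r) for r in existing}
    return existing + [r for r in new if dedup_key(r) not in existing_keys]
```

Keying on the whole benchmark group (rather than individual x values) means a rerun either leaves a benchmark's data fully intact or replaces it wholesale, so the CSV never mixes rows from two different runs of the same configuration.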

Testing Done

  • Ran all benchmarks and spot-checked values with existing benchmark data


  • Hardware Type: A100
  • Ran make test to ensure correctness
  • Ran make checkstyle to ensure code style
  • Ran make test-convergence to ensure convergence

@ByronHsu (Collaborator) left a comment


can we add a readme in benchmark/ to elaborate how to run these?

add'l stuff

Not working

Somewhat working

Working benchmarking script

Fixed script for overwriting values

Updated swiglu

Updated benchmark_rope

Updated benchmark_rms_norm

Updated flce

Updated geglu

Updated embedding

Updated layernorm

working notebook

Cleared outputs

Reran for latest liger version, switched to quantiles

Fixed checkstyle

Added instructions

fixed typo
@shimizust (Collaborator, Author)

> can we add a readme in benchmark/ to elaborate how to run these?

Added instructions to contributing.md

@ByronHsu (Collaborator) left a comment


This is awesome! A follow-up task could be making the decimal precision of the speed benchmark configurable. The Triton implementation is hard-coded. cc @austin362667

@ByronHsu ByronHsu merged commit b5672a1 into linkedin:main Sep 4, 2024
2 checks passed
@shimizust (Collaborator, Author)

Taking the raw results from triton.testing.do_bench here, we already get the unrounded values. For example, you can see in the CSV:

cross_entropy,liger,forward,speed,ms,V,vocab size,8192,0.8101439476013184,0.7565760016441345,0.9144319891929626,"{""B"": 8, ""T"": 2048}",NVIDIA A100-SXM4-80GB,2024-09-03 15:31:39,0.2.1
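The row above can be parsed back into the schema fields from the PR summary using only the standard library. This is a sketch: the column order is taken from the summary and is an assumption about the actual CSV layout. Note that the csv module handles the doubled-quote escaping in the JSON config column.

```python
import csv
import io
import json

# Column names as listed in the PR summary (assumed to match the CSV order).
COLUMNS = [
    "kernel_name", "kernel_provider", "kernel_operation_mode",
    "metric_name", "metric_unit", "x_name", "x_label", "x_value",
    "y_value_50", "y_value_20", "y_value_80",
    "extra_benchmark_config_str", "gpu_name", "timestamp", "liger_version",
]

# The example row from the comment above, verbatim.
row_text = ('cross_entropy,liger,forward,speed,ms,V,vocab size,8192,'
            '0.8101439476013184,0.7565760016441345,0.9144319891929626,'
            '"{""B"": 8, ""T"": 2048}",NVIDIA A100-SXM4-80GB,'
            '2024-09-03 15:31:39,0.2.1')

# csv.reader unescapes the doubled quotes, leaving a valid JSON string.
record = dict(zip(COLUMNS, next(csv.reader(io.StringIO(row_text)))))
config = json.loads(record["extra_benchmark_config_str"])
print(record["kernel_name"], record["y_value_50"], config["B"])
# → cross_entropy 0.8101439476013184 8
```

The unrounded quantile values (y_value_50/20/80) survive the round trip, which is the point being made here: no precision is lost on the way into the CSV.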

cc @ByronHsu @austin362667

@ByronHsu (Collaborator) commented Sep 4, 2024

Ah, I see! You are writing the CSV on your own.
