
Add segment-anything-fast perf/acc benchmarks to torchao #457

Merged: 13 commits into main, Jul 2, 2024
Conversation

@jcaip (Contributor) commented Jun 28, 2024

This PR adds segment-anything-fast evaluation to torchao, along with benchmarks for int8 quantization + 2:4 sparsity.

With this we can run combined perf/accuracy benchmarks for segment-anything. This should give us a starting point for the relative perf vs relative acc graph for PTC.

| Model Type | Technique                                                                                            | img/s | memory (MiB) | mIoU   | relative speedup | relative accuracy |
|------------|------------------------------------------------------------------------------------------------------|-------|--------------|--------|------------------|-------------------|
| ViT-h      | baseline (bfloat16, max-autotune)                                                                    | 22.75 | 15172        | 0.5811 |                  |                   |
|            | int8 dynamic quant (attn + mlp)                                                                      | 24.91 | 15154        | 0.5822 | 1.09x            | 100.19%           |
|            | 2:4 sparsity (mlp only)                                                                              | 24.81 | 15632        | 0.5672 | 1.10x            | 97.61%            |
|            | 2:4 sparsity (attn + mlp)                                                                            | 24.30 | 13429        | 0.5306 | 1.07x            | 91.31%            |
|            | int8 dynamic quant (attn)<br>int8 dynamic quant + 2:4 sparsity (mlp lin1)<br>2:4 sparsity (mlp lin2) | 26.46 | 14865        | 0.5668 | 1.16x            | 97.54%            |

This PR just copies over the evaluation scripts. Eventually I think we should move over the modeling code too, but I plan to do that in a subsequent PR.
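
For readers who want a concrete picture of the 2:4 sparsity rows above, here is a minimal, self-contained sketch of applying semi-structured (2:4) sparsity to the MLP linears of a model using PyTorch's `to_sparse_semi_structured`. This is an illustration only, not the torchao / segment-anything-fast benchmark code itself; the magnitude-based pruning helper and the `"mlp"` name filter are assumptions.

```python
import torch
from torch.sparse import to_sparse_semi_structured

def prune_to_2_4(weight: torch.Tensor) -> torch.Tensor:
    # Naive magnitude-based 2:4 pruning: in every group of 4 weights,
    # zero out the 2 with the smallest absolute value.
    w = weight.detach().clone()
    groups = w.view(-1, 4)
    drop = groups.abs().argsort(dim=1)[:, :2]  # indices of the 2 smallest per group
    groups.scatter_(1, drop, 0.0)
    return w

def sparsify_mlp_2_4(model: torch.nn.Module) -> torch.nn.Module:
    # Swap the dense weights of MLP linear layers for 2:4 semi-structured
    # sparse tensors. Requires CUDA and a supported dtype (fp16/bf16/int8).
    for name, mod in model.named_modules():
        if isinstance(mod, torch.nn.Linear) and "mlp" in name:
            mod.weight = torch.nn.Parameter(
                to_sparse_semi_structured(prune_to_2_4(mod.weight))
            )
    return model
```

The "2:4 sparsity (mlp only)" row corresponds to this kind of transformation on lin1/lin2 of each transformer block (in bfloat16 on GPU), while the "attn + mlp" rows also sparsify the attention projections; the int8 rows additionally apply torchao's int8 dynamic quantization.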

@pytorch-bot (bot) commented Jun 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/457

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ad8b42f with merge base 5d22ad2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label Jun 28, 2024
@jcaip marked this pull request as draft June 28, 2024 00:03
@jcaip marked this pull request as ready for review June 28, 2024 02:18
@jcaip changed the title from "[wip] Add segment-anything-fast benchmarks to torchao" to "Add segment-anything-fast perf/acc benchmarks to torchao" Jun 28, 2024

| qkv | proj | lin1 | lin2 | time | memory | img/s |
| ---- | ---- | ---- | ---- | ---- | ------ | ----- |
| None | None | None | None | 1361.73 | 15.81 | 23.50 |
@jcaip (Contributor, Author) commented Jun 28, 2024
These numbers are a bit higher than the ones reported in the new benchmark script; I'm pretty sure this is because we reuse the same example for benchmarking, not because of any changes to the code.

I grabbed the output of TORCH_LOGS=output_code for both the new and old benchmark scripts and diffed them; they look pretty much identical:

[Screenshot: diff of the generated output code, 2024-06-27]
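
For reference, the output-code comparison described above can be reproduced with a sketch like the following (the tiny model and input are placeholders, not the SAM image encoder):

```python
import torch

# Programmatic equivalent of running with TORCH_LOGS=output_code: ask
# Inductor to log the generated kernels so two runs can be dumped and diffed.
torch._logging.set_logs(output_code=True)

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
model = model.cuda().bfloat16()
compiled = torch.compile(model, mode="max-autotune")
compiled(torch.randn(8, 64, device="cuda", dtype=torch.bfloat16))
```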

@jcaip requested review from HDCharles and msaroufim June 28, 2024 12:58
@@ -20,29 +20,32 @@ More concretely, we hope to provide tutorials and APIs for both sparse kernels (

## Success Stories

-#### segment-anything
+#### segment-anything-fast
(Member) commented:

can you update the main README as well with some of these results

# run_once(qkv="quant", proj="quant", lin1="quant+sparse (cusparselt)", lin2="quant+sparse (cusparselt)"),
# run_once(qkv="quant+sparse (cutlass)", proj="quant+sparse (cutlass)", lin1="quant+sparse (cutlass)", lin2="quant+sparse (cutlass)"),
]
ALL_RUNS = [run_once(qkv="quant", proj="quant", lin1="quant+sparse (cusparselt)", lin2="sparse (cusparselt)")]
(Member) commented:

could be cleaner?

@jcaip (Contributor, Author) replied:

Let me just remove this file; it's unnecessary now that we have the SAF eval code. I just left it in the PR so I could pull torch logs.

@msaroufim (Member) left a comment:

some minor nits

@jcaip (Contributor, Author) commented Jul 2, 2024:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).

Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team. Advanced debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).

@jcaip (Contributor, Author) commented Jul 2, 2024:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).

@msaroufim (Member) commented:

@huydhn @clee2000 another example of mergebot not merging stuff

@jcaip merged commit f22e8e8 into main Jul 2, 2024 (13 checks passed)
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024
* scrub & reformat code

* use full paths

* set tiktoken init to False, not None to align with new tokenizer chatting logic
Labels: CLA Signed, Merged
Projects: none yet
4 participants