
Add tensor core GQA dispatch for [4,5,6,8] #1258

Merged (1 commit), Mar 11, 2024

@lzhangzz (Collaborator) commented Mar 6, 2024

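Benchmark results for various CTA_H values

A100 80G, batch size 128, seq len 1024, w/o split-kv, latency in microseconds. Each table is one head_num:kv_head_num ratio; columns are CTA_H.

| 32:8 | 1 | 2 | 4 |
|---|---|---|---|
| SIMT | 513.70 | 316.26 | 319.70 |
| TC | 509.66 | 318.91 | 308.35 |

| 48:8 | 1 | 2 | 3 | 6 |
|---|---|---|---|---|
| SIMT | 755.17 | 382.27 | 354.37 | 367.46* |
| TC | 756.35 | 401.28 | 319.58 | 309.31 |

| 64:8 | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| SIMT | 1020 | 497.92 | 406.27 | 985.82* |
| TC | 993.98 | 517.86 | 319.39 | 305.12 |

| 80:8 | 1 | 2 | 5 |
|---|---|---|---|
| SIMT | 1260 | 616 | 485.70 |
| TC | 1230 | 635.52 | 318.75 |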

*register spill
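To make the comparison in the table above explicit, here is a small sketch that picks the fastest CTA_H for each kernel at each ratio and reports which one wins. It only re-reads the numbers from the table; `BENCH`, `best` and `summarize` are illustrative names, not part of lmdeploy, and the register-spill entries are kept as measured.

```python
# Latencies in microseconds from the benchmark table (A100 80G, batch 128,
# seq len 1024, no split-kv). Outer keys are head_num (kv_head_num is 8);
# inner keys are CTA_H. Entries marked * (register spill) are kept as measured.
BENCH = {
    32: {"SIMT": {1: 513.70, 2: 316.26, 4: 319.70},
         "TC":   {1: 509.66, 2: 318.91, 4: 308.35}},
    48: {"SIMT": {1: 755.17, 2: 382.27, 3: 354.37, 6: 367.46},
         "TC":   {1: 756.35, 2: 401.28, 3: 319.58, 6: 309.31}},
    64: {"SIMT": {1: 1020.0, 2: 497.92, 4: 406.27, 8: 985.82},
         "TC":   {1: 993.98, 2: 517.86, 4: 319.39, 8: 305.12}},
    80: {"SIMT": {1: 1260.0, 2: 616.0, 5: 485.70},
         "TC":   {1: 1230.0, 2: 635.52, 5: 318.75}},
}


def best(row):
    """Return (cta_h, latency_us) for the fastest CTA_H in one kernel row."""
    return min(row.items(), key=lambda item: item[1])


def summarize(bench, kv_head_num=8):
    """Map group size -> (best SIMT, best TC, winning kernel)."""
    out = {}
    for head_num, kernels in bench.items():
        simt, tc = best(kernels["SIMT"]), best(kernels["TC"])
        winner = "TC" if tc[1] < simt[1] else "SIMT"
        out[head_num // kv_head_num] = (simt, tc, winner)
    return out


for ratio, (simt, tc, winner) in summarize(BENCH).items():
    print(f"ratio {ratio}: SIMT {simt[1]:.2f} us (CTA_H={simt[0]}), "
          f"TC {tc[1]:.2f} us (CTA_H={tc[0]}) -> {winner}")
```

With these numbers TC wins at all four measured ratios (4, 6, 8, 10), and its fastest setting is always the largest CTA_H tested, while SIMT peaks at a smaller CTA_H.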

Conclusion: use TC when head_num / kv_head_num > 2
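The conclusion can be read as a dispatch rule on the GQA group size. The sketch below is hypothetical: `select_decode_kernel` and `TC_GROUP_SIZES` are illustrative names, and treating the group sizes from the PR title ([4,5,6,8]) as the only available tensor-core variants, with everything else falling back to SIMT, is an assumption rather than the actual TurboMind code.

```python
# Group sizes listed in the PR title for tensor-core dispatch. Treating this as
# the complete set of compiled TC variants is an assumption of this sketch.
TC_GROUP_SIZES = (4, 5, 6, 8)


def select_decode_kernel(head_num: int, kv_head_num: int) -> str:
    """Hypothetical rule: return "TC" or "SIMT" for a GQA attention config."""
    if kv_head_num <= 0 or head_num % kv_head_num != 0:
        raise ValueError("head_num must be a positive multiple of kv_head_num")
    group_size = head_num // kv_head_num
    wants_tc = group_size > 2                # rule from the benchmark conclusion
    has_tc = group_size in TC_GROUP_SIZES    # assumed set of TC instantiations
    return "TC" if wants_tc and has_tc else "SIMT"


print(select_decode_kernel(64, 8))  # group size 8
print(select_decode_kernel(16, 8))  # group size 2
```

Keeping the threshold and the availability check separate makes it easy to add TC variants for new group sizes without changing the threshold derived from the measurements.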

@lzhangzz added the enhancement and turbomind labels Mar 6, 2024
@zhyncs (Collaborator) commented Mar 6, 2024

Awesome! Could you write an advanced guide on GEMM tuning for the tutorials at https://github.com/InternLM/lmdeploy?tab=readme-ov-file#tutorials

@jjjjohnson (Contributor) commented:

What does TC stand for?

@zhyncs (Collaborator) commented Mar 7, 2024

> What does TC stand for?

Tensor core.

@lvhan028 merged commit 331858b into InternLM:turbomind-2.1 on Mar 11, 2024 (9 checks passed)