@krishnaraj36 krishnaraj36 commented May 7, 2024

Improved the gemv outer fallback schedules. This improves a few gemv kernels by 20%.

| LLM model | Baseline | Improved |
| --- | --- | --- |
| gemma-2b-it | 22.4 tok/sec | 25.2 tok/sec |
| Qwen-7b-chat | 11 tok/sec | 11.8 tok/sec |
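
For reference, the end-to-end gains implied by the figures above can be worked out with a short script. This is only an illustrative sketch: the helper name `speedup_percent` is hypothetical and not part of the PR, and the numbers are copied from the results above. Note that these are whole-model throughput gains, which are expected to be smaller than the per-kernel improvement because gemv kernels account for only part of the total runtime.

```python
# Throughput reported in this PR, in tok/sec: (baseline, improved).
results = {
    "gemma-2b-it": (22.4, 25.2),
    "Qwen-7b-chat": (11.0, 11.8),
}


def speedup_percent(baseline, improved):
    """Relative throughput gain of `improved` over `baseline`, in percent.

    Hypothetical helper used only for this illustration.
    """
    return (improved - baseline) / baseline * 100.0


for model, (baseline, improved) in results.items():
    gain = speedup_percent(baseline, improved)
    print(f"{model}: {gain:.1f}% faster end to end")
```

With the reported numbers this comes to roughly 12.5% for gemma-2b-it and roughly 7.3% for Qwen-7b-chat.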

@krishnaraj36

@srkreddy1238 @tqchen: Could you please take a look at this PR? Let me know your advice.

@tqchen tqchen requested a review from Hzfengsy May 9, 2024 12:43