Apply global search optimization for cuBLASLt in all type matmul #65723
Closed
ming1753 wants to merge 1 commit into PaddlePaddle:develop
Your PR was submitted successfully. Thank you for contributing to this open source project!
PR Category
Performance Optimization
PR Types
Improvements
Description
Pcard-71500
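This PR extends the feature introduced in #65597 to more use cases by applying cuBLASLt global search to matmul of all types. Global search is controlled by FLAGS_enable_blaslt_global_search and is disabled by default. When enabled, computing an int8 matmul triggers a cuBLASLt global search for the optimal kernel, and the result is cached to "./paddle_cublaslt_cache". The first search takes somewhat longer; after that, identical matrix multiplications reuse the cache and no further search is performed.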
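The search-once-then-reuse behaviour described above can be illustrated with a small, framework-independent sketch. Everything in it is hypothetical: the `AlgoCache` class, the JSON file layout, the cache key, and the candidate functions are stand-ins chosen for illustration, and none of them reflects Paddle's actual cache format or the cuBLASLt API. It only shows the general pattern of timing every candidate once per problem shape and type, persisting the winner, and skipping the search on later calls.

```python
import json
import os
import time


class AlgoCache:
    """Toy persistent best-kernel cache (hypothetical; not Paddle's format)."""

    def __init__(self, path):
        self.path = path
        self.table = {}
        self.searches = 0  # number of exhaustive searches run by this instance
        if os.path.exists(path):
            # Reuse results persisted by an earlier run.
            with open(path) as f:
                self.table = json.load(f)

    def best_algo(self, m, n, k, dtype, candidates):
        # Assumed key: one entry per (dtype, shape) combination.
        key = f"{dtype}:{m}x{n}x{k}"
        if key not in self.table:
            # Cache miss: time every candidate once and keep the fastest.
            self.searches += 1
            timings = {}
            for name, fn in candidates.items():
                start = time.perf_counter()
                fn(m, n, k)
                timings[name] = time.perf_counter() - start
            self.table[key] = min(timings, key=timings.get)
            # Persist so later processes can skip the search as well.
            with open(self.path, "w") as f:
                json.dump(self.table, f)
        return self.table[key]
```

In this sketch the first call for a given shape and type pays the search cost, while later calls (including calls from a new process that reads the same file) return the stored choice directly.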
Performance of the matmul example below for each type, measured on an A30: