
apply global search optimization for cuBLASLt in all type matmul #65723

Closed
ming1753 wants to merge 1 commit into PaddlePaddle:develop from ming1753:blaslt_fp

Conversation

@ming1753
Contributor

@ming1753 ming1753 commented Jul 4, 2024

PR Category

Performance Optimization

PR Types

Improvements

Description

Pcard-71500

This extends the feature introduced in #65597 to a wider set of use cases, applying cuBLASLt global search to matmul of all dtypes. Global search is controlled by FLAGS_enable_blaslt_global_search and is off by default. When enabled, cuBLASLt global search runs during matmul computation to find the optimal kernel and caches the result in "./paddle_cublaslt_cache". The first search takes slightly longer; afterwards, identical matmuls reuse the cache and no further search is performed.

# Enable global search
export FLAGS_enable_blaslt_global_search=1

Latency of the following matmul example for each dtype, measured on an A30:

import paddle

dtype = 'float32'  # varied across the dtypes listed in the table below
x = paddle.ones([10, 2048], dtype=dtype)
y = paddle.ones([6144, 2048], dtype=dtype)
c = paddle.matmul(x, y, transpose_x=False, transpose_y=True)
| Type | Latency before enabling (μs) | Latency after enabling (μs) |
| --- | --- | --- |
| float32 | 15.544 | 15.573 |
| float16 | 16.145 | 15.394 |
| bfloat16 | 16.331 | 16.174 |
| int32 | 26.993 | 27.532 |
| int8 | 11.744 | 10.798 |
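
For reference, a minimal timing sketch (not part of this PR) that could reproduce a comparison like the table above: run it once with the flag unset and once with FLAGS_enable_blaslt_global_search=1. It assumes a CUDA build of Paddle; the helper name bench_matmul and the warmup/iteration counts are illustrative.

import time
import paddle

def bench_matmul(dtype, iters=100, warmup=10):
    x = paddle.ones([10, 2048], dtype=dtype)
    y = paddle.ones([6144, 2048], dtype=dtype)
    # Warmup; with the flag enabled, the first calls also cover the
    # one-time global search and cache population.
    for _ in range(warmup):
        paddle.matmul(x, y, transpose_x=False, transpose_y=True)
    paddle.device.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        paddle.matmul(x, y, transpose_x=False, transpose_y=True)
    paddle.device.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e6  # average latency in μs

for dt in ['float32', 'float16']:  # other dtypes from the table can be substituted
    print(dt, f"{bench_matmul(dt):.3f} us")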

@paddle-bot

paddle-bot bot commented Jul 4, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@ming1753 ming1753 closed this Jul 4, 2024