
[Inference] Refine global search optimization for cuBLASLt and apply it in INT8 GEMM. #65597

Merged
ming1753 merged 2 commits into PaddlePaddle:develop from ming1753:blaslt on Jul 4, 2024

Conversation

ming1753 (Contributor) commented Jul 1, 2024

PR Category

Inference

PR Types

New features

Description

Pcard-71500

The cuBLASLt matmul global-search algorithm previously embedded in the FP8 code path has been extracted into a standalone header file and is now applied to both FP8 and INT8 matmul computation.

Adds a new flag, FLAGS_enable_blaslt_global_search, which defaults to false (feature disabled).

# Enable the feature
export FLAGS_enable_blaslt_global_search=1

When enabled, INT8 matmul uses cuBLASLt global search to find the optimal kernel and caches the result in "./paddle_cublaslt_cache". The first search takes somewhat longer; subsequent matmuls with the same shapes reuse the cache and skip the search.
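For readers unfamiliar with the technique, the sketch below illustrates the general search-and-cache pattern: ask the cuBLASLt heuristic for candidate algorithms, time each one with CUDA events, keep the fastest, and memoize it per GEMM shape. The identifiers (SearchBestAlgo, algo_cache) and the in-memory map are illustrative assumptions only; the actual header added by this PR additionally persists results to the ./paddle_cublaslt_cache file.

// Illustrative sketch only: search cuBLASLt heuristic candidates, time each,
// and cache the fastest algorithm per GEMM shape. Names are hypothetical and
// do not reflect the identifiers introduced by this PR.
#include <cublasLt.h>
#include <cuda_runtime.h>
#include <map>
#include <tuple>

using GemmKey = std::tuple<int, int, int>;  // (m, n, k)
static std::map<GemmKey, cublasLtMatmulAlgo_t> algo_cache;  // in-memory cache (the PR also writes to disk)

cublasLtMatmulAlgo_t SearchBestAlgo(cublasLtHandle_t handle,
                                    cublasLtMatmulDesc_t desc,
                                    cublasLtMatrixLayout_t a_desc,
                                    cublasLtMatrixLayout_t b_desc,
                                    cublasLtMatrixLayout_t c_desc,
                                    const void* alpha, const void* a,
                                    const void* b, const void* beta,
                                    void* c, cudaStream_t stream,
                                    GemmKey key) {
  auto it = algo_cache.find(key);
  if (it != algo_cache.end()) return it->second;  // reuse cached result, skip the search

  // Ask cuBLASLt for a list of candidate algorithms for this problem.
  cublasLtMatmulPreference_t pref;
  cublasLtMatmulPreferenceCreate(&pref);
  constexpr int kMaxAlgos = 8;
  cublasLtMatmulHeuristicResult_t results[kMaxAlgos];
  int returned = 0;
  cublasLtMatmulAlgoGetHeuristic(handle, desc, a_desc, b_desc, c_desc, c_desc,
                                 pref, kMaxAlgos, results, &returned);

  // Time each candidate and keep the fastest one that succeeds.
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  float best_time = 1e30f;
  cublasLtMatmulAlgo_t best_algo = results[0].algo;
  for (int i = 0; i < returned; ++i) {
    cudaEventRecord(start, stream);
    cublasStatus_t status =
        cublasLtMatmul(handle, desc, alpha, a, a_desc, b, b_desc, beta,
                       c, c_desc, c, c_desc, &results[i].algo,
                       nullptr, 0, stream);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);
    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);
    if (status == CUBLAS_STATUS_SUCCESS && ms < best_time) {
      best_time = ms;
      best_algo = results[i].algo;
    }
  }
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  cublasLtMatmulPreferenceDestroy(pref);
  algo_cache[key] = best_algo;
  return best_algo;
}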

paddle-bot commented Jul 1, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

int repeats = search_times_;

for (int loop = 0; loop < repeats; loop++) {
status = dynload::cublasLtMatmul(handle,
Contributor:

Shouldn't there be a warmup before the actual timing starts?

Author (ming1753):

In our tests, adding or omitting a warmup makes no difference to the final performance.
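For context, "warmup" here means one untimed call before the CUDA events are recorded, so that lazy initialization does not skew the measured loop. A minimal, self-contained sketch of such a measurement helper (this is not code from the PR; the TimeOp name and template are assumptions for illustration):

// Minimal sketch: time a GPU operation with CUDA events, with an optional
// warmup iteration before measurement.
#include <cuda_runtime.h>

template <typename Op>
float TimeOp(Op op, cudaStream_t stream, int repeats, bool warmup) {
  if (warmup) op();  // untimed call so lazy init does not pollute the timed loop
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start, stream);
  for (int i = 0; i < repeats; ++i) op();
  cudaEventRecord(stop, stream);
  cudaEventSynchronize(stop);
  float ms = 0.f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms / repeats;  // average time per call in milliseconds
}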

}

template <typename InT, typename OutT>
void TestMatmulRun(cublasLtHandle_t handle,
Contributor:

Why is this function named Test***?

Author (ming1753):

Renamed to RunAndMeasureAlgo.

@@ -0,0 +1,703 @@
/* Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Contributor:

2023 -> 2024

Author (ming1753):

Fixed.

Comment on lines 15 to 17
#pragma once

#pragma once
Contributor:

Duplicate line.

Author (ming1753):

Fixed.


namespace phi {
namespace funcs {
namespace cublaslt_internal {
Contributor:

Is it necessary to add a cublaslt_internal namespace?

Author (ming1753):

This feature is disabled by default and is not planned to be exposed externally, so adding a namespace to mark it should be fine.

Comment on lines 481 to 496
TestMatmulRun(handle,
matmul_desc,
a_desc,
b_desc,
bias_desc,
c_desc,
alpha,
beta,
a,
b,
bias,
c,
params[i],
start_event,
stop_event,
stream);
Contributor:

The function checks for failure internally, but nothing here makes use of that result. Consider having TestMatmulRun return a bool indicating success or failure, and add the corresponding handling logic here.

Author (ming1753):

The handling is inside the function: when a candidate fails, its time is recorded as the maximum value.
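In other words, the failure branch lives inside the measuring function: a failed candidate is assigned the worst possible time, so the selection loop can simply pick the minimum without any explicit error handling. A minimal sketch of that pattern (identifiers here are illustrative, not the PR's):

// Illustrative only: a candidate that fails to launch gets the maximum time,
// so a plain "pick the minimum" comparison can never select it.
#include <limits>

struct MeasuredAlgo {
  int algo_id;
  float time_ms;
};

MeasuredAlgo RecordCandidate(int algo_id, bool launch_succeeded, float measured_ms) {
  if (!launch_succeeded) {
    return {algo_id, std::numeric_limits<float>::max()};
  }
  return {algo_id, measured_ms};
}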

qingqing01 changed the title from "[Inference] cublaslt global search" to "[Inference] Refine global search optimization for cuBLASLt and apply it in INT8 GEMM." on Jul 4, 2024
Aurelius84 (Contributor) left a comment:

LGTM for adding the flag
