fix nvrtc usage #60943

Closed

jeng1220 (Collaborator) commented Jan 18, 2024

PR types

Bug fixes

PR changes

Others

Description

phi uses NVRTC to implement JIT compilation. However, it passes --gpu-architecture=compute_ rather than --gpu-architecture=sm_. The difference is that compute_ generates only PTX code, not machine code (i.e., SASS), whereas sm_ generates both.

So the workflow is: NVRTC generates PTX code, and the driver is then invoked to compile it into machine code.

The problem is that if the driver version does not match the CUDA version, the driver cannot recognize the PTX format and raises the “provided ptx was compiled with an unsupported toolchain” error. Enabling CUDA forward compatibility does not resolve this issue.
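
The version mismatch described above can be illustrated with a minimal sketch. It assumes the simplified rule that the driver can JIT-compile PTX only when it is at least as new as the toolkit that produced the PTX, and it uses CUDA's integer version encoding (1000 * major + 10 * minor, as reported by cudaDriverGetVersion). The names DecodeCudaVersion and DriverCanJitPtx are hypothetical and are not part of Paddle or CUDA.

```cpp
#include <cassert>

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 12020 for 12.2.
struct CudaVersion {
  int major_version;
  int minor_version;
};

// Splits an encoded CUDA version into its major and minor parts.
CudaVersion DecodeCudaVersion(int encoded) {
  assert(encoded >= 0);
  return {encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical helper (assumption, simplified): the driver is expected to
// accept PTX only if it is at least as new as the toolkit that generated it.
bool DriverCanJitPtx(int driver_version, int toolkit_version) {
  return driver_version >= toolkit_version;
}
```

For example, with a 12.0 driver (12000) and PTX produced by a 12.2 toolkit (12020), DriverCanJitPtx returns false, which corresponds to the situation where the "unsupported toolchain" error would be expected.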

There are only two solutions:

  1. Upgrade the driver version.
  2. Let NVRTC generate machine code instead of PTX code.

Many people cannot upgrade their driver frequently, so this patch implements the second solution.

paddle-bot bot commented Jan 18, 2024

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Jan 18, 2024
@@ -335,7 +335,7 @@ bool GPUDeviceCode::Compile(bool include_path) {
       DeviceContextPool::Instance().Get(place_));
   int compute_capability = dev_ctx->GetComputeCapability();
   std::string compute_flag =
-      "--gpu-architecture=compute_" + std::to_string(compute_capability);
+      "--gpu-architecture=sm_" + std::to_string(compute_capability);
   std::vector<const char*> options = {"--std=c++11", compute_flag.c_str()};
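
The choice between the two flag forms in this hunk can be sketched as a small helper. This is only an illustration: ArchKind and MakeArchFlag are hypothetical names that do not exist in Paddle, and the sketch assumes the compute capability is an integer of the form major * 10 + minor (e.g. 80 for 8.0), as the std::to_string call in the diff suggests.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum: request a virtual (compute_XX) or real (sm_XX) target.
enum class ArchKind { kVirtual, kReal };

// Builds the NVRTC architecture option string.
// Assumption: compute_capability is major * 10 + minor, e.g. 80 for 8.0.
std::string MakeArchFlag(int compute_capability, ArchKind kind) {
  assert(compute_capability > 0);
  const char* prefix = (kind == ArchKind::kReal) ? "sm_" : "compute_";
  return std::string("--gpu-architecture=") + prefix +
         std::to_string(compute_capability);
}
```

As in the original code, the returned std::string has to stay alive for as long as its c_str() pointer is stored in the options vector.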
Contributor

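--gpu-architecture=sm_XX: this option makes the compiler directly generate machine code (SASS) for the target GPU architecture. Using sm_XX avoids the runtime JIT compilation step, but the generated code can only run on GPUs of the same architecture. Using compute_XX lets the generated code run on GPUs of different architectures. Has this been considered?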

paddle-ci-bot bot commented Jan 26, 2024

Sorry to inform you that the CIs for f14857f passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

jeng1220 (Collaborator, Author) commented Jan 26, 2024

@zyfncg, @risemeup1,

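> Using compute_XX lets the generated code run on GPUs of different architectures. Has this been considered?

sm_XX can produce both PTX and SASS (the CUBIN in my code), but SASS is not portable across architectures.
PTX is an IR; it can only be executed after the driver JIT-compiles it into SASS.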
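
As a rough illustration of these portability rules, the sketch below encodes the general CUDA compatibility guidance: a cubin built for capability X.y is expected to run only on devices X.z with z >= y, while PTX built for X.y can generally be JIT-compiled for any device of equal or higher capability, provided the driver accepts that PTX version. The type and function names are hypothetical, and architecture-specific targets are not covered.

```cpp
#include <cassert>

// Compute capability split into parts, e.g. {8, 6} for sm_86.
struct ComputeCapability {
  int cc_major;
  int cc_minor;
};

// SASS (cubin) rule: same major version, device minor >= build minor.
bool CubinRunsOn(ComputeCapability built_for, ComputeCapability device) {
  return built_for.cc_major == device.cc_major &&
         device.cc_minor >= built_for.cc_minor;
}

// PTX rule: the device capability must be equal to or higher than the
// capability the PTX was generated for (driver support is assumed).
bool PtxCanTargetDevice(ComputeCapability built_for, ComputeCapability device) {
  if (device.cc_major != built_for.cc_major) {
    return device.cc_major > built_for.cc_major;
  }
  return device.cc_minor >= built_for.cc_minor;
}
```

Under these assumptions, a cubin for 8.0 would run on an 8.6 device but not on a 9.0 device, whereas PTX for 8.0 could still be JIT-compiled for 9.0.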

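However, after digging deeper into the code, I found that the root problem is not in how NVRTC is used.
So this PR is no longer needed; the bug is elsewhere in PaddlePaddle.
I will open a separate PR and close this one for now.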

@jeng1220 jeng1220 closed this Jan 26, 2024
@jeng1220 jeng1220 deleted the bugfix_nvrtc_usage branch March 6, 2024 23:46
Labels
contributor External developers NVIDIA