fix nvrtc usage #60943

jeng1220 · 2024-01-18T08:59:04Z

PR types

Bug fixes

PR changes

Others

Description

phi uses NVRTC to implement JIT compilation. However, it passes --gpu-architecture=compute_ rather than --gpu-architecture=sm_. The difference is that compute_ generates only PTX code, not machine code (i.e., SASS), whereas sm_ generates both.

So the current workflow is: NVRTC generates PTX code, then the driver is invoked to generate machine code from it.

The problem is that if the driver version does not match the CUDA version, the driver cannot recognize the PTX format and raises the “provided ptx was compiled with an unsupported toolchain” error. Enabling CUDA forward compatibility does not resolve this.

There are only two solutions:

1. Upgrade the driver.
2. Let NVRTC generate machine code instead of PTX code.

Many people cannot upgrade the driver frequently, so this patch implements the second solution.
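
The choice between the two architecture flags described above can be sketched as a small helper. This is only an illustration, not Paddle's code: `ArchKind`, `MakeArchFlag`, and `NeedsDriverJit` are hypothetical names introduced here. In real NVRTC code, the PTX path would read its result with nvrtcGetPTX and the machine-code path with nvrtcGetCUBIN; those calls are left out so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <string>

// What NVRTC is asked to target (hypothetical enum, for illustration only).
enum class ArchKind {
  kVirtual,  // compute_XX: PTX only; the driver JIT-compiles it at load time
  kReal      // sm_XX: machine code (SASS, packaged as a CUBIN) is available
};

// Builds the NVRTC architecture option for a compute capability given as an
// integer such as 80 (for 8.0), the same encoding the patched code appends.
std::string MakeArchFlag(int compute_capability, ArchKind kind) {
  assert(compute_capability > 0);
  const char* prefix = (kind == ArchKind::kReal) ? "sm_" : "compute_";
  return std::string("--gpu-architecture=") + prefix +
         std::to_string(compute_capability);
}

// True when the compiled result still has to pass through the driver's PTX
// JIT, which is the step that fails when the driver is older than the toolkit.
bool NeedsDriverJit(ArchKind kind) { return kind == ArchKind::kVirtual; }
```

With `ArchKind::kReal` the driver only loads finished machine code, so the PTX version check that triggers the error is never reached.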

paddle-bot · 2024-01-18T08:59:08Z

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

risemeup1 · 2024-01-18T09:29:34Z

paddle/phi/backends/device_code.cc

@@ -335,7 +335,7 @@ bool GPUDeviceCode::Compile(bool include_path) {
      DeviceContextPool::Instance().Get(place_));
  int compute_capability = dev_ctx->GetComputeCapability();
  std::string compute_flag =
-      "--gpu-architecture=compute_" + std::to_string(compute_capability);
+      "--gpu-architecture=sm_" + std::to_string(compute_capability);
  std::vector<const char*> options = {"--std=c++11", compute_flag.c_str()};


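--gpu-architecture=sm_XX: this option makes the compiler directly generate machine code (SASS) for the target GPU architecture. Using sm_XX avoids the runtime JIT compilation step, but the generated code can only run on GPUs of the same architecture. Using compute_XX lets the generated code run on GPUs of different architectures. Has this been considered?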
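
The portability trade-off raised in this review comment can be written down as two predicates. This is a hedged sketch of the general CUDA compatibility rules as I understand them (SASS runs on devices with the same major revision and an equal or higher minor revision; PTX can be JIT-compiled for any equal or higher compute capability, provided the driver understands the PTX version). The names `Decode`, `CanRunSass`, and `CanJitPtx` are hypothetical, and special cases such as architecture-specific targets are ignored.

```cpp
#include <cassert>

// Compute capability encoded as major * 10 + minor (e.g. 86 for 8.6),
// matching the integer appended to sm_ / compute_ in the diff above.
struct Capability {
  int major_rev;
  int minor_rev;
};

Capability Decode(int cc) {
  assert(cc >= 10);
  return {cc / 10, cc % 10};
}

// SASS built for sm_XY: same major revision, device minor revision >= Y.
bool CanRunSass(int built_for, int device) {
  const Capability b = Decode(built_for);
  const Capability d = Decode(device);
  return b.major_rev == d.major_rev && d.minor_rev >= b.minor_rev;
}

// PTX built for compute_XY: the driver can JIT it for any device whose
// compute capability is at least XY (assuming it accepts the PTX version).
bool CanJitPtx(int built_for, int device) {
  Decode(built_for);  // validates the encoding
  Decode(device);
  return device >= built_for;
}
```

In the diff, the flag is derived at run time from the current device's own compute capability, so `built_for` and `device` are equal there and `CanRunSass` holds trivially; the restriction would matter only if the compiled binary were cached and reused on a different GPU.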

paddle-ci-bot · 2024-01-26T03:05:50Z

Sorry to inform you that the CIs for f14857f passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

jeng1220 · 2024-01-26T09:27:22Z

@zyfncg, @risemeup1,

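"Using compute_XX lets the generated code run on GPUs of different architectures. Has this been considered?"

sm_XX can produce both PTX and SASS (the CUBIN in my code), but SASS is not portable across architectures.
PTX is an IR; it can only be executed after the driver JIT-compiles it into SASS.

However, after digging deeper into the code, I found that the root problem is not in the NVRTC usage,
so this PR is no longer needed; the bug is elsewhere in PaddlePaddle.
I will open a separate PR and close this one for now.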

fix CINN

f14857f

jeng1220 added the NVIDIA label Jan 18, 2024

paddle-bot bot added the contributor External developers label Jan 18, 2024

risemeup1 reviewed Jan 18, 2024

onecatcn assigned zyfncg Jan 22, 2024

jeng1220 mentioned this pull request Jan 22, 2024

PaddlePaddle 2.6.0 buglist, part 1 #60882

Closed

jeng1220 closed this Jan 26, 2024

jeng1220 deleted the bugfix_nvrtc_usage branch March 6, 2024 23:46
