Use a CUDAGuard when running Torch models #340

VivekPanyam · 2020-04-23T02:00:11Z

This PR ensures that we're running on the correct device even if something else calls cudaSetDevice before running inference.

This fixes a class of issues where another piece of code changes the current CUDA device for the current thread. For example, this can happen if TF and Torch share a thread pool: TF calls cudaSetDevice, and Torch breaks if it later runs on that same thread.

This can cause some obscure cuDNN errors and generally hard-to-debug issues.
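
The failure mode and the fix can be illustrated without a GPU. The sketch below is not Neuropod or PyTorch code: `g_current_device`, `fake_set_device`, `fake_get_device`, `FakeDeviceGuard`, and the two `infer_*` functions are hypothetical stand-ins. They mimic a per-thread current device and an RAII guard in the spirit of `at::cuda::CUDAGuard`, which sets the device on construction and restores the previous one on destruction.

```cpp
#include <cassert>

// Stand-in for the CUDA runtime's per-thread "current device".
// Hypothetical: real code would go through cudaSetDevice / cudaGetDevice.
thread_local int g_current_device = 0;

void fake_set_device(int device) { g_current_device = device; }
int  fake_get_device() { return g_current_device; }

// RAII guard modeled on the idea behind at::cuda::CUDAGuard (an assumption
// for illustration, not the real class): switch on entry, restore on exit.
class FakeDeviceGuard
{
public:
    explicit FakeDeviceGuard(int device) : previous_(fake_get_device())
    {
        fake_set_device(device);
    }
    ~FakeDeviceGuard() { fake_set_device(previous_); }

    FakeDeviceGuard(const FakeDeviceGuard &) = delete;
    FakeDeviceGuard &operator=(const FakeDeviceGuard &) = delete;

private:
    int previous_;
};

// Without a guard, inference runs on whatever device the thread was left on.
int infer_unguarded()
{
    return fake_get_device();
}

// With a guard, inference runs on the model's device regardless of what
// happened earlier on this thread, and the earlier device is restored after.
int infer_guarded(int model_device)
{
    FakeDeviceGuard guard(model_device);
    const int       ran_on = fake_get_device();
    assert(ran_on == model_device);
    return ran_on;
}
```

If another library has left the thread on device 1, `infer_unguarded()` reports 1, while `infer_guarded(0)` reports 0 and leaves the thread on device 1 afterwards.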

vkuzmin-uber · 2020-08-31T20:08:31Z

source/neuropod/backends/torchscript/torch_backend.cc

@@ -291,6 +295,16 @@ std::unique_ptr<NeuropodValueMap> TorchNeuropodBackend::infer_internal(const Neu
 {
    torch::NoGradGuard guard;

+#ifndef __APPLE__
+    // Make sure we're running on the correct device
+    std::unique_ptr<at::cuda::CUDAGuard> device_guard;


#include <memory>

vkuzmin-uber · 2020-08-31T20:15:09Z

source/neuropod/backends/torchscript/torch_backend.cc

+    const auto                           model_device = get_torch_device(DeviceType::GPU);
+    if (model_device.is_cuda())
+    {
+        device_guard = stdx::make_unique<at::cuda::CUDAGuard>(model_device);


I guess we can use std::make_unique here instead of stdx::make_unique, since Neuropod recently moved to C++14, right?
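
For reference, `std::make_unique` has been part of the standard library since C++14. A minimal sketch of the conditional-guard pattern from the diff using it might look like the following. `g_thread_device`, `ScopedDevice`, and `run_on` are hypothetical stand-ins for the CUDA runtime state, `at::cuda::CUDAGuard`, and the backend's inference function; none of them come from Neuropod.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the per-thread current CUDA device.
thread_local int g_thread_device = 0;

// Hypothetical stand-in for at::cuda::CUDAGuard.
class ScopedDevice
{
public:
    explicit ScopedDevice(int device) : previous_(g_thread_device)
    {
        g_thread_device = device;
    }
    ~ScopedDevice() { g_thread_device = previous_; }

    ScopedDevice(const ScopedDevice &) = delete;
    ScopedDevice &operator=(const ScopedDevice &) = delete;

private:
    int previous_;
};

// Only create the guard when the model lives on a GPU. An empty unique_ptr
// means "no guard", so CPU models leave the thread's device untouched.
int run_on(bool model_is_cuda, int model_device)
{
    std::unique_ptr<ScopedDevice> device_guard;
    if (model_is_cuda)
    {
        device_guard = std::make_unique<ScopedDevice>(model_device);
    }

    assert(!model_is_cuda || g_thread_device == model_device);
    return g_thread_device; // device the "inference" would run on
}
```

Holding the guard in a `std::unique_ptr` keeps it alive until the end of the enclosing scope even though it is constructed inside the `if` block.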

VivekPanyam requested a review from selitvin April 23, 2020 02:00

Use a CUDAGuard when running Torch models

c23111d

VivekPanyam force-pushed the cuda_guard branch from bd5263b to c23111d Compare April 23, 2020 02:03

vkuzmin-uber reviewed Aug 31, 2020
