CUDA Error 46 (cudaErrorDevicesUnavailable) on Azure NVadsA10 v5 with IsaacSim #28

Open
alwunn opened this issue Feb 26, 2025 · 1 comment

alwunn commented Feb 26, 2025

I'm encountering a persistent CUDA error when running IsaacSim, deployed with IsaacAutomator, on an Azure NVadsA10 v5 VM. The error appears during startup and prevents IsaacSim from fully initializing. Below are my environment details and the relevant error messages.

Environment Details:

  • Azure VM: Standard_NV36ads_A10_v5 / Standard_NV36adms_A10_v5
  • Driver Version: 550.144.03 (NVIDIA GRID driver)
  • CUDA Version: 12.4
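To make environment details like these easy to reproduce and compare across VM sizes, they can be collected with `nvidia-smi`'s CSV query mode. Below is a minimal Python sketch, assuming `nvidia-smi` is on `PATH`; `parse_gpu_csv` and `non_default_compute_mode` are hypothetical helper names, not part of IsaacSim or IsaacAutomator. It also records each GPU's compute mode, because the CUDA Runtime API documentation lists the Prohibited and Exclusive_Process compute modes among the common reasons for `cudaErrorDevicesUnavailable`.

```python
import shutil
import subprocess

# Fields accepted by `nvidia-smi --query-gpu=...`. compute_mode is included
# because a non-Default mode is one documented cause of CUDA error 46.
FIELDS = ["index", "name", "driver_version", "memory.total", "compute_mode"]


def parse_gpu_csv(text: str) -> list[dict[str, str]]:
    """Turn `--format=csv,noheader` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        gpus.append(dict(zip(FIELDS, values)))
    return gpus


def non_default_compute_mode(gpus: list[dict[str, str]]) -> list[str]:
    """Return the indices of GPUs whose compute mode is not 'Default'."""
    return [gpu["index"] for gpu in gpus if gpu.get("compute_mode") != "Default"]


def main() -> None:
    # Bail out cleanly on machines without the NVIDIA driver tools.
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
        return
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    gpus = parse_gpu_csv(result.stdout)
    for gpu in gpus:
        print(gpu)
    flagged = non_default_compute_mode(gpus)
    if flagged:
        print("GPUs not in Default compute mode:", ", ".join(flagged))


if __name__ == "__main__":
    main()
```

Running it on both a working and a failing VM size gives a side-by-side record of driver version, vGPU profile name, memory, and compute mode.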

Error Log Snippet from IsaacSim:

[carb.cudainterop.plugin] CUDA error 46: cudaErrorDevicesUnavailable - CUDA-capable device(s) is/are busy or unavailable)
|---------------------------------------------------------------------------------------------|
| Driver Version: 550.144.03    | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name                             | Active | LDA | GPU Memory | Vendor-ID | LUID       |
|     |                                  |        |     |            | Device-ID | UUID       |
|     |                                  |        |     |            | Bus-ID    |            |
|---------------------------------------------------------------------------------------------|
| 0   | NVIDIA A10-24Q                   | Yes: 0 |     | 24758   MB | 10de      | 0          |
|     |                                  |        |     |            | 2236      | 17bca580.. |
|     |                                  |        |     |            | 0         |            |
|=============================================================================================|
| OS: 22.04.5 LTS (Jammy Jellyfish) ubuntu, Version: 22.04.5, Kernel: 6.8.0-1021-azure
| XServer Vendor: The X.Org Foundation, XServer Version: 12101004 (1.21.1.4)
| Processor: AMD EPYC 74F3 24-Core Processor
| Cores: 18 | Logical Cores: 36
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 443392 | Free Memory: 438432
| Total Page/Swap (MB): 32767 | Free Page/Swap: 0
|---------------------------------------------------------------------------------------------|
...
[5,132ms] [Error] [omni.physx.tensors.plugin] CUDA error: CUDA-capable device(s) is/are busy or unavailable: ../../../extensions/runtime/source/omni.physx.tensors/plugins/gpu/CudaCommon.h: 111
[5,132ms] [Error] [omni.physx.tensors.plugin] Failed to create primary CUDA context
...
[20,861ms] [Error] [carb.cudainterop.plugin] CUDA error 46: cudaErrorDevicesUnavailable - CUDA-capable device(s) is/are busy or unavailable)
[20,862ms] [Error] [carb.cudainterop.plugin] Failed to import external memory in CUDA
[20,862ms] [Error] [gpu.foundation.plugin] Cannot create cuda external memory for resource!
[Error] [gpu.foundation.plugin] Buffer creation failed for the device: 0.
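When comparing runs across VM sizes, it can help to reduce a full log to just its CUDA failures, grouped by plugin. Below is a small Python sketch; the regular expression is an assumption based only on the line format in the snippet above (other IsaacSim versions may format lines differently), and `summarize_cuda_errors` is a hypothetical helper.

```python
import re
import sys
from collections import Counter

# Assumed line shape, based only on the snippet in this issue:
#   [20,861ms] [Error] [carb.cudainterop.plugin] CUDA error 46: cudaErrorDevicesUnavailable ...
#   [5,132ms] [Error] [omni.physx.tensors.plugin] CUDA error: CUDA-capable device(s) ...
# The numeric code is optional because some plugins omit it.
CUDA_ERROR = re.compile(r"\[([\w.]+)\] CUDA error(?: (\d+))?")


def summarize_cuda_errors(lines):
    """Count CUDA error lines per (plugin, error code); code is None if absent."""
    counts = Counter()
    for line in lines:
        match = CUDA_ERROR.search(line)
        if match:
            plugin, code = match.groups()
            counts[(plugin, int(code) if code else None)] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical script name): python summarize_cuda_errors.py <log-file> [...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for (plugin, code), n in summarize_cuda_errors(log).most_common():
                label = code if code is not None else "(no code)"
                print(f"{path}: {plugin} CUDA error {label} x{n}")
```

Lines such as "Failed to import external memory in CUDA" are deliberately not counted, so the summary shows only the first-order CUDA failures from each plugin.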

Questions / Request for Assistance:

  • Has anyone experienced similar CUDA interop issues (specifically CUDA error 46) when using IsaacSim on an Azure NVadsA10 v5 instance?
  • Are there known workarounds, recommended driver versions, or VM configurations that work better with IsaacSim in a vGPU environment?
  • Is there a recommended way to disable potentially conflicting internal (CPU-based) Vulkan devices, or are there environment variables that could help resolve this error?

Any insights, recommendations, or guidance to help resolve this issue would be greatly appreciated.

Thank you!

alwunn commented Feb 27, 2025

Update: Testing with Different VM Sizes

After further investigation, I can confirm that the CUDA error (cudaErrorDevicesUnavailable) does not occur on smaller VM sizes. Specifically:

  • NV6ads_A10_v5
  • NV12ads_A10_v5
  • NV18ads_A10_v5

This suggests the issue is tied to the vGPU configuration or resource allocation of the larger NV36ads/NV36adms A10 v5 instances.
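One concrete difference between the working and failing sizes is how much of the A10 each VM receives. Per Azure's published NVadsA10 v5 size table (transcribed by hand here, so worth re-checking against current documentation), NV6ads, NV12ads, and NV18ads get 1/6, 1/3, and 1/2 of an A10, while NV36ads and NV36adms get a whole GPU, which is consistent with the `A10-24Q` profile in the log above. The short Python sketch below just does that arithmetic; `GPU_SHARE`, `WORKING`, and `vgpu_memory_gb` are hypothetical names.

```python
from fractions import Fraction

# A full NVIDIA A10 has 24 GB of GPU memory (cf. the A10-24Q profile in the log).
A10_MEMORY_GB = 24

# Share of one A10 per VM size, transcribed by hand from Azure's
# NVadsA10 v5 size table (assumption: re-check against current docs).
GPU_SHARE = {
    "NV6ads_A10_v5": Fraction(1, 6),
    "NV12ads_A10_v5": Fraction(1, 3),
    "NV18ads_A10_v5": Fraction(1, 2),
    "NV36ads_A10_v5": Fraction(1),
    "NV36adms_A10_v5": Fraction(1),
}

# Sizes reported in this issue as starting without CUDA error 46.
WORKING = {"NV6ads_A10_v5", "NV12ads_A10_v5", "NV18ads_A10_v5"}


def vgpu_memory_gb(size: str) -> Fraction:
    """GPU memory assigned to a VM size: share of the GPU times 24 GB."""
    return A10_MEMORY_GB * GPU_SHARE[size]


for size, share in GPU_SHARE.items():
    kind = "whole GPU" if share == 1 else "fractional vGPU"
    status = "works" if size in WORKING else "CUDA error 46"
    print(f"{size:<17} {str(vgpu_memory_gb(size)):>3} GB  {kind:<15} {status}")
```

In this table, every size that works gets a fractional vGPU (4, 8, or 12 GB), and every size that fails gets a full 24 GB A10.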

Has anyone else experienced similar behavior or have insights into why larger VM sizes might trigger this CUDA error while smaller ones do not?
