This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Compilation issue of NVML usage across all possible drivers #20863

Closed
guanxingithub opened this issue Jan 31, 2022 · 5 comments


@guanxingithub
Contributor

Description

We would like to report a compilation issue on the master branch related to the use of NVIDIA’s NVML library. The source lines involved are: https://github.com/apache/incubator-mxnet/blob/master/src/profiler/storage_profiler.cc#L103-L111

These are the same lines that caused issue #20145, which @Zha0q1 fixed in PR #20146. The problem is that these lines are still sensitive to the driver version and to the CMake build flag NVML_NO_UNVERSIONED_FUNC_DEFS.

Error Message

We found this issue when compiling MXNet master against the CUDA 11 450.x driver, where we see:

FAILED: CMakeFiles/mxnet.dir/src/profiler/storage_profiler.cc.o
../src/profiler/storage_profiler.cc:109:78: error: cannot convert ‘nvmlProcessInfo_st*’ to ‘nvmlProcessInfo_v1_t*’ {aka ‘nvmlProcessInfo_v1_st*’}
109 | nvmlDeviceGetComputeRunningProcesses(nvml_device, &info_count, infos.data());

In file included from ../src/profiler/storage_profiler.cc:22:
/usr/local/cuda/include/nvml.h:8403:127: note: initializing argument 3 of ‘nvmlReturn_t nvmlDeviceGetComputeRunningProcesses(nvmlDevice_t, unsigned int*, nvmlProcessInfo_v1_t*)’
8403 | nvmlReturn_t DECLDIR nvmlDeviceGetComputeRunningProcesses(nvmlDevice_t device, unsigned int *infoCount, nvmlProcessInfo_v1_t *infos);

Steps to reproduce

  1. Find a machine with the CUDA 11 450.x driver
  2. Compile MXNet

What have you tried to solve it?

  1. This issue was found and fixed by Dick Carter
  2. @DickJC123 has developed a general solution that avoids compilation errors no matter which signature of the nvmlDeviceGetComputeRunningProcesses() function is enabled in the code. We will be submitting this fix as a PR shortly.
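
For readers wondering how a fix can be independent of the function signature, here is a minimal sketch of one possible technique. It is not taken from the upcoming PR and may differ from it. The idea is to deduce the buffer element type from the third parameter of whichever declaration the header exposes, rather than naming the struct directly. All `Mock...` and `mock_...` names are hypothetical stand-ins so the example compiles without `nvml.h`; in real code the function pointer would be `&nvmlDeviceGetComputeRunningProcesses`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the two NVML process-info layouts.
struct MockProcessInfoV1 {
  unsigned int pid;
  unsigned long long usedGpuMemory;
};
struct MockProcessInfoV2 {
  unsigned int pid;
  unsigned long long usedGpuMemory;
  unsigned int gpuInstanceId;
  unsigned int computeInstanceId;
};

// Hypothetical stand-in for the query function. Here it takes the v1 struct,
// as in the error above; switching the parameter to MockProcessInfoV2* would
// require no change in the caller below.
int mock_get_compute_running_processes(int /*device*/, unsigned int* count,
                                       MockProcessInfoV1* infos) {
  const unsigned int running = 2;  // pretend two processes are running
  if (*count < running) {
    *count = running;
    return 1;  // buffer too small
  }
  for (unsigned int i = 0; i < running; ++i) {
    infos[i].pid = 1000 + i;
    infos[i].usedGpuMemory = 0;
  }
  *count = running;
  return 0;  // success
}

// Trait that extracts T from a function pointer of type R (*)(A0, A1, T*).
template <typename F>
struct ThirdArgPointee;
template <typename R, typename A0, typename A1, typename T>
struct ThirdArgPointee<R (*)(A0, A1, T*)> {
  using type = T;
};

// The element type follows the declaration instead of being hard-coded.
using ProcessInfo =
    ThirdArgPointee<decltype(&mock_get_compute_running_processes)>::type;

static_assert(std::is_same<ProcessInfo, MockProcessInfoV1>::value,
              "element type is deduced from the function signature");

// Caller that never names a specific struct version.
unsigned int count_running_processes() {
  std::vector<ProcessInfo> infos(8);
  unsigned int count = static_cast<unsigned int>(infos.size());
  const int status =
      mock_get_compute_running_processes(0, &count, infos.data());
  assert(status == 0 && count <= infos.size());
  return count;
}
```

With this pattern the caller compiles against either signature, because `infos.data()` always has exactly the pointer type that the function expects.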

@TristonC
Contributor

TristonC commented Feb 1, 2022

@ptrendx Please help review.

@guanxingithub
Contributor Author

A PR was filed: #20866

@guanxingithub
Contributor Author

A PR merging the solutions of #20499 and #20866 was filed as #20887

@guanxingithub
Contributor Author

This issue was fixed and merged in PR #20877
