Fix GetComputeRunningProcesses on CUDA 10.x #25
Thanks a lot for the submission @XuehaiPan. This is definitely more robust than what I had implemented.
I have left some comments below. There are one or two that need to be addressed before we can merge this.
Looks good @XuehaiPan.
Thanks for the work on this!
I think this is fine to unblock you for now, but I wouldn't spend too much time "perfecting" it, as we will need to come up with a more general solution in the near future. I see the "real" solution as:
Thanks for the input @klueska. It was also pointed out to me that more recent versions of the
The definition of `struct nvmlProcessInfo_st` has gained two new fields, `gpuInstanceId` and `computeInstanceId`, in newer NVIDIA drivers (CUDA 11.x). This change breaks backward compatibility with older NVIDIA drivers (CUDA 10.x).

This PR fixes issue #21 for `nvmlDeviceGetComputeRunningProcesses` and `nvmlDeviceGetGraphicsRunningProcesses` on CUDA 10.x, where the failure is caused by the change in the size of the data structure. On CUDA 10.x, it simply calls the v1 version of each function (passing it the v1 data structure) and converts the results into v2 ones. On CUDA 11.x, it calls the v2 version directly.

There is another solution (PR #22) that, on CUDA 10.x, calls the v1 version but passes it the v2 data structure, then corrects the results with some adjustments. As commented in #21 (comment), the approach in this PR should hold up across a much larger range of configurations (different CPU bit widths, CPU architectures, compile options, etc.).
The results on CUDA 10.x
Fixes #21
Closes #22