Setup phased transition away from PyTorch version-specific handling of cuda availability and device counting #15133
Conversation
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #15133      +/-   ##
==========================================
+ Coverage      82%      84%       +1%
==========================================
  Files         408      288      -120
  Lines       29907    21940     -7967
==========================================
- Hits        24597    18329     -6268
+ Misses       5310     3611     -1699
@speediedan Not sure if I 100% understand the proposed change. Is it because partial changes are in 1.13, where device_count uses NVML yet is_available is not yet using the new implementation?
Yep! It lets us use the relevant PyTorch functions directly whenever they are available, only using the temporary PL versions when necessary. It also allows us to remove the PL copy of the upstream NVML code once PT 1.13 is the minimum (instead of when PT 1.14 is the minimum).
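To make that phased hand-off concrete, here is a minimal sketch of the dispatch described above. It is an illustration rather than the actual Lightning implementation: the version thresholds follow the PR description (NVML-based `torch.cuda.device_count()` from 1.13, NVML-based `torch.cuda.is_available()` from 1.14), and `_pl_nvml_device_count` is a hypothetical stand-in for the temporary PL-side NVML helper.

```python
import torch
from packaging.version import Version

# Release tuple, e.g. (1, 13, 0); ignores rc/dev suffixes such as 1.13.0rc3.
_TORCH_RELEASE = Version(torch.__version__).release
_GE_1_13 = _TORCH_RELEASE >= (1, 13)
_GE_1_14 = _TORCH_RELEASE >= (1, 14)


def _pl_nvml_device_count() -> int:
    """Hypothetical stand-in for the temporary PL NVML-based device counter."""
    try:
        import pynvml

        pynvml.nvmlInit()
        try:
            return pynvml.nvmlDeviceGetCount()
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        # NVML (or the driver) is unavailable; report zero devices.
        return 0


def num_cuda_devices() -> int:
    if _GE_1_13:
        # torch.cuda.device_count() uses NVML from 1.13 onward, so it does not
        # initialize a CUDA context in the calling process.
        return torch.cuda.device_count()
    return _pl_nvml_device_count()


def is_cuda_available() -> bool:
    if _GE_1_14:
        # torch.cuda.is_available() is NVML-based from 1.14 onward.
        return torch.cuda.is_available()
    # On older PyTorch, derive availability from the device count to avoid
    # poisoning the CUDA context for forked subprocesses.
    return num_cuda_devices() > 0
```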
Co-authored-by: Carlos Mocholí <[email protected]>
What does this PR do?
Set up a phased transition away from PyTorch version-specific handling of CUDA availability (`torch.cuda.is_available()`) and device counting (`torch.cuda.device_count()`).
This is a follow-up to #15110 and #85951 that prepares for the removal of unnecessary PyTorch version-specific code once PyTorch 1.13 and 1.14, respectively, become the minimum supported versions.
Specifically:
As of PyTorch 1.14, `is_cuda_available()` and `num_cuda_devices()` should be functional aliases for `torch.cuda.is_available()` and `torch.cuda.device_count()` respectively, requiring no special PyTorch version-specific handling and able to take advantage of potential upstream enhancements to those functions (e.g., if they abstract ROCm CUDA availability checks using ROCm's SMI analog to NVML).
It likely makes sense to retain the `is_cuda_available()` and `num_cuda_devices()` wrappers in the code base to facilitate future PyTorch version-specific handling if it becomes necessary, but theoretically those wrappers could be replaced with their associated PyTorch functions as well (roughly sketched below).
Also perhaps relevant: I've tested these changes locally on PyTorch `1.13.0rc3` and `1.14.0.dev20221013`.
cc @awaelchli @carmocca
Does your PR introduce any breaking changes? If yes, please list them.
None
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Did you have fun?
I made sure I had fun 🙃