Attempt to query device count via NVML #14631

Merged
merged 19 commits into from
Sep 22, 2022
Conversation

awaelchli
Contributor

@awaelchli awaelchli commented Sep 9, 2022

What does this PR do?

Redo of #14319
Note: This is an enhancement/improvement of the previous bugfix, not addressing any new bugs.

  • User does not need to set an environment variable
  • Faster: no new processes need to be launched in order to evaluate the functions

This adopts the solution in pytorch/pytorch#84879 for Lightning when running with PyTorch < 1.13.
A minor follow-up is still pending: pytorch/pytorch#85024
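For context, the NVML approach from pytorch/pytorch#84879 can be sketched roughly as follows: the driver library is loaded directly, so the CUDA runtime is never initialized and a later `fork()` stays safe. This is a minimal illustration using `ctypes` and the public NVML symbols (`nvmlInit_v2`, `nvmlDeviceGetCount_v2`, `nvmlShutdown`), not Lightning's exact implementation:

```python
import ctypes


def nvml_device_count() -> int:
    """Return the number of NVIDIA devices via NVML, or 0 if unavailable.

    Loading libnvidia-ml directly avoids initializing the CUDA runtime,
    which would otherwise make a subsequent fork() unsafe.
    """
    try:
        nvml = ctypes.CDLL("libnvidia-ml.so.1")
    except OSError:
        return 0  # NVIDIA driver not installed
    if nvml.nvmlInit_v2() != 0:  # 0 == NVML_SUCCESS
        return 0
    count = ctypes.c_uint()
    rc = nvml.nvmlDeviceGetCount_v2(ctypes.byref(count))
    nvml.nvmlShutdown()
    return count.value if rc == 0 else 0
```

On a machine without an NVIDIA driver this simply returns 0 instead of raising, which is the behavior the fallback path relies on.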

Validated with nightly pytorch 1.13 locally, using this simple script:

import torch
import torch.multiprocessing as mp
from lightning_lite.utilities.imports import _TORCH_GREATER_EQUAL_1_13
from lightning_lite.utilities.device_parser import num_cuda_devices, is_cuda_available


def worker(rank):
    print("successfully forked", rank)
    torch.cuda.set_device(rank)


def run():
    print("torch version", torch.__version__)
    print("greater than 1.13?", _TORCH_GREATER_EQUAL_1_13)

    # Old functions: these initialize CUDA in the parent process,
    # which makes a later fork() unsafe on torch < 1.13.
    torch.cuda.device_count()
    # torch.cuda.is_available()

    # New fork-safe functions (uncomment to test):
    # print("num_cuda_devices:", num_cuda_devices())
    # print("available:", is_cuda_available())

    mp.start_processes(worker, nprocs=2, start_method="fork")


if __name__ == "__main__":
    run()

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

I made sure I had fun coding 🙃

cc @Borda @akihironitta

@awaelchli awaelchli added feature Is an improvement or enhancement accelerator: cuda Compute Unified Device Architecture GPU labels Sep 9, 2022
@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Sep 9, 2022
@awaelchli awaelchli added this to the pl:1.7.x milestone Sep 14, 2022
@awaelchli awaelchli added bug Something isn't working and removed feature Is an improvement or enhancement labels Sep 14, 2022
@awaelchli awaelchli self-assigned this Sep 14, 2022
@awaelchli awaelchli marked this pull request as ready for review September 14, 2022 21:14
Contributor

@carmocca carmocca left a comment

Great that this was fixed!

@mergify mergify bot added the ready PRs ready to be merged label Sep 16, 2022
@codecov

codecov bot commented Sep 21, 2022

Codecov Report

Merging #14631 (3ae5d4e) into master (31788db) will increase coverage by 1%.
The diff coverage is 25%.

❗ Current head 3ae5d4e differs from pull request most recent head ed6628a. Consider uploading reports for the commit ed6628a to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #14631     +/-   ##
=========================================
+ Coverage      84%      85%     +1%     
=========================================
  Files         395      276    -119     
  Lines       28894    21238   -7656     
=========================================
- Hits        24236    18102   -6134     
+ Misses       4658     3136   -1522     

@awaelchli awaelchli enabled auto-merge (squash) September 22, 2022 09:16
@awaelchli awaelchli disabled auto-merge September 22, 2022 09:19
Labels

  • accelerator: cuda — Compute Unified Device Architecture GPU
  • bug — Something isn't working
  • pl — Generic label for PyTorch Lightning package
  • ready — PRs ready to be merged
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants