
Avoid fallback on CPU if no devices are provided #12410

Merged
merged 9 commits into master from ref/avoid_cpu_fallback on Mar 25, 2022

Conversation

rohitgr7
Contributor

@rohitgr7 rohitgr7 commented Mar 22, 2022

What does this PR do?

Addresses: #12160 (comment)
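
A minimal sketch of the behavior change this PR targets, pieced together from the title and the discussion below (the exact exception type is an assumption, not taken from the diff):

    from pytorch_lightning import Trainer

    # Before this PR: requesting the GPU accelerator with an empty/zero devices
    # value silently fell back to running on CPU.
    # After this PR: the same configuration is expected to raise instead.
    for devices in ([], 0, "0"):
        try:
            Trainer(accelerator="gpu", devices=devices)
        except Exception as err:  # exact exception type is an assumption
            print(f"devices={devices!r} raised: {err}")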

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda @justusschock @kaushikb11 @awaelchli @ninginthecloud @akihironitta @rohitgr7

@rohitgr7 rohitgr7 marked this pull request as ready for review March 22, 2022 12:51
@carmocca carmocca added the breaking change label Mar 22, 2022
Contributor

@kaushikb11 kaushikb11 left a comment


@rohitgr7 rohitgr7 requested a review from edenlightning as a code owner March 23, 2022 09:43
@rohitgr7 rohitgr7 requested a review from kaushikb11 March 23, 2022 09:43
@mergify mergify bot added the ready label Mar 23, 2022
@rohitgr7 rohitgr7 enabled auto-merge (squash) March 23, 2022 10:42
@mergify mergify bot added the has conflicts label and removed the ready label Mar 23, 2022
@rohitgr7 rohitgr7 requested a review from carmocca March 23, 2022 14:49
@mergify mergify bot added the ready label and removed the has conflicts label Mar 24, 2022
@mergify mergify bot added the has conflicts label and removed the ready label Mar 25, 2022
@mergify mergify bot added the ready label and removed the has conflicts label Mar 25, 2022
@rohitgr7 rohitgr7 merged commit 48f1710 into master Mar 25, 2022
@rohitgr7 rohitgr7 deleted the ref/avoid_cpu_fallback branch March 25, 2022 15:59
@DuYicong515
Contributor

Hi @rohitgr7, it makes sense to me that Trainer(accelerator="gpu", devices=[]/0/"0") throws instead of falling back to CPU.

However, currently Trainer(gpus=[]/0/"0") still falls back to CPU.
Trainer(accelerator="gpu", gpus=[]/0/"0") will work as "auto" and use all GPU devices when GPUs are available, and throw if no GPU is available.

Even though gpus is being deprecated as a Trainer argument, shall we still make the behaviour consistent here? I feel it's confusing that these arguments behave differently for similar settings.
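
A compact restatement of the three cases described above (a sketch of the behavior as reported in this thread, not independently verified):

    from pytorch_lightning import Trainer

    # Each call considered in isolation:

    Trainer(gpus=0)
    # -> still silently falls back to CPU (documented behavior)

    # Trainer(accelerator="gpu", devices=0)
    # -> raises after this PR instead of falling back

    # Trainer(accelerator="gpu", gpus=0)
    # -> currently acts like devices="auto": uses all GPUs when available,
    #    raises when no GPU is available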

@awaelchli
Contributor

However, currently Trainer(gpus=[]/0/"0") still falls back to CPU.

That's ok, it is documented here.

Trainer(accelerator="gpu", gpus=[]/0/"0") will work as "auto" and use all GPU devices when GPUs are available

Agreed, this is probably not intended and should be changed.

@@ -506,6 +506,15 @@ def test_accelerator_cpu(_):
trainer = Trainer(accelerator="cpu", gpus=1)


@mock.patch("torch.cuda.is_available", return_value=False)
Contributor

@kaushikb11 kaushikb11 Apr 6, 2022


@rohitgr7

Here, we assumed the accelerator is not available. What if it's available and the user passes devices="0"/0/[]?

It leads to errors like this:

    self._parallel_devices = self.accelerator.get_parallel_devices(self._devices_flag)
  File "/home/jovyan/pytorch-lightning/pytorch_lightning/accelerators/gpu.py", line 82, in get_parallel_devices
    return [torch.device("cuda", i) for i in devices]
TypeError: 'NoneType' object is not iterable

It is addressed in #12633
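
A minimal reproduction sketch of the failure described above (assumes a machine where torch.cuda.is_available() returns True and the Lightning version current at the time of this thread):

    from pytorch_lightning import Trainer

    # With a GPU present, an "empty" devices value slipped past the availability
    # check and left the internal devices flag as None, so
    # GPUAccelerator.get_parallel_devices() iterated over None and raised:
    Trainer(accelerator="gpu", devices=0)
    # TypeError: 'NoneType' object is not iterable   (fixed by #12633)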

Labels
accelerator, breaking change, ready, trainer: connector

8 participants