
Add branch condition for calling move to device in prefetch (FSDP 3/n) #6342

Closed · wants to merge 5 commits

Conversation

@SeanNaren (Contributor) commented Mar 4, 2021

What does this PR do?

Adds an additional property to the DDP/DDP Spawn plugins that allows plugins extending DDP to disable the automatic move of the model to device. FSDP requires this because the model can only be moved to device after configure_ddp has been called, which would otherwise require overriding pre_dispatch. Also fixes an issue where spawning the DDP plugin was not possible because a default setter had not been copied.
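For illustration, a minimal sketch of how a plugin extending DDP might use the new property to defer the device move until after configure_ddp (the subclass name is hypothetical; only call_move_to_device_hook_in_pre_dispatch comes from this PR):

from pytorch_lightning.plugins import DDPPlugin


class ShardedLikePlugin(DDPPlugin):  # hypothetical subclass, for illustration only
    @property
    def call_move_to_device_hook_in_pre_dispatch(self) -> bool:
        # Opt out of the automatic move in pre_dispatch; the model can only
        # be moved once configure_ddp has wrapped it.
        return False

    def configure_ddp(self):
        super().configure_ddp()
        # Move the (now wrapped) model to the target device ourselves.
        self.model_to_device()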

Fixes #<issue_number>

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@SeanNaren added the "feature" (Is an improvement or enhancement) and "distributed" (Generic distributed-related topic) labels Mar 4, 2021
@SeanNaren SeanNaren added this to the 1.3 milestone Mar 4, 2021
@SeanNaren SeanNaren self-assigned this Mar 4, 2021

@pytest.mark.parametrize('move_to_device_pre_dispatch_enabled', [True, False])
@mock.patch('pytorch_lightning.plugins.DDPPlugin.model_to_device')
def test_move_to_device_in_pre_dispatch(mock_model_to_device, tmpdir, move_to_device_pre_dispatch_enabled):
SeanNaren (Contributor, Author):
Two things:

  • Would've liked to add a spawn test but mocks don't seem to carry across to new processes?
  • Can I combine patch and parametrize?

Contributor:

  1. Nope, you'd have to create the mock in each process
  2. What do you mean? Having different mocks for each parametrize? I don't think so

Contributor:
You might be able to do so with a callback, applying the patch context manager.
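A rough sketch of that idea (the callback name and hook choice are assumptions; whether the patch starts early enough inside each spawned process to intercept model_to_device depends on the Trainer version):

from unittest import mock

from pytorch_lightning import Callback


class PatchModelToDevice(Callback):
    """Starts the patch inside the running process, since mocks created before
    spawning do not carry over to the child processes."""

    def setup(self, trainer, pl_module, stage=None):
        self._patcher = mock.patch('pytorch_lightning.plugins.DDPPlugin.model_to_device')
        self.mock_model_to_device = self._patcher.start()

    def teardown(self, trainer, pl_module, stage=None):
        self._patcher.stop()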

Contributor:
You can make the patch conditional on the parametrize value by applying it inside the test and using the returned mock object.
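For reference, a minimal sketch of that pattern (the assertions are illustrative, not the actual test in this PR):

import pytest
from unittest import mock


@pytest.mark.parametrize('move_to_device_pre_dispatch_enabled', [True, False])
def test_move_to_device_in_pre_dispatch(tmpdir, move_to_device_pre_dispatch_enabled):
    # Applying the patch inside the test body returns the mock object, so the
    # assertions can depend on the parametrized value.
    with mock.patch('pytorch_lightning.plugins.DDPPlugin.model_to_device') as mock_model_to_device:
        ...  # build the model/Trainer and run fit as in the original test
        if move_to_device_pre_dispatch_enabled:
            mock_model_to_device.assert_called_once()
        else:
            mock_model_to_device.assert_not_called()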

codecov bot commented Mar 4, 2021

Codecov Report

Merging #6342 (b285087) into master (484dce1) will increase coverage by 1%.
The diff coverage is 93%.

@@           Coverage Diff           @@
##           master   #6342    +/-   ##
=======================================
+ Coverage      91%     92%    +1%     
=======================================
  Files         160     161     +1     
  Lines       11408   11476    +68     
=======================================
+ Hits        10384   10547   +163     
+ Misses       1024     929    -95     

pytorch_lightning/plugins/training_type/ddp_spawn.py (outdated, resolved)
Comment on lines +317 to +323
@property
def call_move_to_device_hook_in_pre_dispatch(self) -> bool:
"""
Call the ``model_to_device`` function within pre_dispatch if this is set to True.
Useful for when a plugin would like to call ``model_to_device`` at another time, or skip the call.
"""
return True
Contributor:
Should this be in ParallelPlugin?
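For context, the branch this property controls would look roughly like the following inside the training type plugin's pre_dispatch (a sketch based on the PR description, not the exact diff):

def pre_dispatch(self):
    # Only move the model here when the plugin has not opted out; an
    # FSDP-style plugin moves the model after configure_ddp instead.
    if self.call_move_to_device_hook_in_pre_dispatch:
        self.model_to_device()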

):
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir, fast_dev_run=True, accelerator='ddp', plugins=DDPPlugin(), num_processes=1
Contributor:
Why do you need to pass DDPPlugin?



@tchaton (Contributor) commented Mar 12, 2021

Hey @SeanNaren. Any updates?

@SeanNaren (Contributor, Author):
Given how little now exists in the pre_dispatch function after #6506, we don't need this anymore :)

@SeanNaren SeanNaren closed this Mar 18, 2021
@Borda Borda deleted the feature/fsdp_3n branch April 23, 2021 07:29
Labels
distributed — Generic distributed-related topic · feature — Is an improvement or enhancement · has conflicts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants