[1/2] Collaborative Strategy #12842

Merged
SeanNaren merged 51 commits into master from feat/collab_training_1n
May 5, 2022

Conversation

SeanNaren
Contributor

@SeanNaren SeanNaren commented Apr 21, 2022

What does this PR do?

Related #12647

This PR represents the code portion of the strategy. The next PR will be the documentation.

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda @akihironitta @justusschock

Contributor

@awaelchli awaelchli left a comment

this is an awesome addition. can't wait for the full docs!

@SeanNaren
Contributor Author

SeanNaren commented Apr 22, 2022

Huge thanks @awaelchli @justusschock for going through and reviewing, really appreciate it ❤️

Two things I need to address:

  • Maybe we should consider inheriting from the SingleDeviceStrategy instead as currently only single device is supported?
  • Should we mark this strategy as experimental/stable?
    • Given most of this code is a wrapper over hivemind, I think calling it experimental is a bit overkill but I can be swayed either way

@justusschock
Member

> Maybe we should consider inheriting from the SingleDeviceStrategy instead as currently only single device is supported?

IMO we shouldn't do that. It has one single device per process, but in total allows training on many devices. That way it behaves like an elastic multi-node DDP if the processes are spawned externally.

> Should we mark this strategy as experimental/stable?
> Given most of this code is a wrapper over hivemind, I think calling it experimental is a bit overkill but I can be swayed either way

I think I'd still mark it as experimental. This gives us more freedom to change things (like the per-node DDP idea you mentioned).
Also, we can mark it as stable whenever we want later on.
Marking it as stable right away may give users false expectations.
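
For illustration, one lightweight way to flag a class as experimental is to emit a warning when it is instantiated. The sketch below is only an assumption about how such a marker could look: `ExperimentalWarning`, `experimental`, and `ToyCollaborativeStrategy` are hypothetical names made up for this example, and nothing here is taken from the PR's actual code.

```python
import warnings


class ExperimentalWarning(UserWarning):
    """Hypothetical warning category for features whose API may still change."""


def experimental(cls):
    """Hypothetical class decorator: warn each time the class is instantiated."""
    original_init = cls.__init__

    def __init__(self, *args, **kwargs):
        # stacklevel=2 points the warning at the caller's line, not this wrapper.
        warnings.warn(
            f"{cls.__name__} is experimental and its API may change.",
            ExperimentalWarning,
            stacklevel=2,
        )
        original_init(self, *args, **kwargs)

    cls.__init__ = __init__
    return cls


@experimental
class ToyCollaborativeStrategy:
    """Stand-in class for illustration only; not the real strategy."""

    def __init__(self):
        self.ready = True
```

With a marker like this, promoting the class to stable later only requires dropping the decorator, which matches the point that the label can be changed at any time.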

Member

@ethanwharris ethanwharris left a comment

LGTM, just some small comments 😃

@kaushikb11 kaushikb11 added the feature Is an improvement or enhancement label Apr 25, 2022
@mergify mergify bot added the ready PRs ready to be merged label May 5, 2022
Contributor

@kaushikb11 kaushikb11 left a comment

🔥

@SeanNaren SeanNaren enabled auto-merge (squash) May 5, 2022 11:35
Contributor

@rohitgr7 rohitgr7 left a comment

still reviewing

Contributor

@rohitgr7 rohitgr7 left a comment

awesome 🚀

Sean Naren and others added 2 commits May 5, 2022 12:42
@SeanNaren SeanNaren disabled auto-merge May 5, 2022 11:47
@SeanNaren SeanNaren enabled auto-merge (squash) May 5, 2022 11:47
Sean Naren and others added 2 commits May 5, 2022 12:58
Member

@Borda Borda left a comment

gatekeeper/pass: @SeanNaren

@SeanNaren SeanNaren merged commit 1a502c0 into master May 5, 2022
@SeanNaren SeanNaren deleted the feat/collab_training_1n branch May 5, 2022 16:06
Labels
feature (Is an improvement or enhancement), ready (PRs ready to be merged), strategy
Development

Successfully merging this pull request may close these issues.

Introduce Collaborative Training Strategy!
9 participants