[PoC] Add KFold - External Loop. #8715
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #8715     +/-   ##
=========================================
- Coverage      93%      88%      -4%
=========================================
  Files         169      171       +2
  Lines       14071    14265     +194
=========================================
- Hits        13040    12579     -461
- Misses       1031     1686     +655
@@ -0,0 +1,203 @@
# Copyright The PyTorch Lightning team.
Note for reviewers: adding boring_model.py to utilities, as it is pretty fundamental and should be part of the codebase for debugging purposes.
I don't agree. We already have a boring model for our tests and another for bug reports and debugging. I vote for not including it in utilities.
@carmocca Any thoughts?
I ran into the same issue in #7614 (comment) (blocked for the same reason). Some pl_examples rely on our MNIST implementation, which is in the test directory.
We currently include everything in our distribution, which is what I try to avoid in the linked PR. However, if we do that, CI fails because the pl_examples no longer have access to the MNIST implementation.
So we have two real options:
1. Duplicate the BoringModel/MNIST implementations in the pl_examples directory.
2. Move BoringModel/MNIST to the source tree (what this PR does) to avoid code duplication.
I think I prefer (2). If you guys agree, I can do this change in a separate PR.
cc @Borda
Let's continue discussing in #8776
TBH, I don't think this is the right way to approach this. This isn't what loops are meant for, and it weakens the call hierarchy in a way we shouldn't allow. In the related issue this was discussed and approved from all sides as a standalone class.
Can we get a bigger picture of the PR?
Hey @justusschock. I strongly disagree there. I believe this is what Loops were meant for from the beginning. Users should have the choice to either build on top of Lightning or use an ExternalLoop.
The first approach should be used when multiple runs need to be orchestrated.
Furthermore, if users properly implement the ExternalLoop contract, Lightning can add checkpointing + fault tolerance to their loops while maintaining full customization.
Note: The goal is not to expose the Trainer internals, and we need to be clear about how this is meant to be used.
@awaelchli @williamFalcon Any thoughts there?
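To make the fault-tolerance argument above concrete, here is a minimal pure-Python sketch of what an "ExternalLoop contract" could look like: a loop that exposes `state_dict`/`load_state_dict` so a framework could checkpoint it mid-run and resume. All names and the shape of the contract are hypothetical; the actual API proposed in this PR may differ.

```python
class KFoldLoop:
    """Hypothetical sketch of a restartable k-fold loop.

    Exposing state_dict/load_state_dict is what would let a framework
    checkpoint the loop and resume it after a crash (fault tolerance).
    """

    def __init__(self, num_folds: int):
        self.num_folds = num_folds
        self.current_fold = 0
        self.scores = []

    def state_dict(self) -> dict:
        return {"current_fold": self.current_fold, "scores": list(self.scores)}

    def load_state_dict(self, state: dict) -> None:
        self.current_fold = state["current_fold"]
        self.scores = list(state["scores"])

    def run(self, fit_fn):
        # fit_fn stands in for one Trainer fit/validate cycle per fold.
        while self.current_fold < self.num_folds:
            self.scores.append(fit_fn(self.current_fold))
            self.current_fold += 1
        return self.scores
```

A resumed loop simply picks up at `current_fold`, skipping the folds already recorded in `scores`.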
# utilities for creating a hold
def process_dataset(self, stage: str, dataset: Dataset) -> Subset:
    kfold = KFold(self.num_folds, random_state=42, shuffle=True)
Is a dependency on sklearn worth it for just this?
Should we maybe have a more general abstract function create_splits that the user has to implement, since there are so many different ways to create data splits? We would then only iterate over the splits here.
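The sklearn dependency is indeed avoidable: the index bookkeeping KFold does here can be written in a few lines of plain Python. A sketch, without the shuffling that `random_state=42, shuffle=True` adds (the remainder handling matches sklearn's convention of giving the first `n % k` folds one extra sample):

```python
from typing import List, Tuple


def kfold_indices(n_samples: int, num_folds: int) -> List[Tuple[list, list]]:
    """Return (train_indices, val_indices) pairs for k-fold CV, no sklearn.

    The first n_samples % num_folds folds get one extra sample, so every
    index appears in exactly one validation fold.
    """
    fold_sizes = [
        n_samples // num_folds + (1 if i < n_samples % num_folds else 0)
        for i in range(num_folds)
    ]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits
```

An abstract `create_splits` hook could then default to this and let users swap in stratified, grouped, or time-series splitting.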
from pytorch_lightning.utilities import rank_zero_only
from pytorch_lightning.utilities.boring_model import BoringModel, RandomDataset
seed_everything(42)
I'd rather not seed anything globally.
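One way to honor that without losing reproducibility is to derive a local RNG per fold rather than mutating global state. A sketch using the stdlib `random` module (the same idea applies to a per-DataLoader `torch.Generator`); the seed-mixing scheme is ad hoc, any deterministic combination works:

```python
import random


def make_fold_rng(base_seed: int, fold: int) -> random.Random:
    # A private, reproducible RNG per fold: nothing global is touched,
    # unlike seed_everything(), which reseeds every consumer at once.
    return random.Random(base_seed * 100_003 + fold)
```

Each fold gets an independent stream, and re-running a single fold reproduces exactly the same draws.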
def loop_base_callback() -> Type[Callback]:
    class BaseKFoldCallback(Callback):
        @rank_zero_only
        def on_fold_start(self, trainer, pl_module, counter):
            """Override with your own logic"""

    return BaseKFoldCallback
Can't we define this outside this class, in the file namespace?
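For illustration, the module-level version the comment suggests would look like this; `Callback` is stubbed here so the sketch is self-contained, and `rank_zero_only` is omitted for brevity:

```python
class Callback:  # stub standing in for pytorch_lightning.Callback
    pass


# Defined once at module level instead of inside a factory function,
# so it can be imported, subclassed, and pickled like any normal class.
class BaseKFoldCallback(Callback):
    def on_fold_start(self, trainer, pl_module, counter):
        """Override with your own logic."""
```

A factory is only needed if each call must produce a distinct class object; otherwise the module-level definition is simpler.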
loop = KFoldLoop(5)
model = BoringModel()
datamodule = BoringDataModule()
loop.connect_trainer(max_epochs=10, callbacks=KFoldCallback())
Alternatively, these could be passed in through __init__ via a trainer_kwargs argument.
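A sketch of that alternative, with hypothetical names (the real KFoldLoop in this PR takes different arguments and does real work with them):

```python
from typing import Any, Dict, Optional


class KFoldLoop:
    # Trainer configuration is taken at construction time instead of
    # through a separate connect_trainer() call, so one object carries
    # everything needed to build the per-fold Trainer later.
    def __init__(self, num_folds: int,
                 trainer_kwargs: Optional[Dict[str, Any]] = None):
        self.num_folds = num_folds
        self.trainer_kwargs = dict(trainer_kwargs or {})
```

Usage would then be `KFoldLoop(5, trainer_kwargs={"max_epochs": 10, "callbacks": [...]})`, keeping construction to a single call.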
class BaseDataModule(LightningDataModule):
    def __init__(self):
        super().__init__()
        self.non_picklable = None
curious, what was the idea here? seems left over :)
def __init__(self):
    super().__init__()
    self.non_picklable = None
    self.checkpoint_state: Optional[str] = None
same question here :)
    "fit_loop": self.trainer.fit_loop.state_dict(),
    "validate_loop": self.trainer.validate_loop.state_dict(),
    "test_loop": self.trainer.test_loop.state_dict(),
    "predict_loop": self.trainer.predict_loop.state_dict(),
}
external_loop = getattr(self.trainer, "external_loop", None)
if external_loop:
    state_dict.update({"external_loop": external_loop.state_dict()})
Can there be more than one external loop? I mean, one external loop nested inside another?
It could, and Loop will automatically gather its children's states.
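The mechanics of that gathering can be sketched in a few lines. This is a minimal stand-in, not Lightning's actual Loop base class: `state_dict()` walks the instance's attributes and recurses into any that are themselves loops, so nesting one external loop inside another nests their states automatically.

```python
class Loop:
    """Minimal sketch of recursive loop-state collection."""

    def on_save_checkpoint(self) -> dict:
        # Subclasses override this to contribute their own state.
        return {}

    def state_dict(self) -> dict:
        state = {"state": self.on_save_checkpoint()}
        # Recurse into child loops stored as attributes; their states
        # end up nested under the attribute name.
        for name, attr in vars(self).items():
            if isinstance(attr, Loop):
                state[name] = attr.state_dict()
        return state
```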
    "fit_loop": self.trainer.fit_loop.state_dict(),
    "validate_loop": self.trainer.validate_loop.state_dict(),
    "test_loop": self.trainer.test_loop.state_dict(),
    "predict_loop": self.trainer.predict_loop.state_dict(),
}
external_loop = getattr(self.trainer, "external_loop", None)
Perhaps the trainer can have a property for this, like the ones we have for the other loops?
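The suggested property could look roughly like this (stub class, hypothetical; the real Trainer's loop properties carry more logic):

```python
class Trainer:
    # An explicit property with a None default lets call sites write
    # trainer.external_loop directly instead of
    # getattr(trainer, "external_loop", None) scattered everywhere.
    def __init__(self):
        self._external_loop = None

    @property
    def external_loop(self):
        return self._external_loop

    @external_loop.setter
    def external_loop(self, loop):
        self._external_loop = loop
```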
@@ -186,7 +186,7 @@ def restore_callbacks(self) -> None:
     )
     self.trainer.on_load_checkpoint(self._loaded_checkpoint)

-    def restore_loops(self) -> None:
+    def restore_loops(self, restore_external_loop: bool = False) -> None:
It's probably enough if this is controlled by the existence of an external loop, as checked below.
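A flag-free version of that restore logic might read as follows. This is a sketch with stub objects; the signature and surrounding CheckpointConnector code in the PR differ:

```python
def restore_loops(trainer, checkpoint: dict) -> None:
    # No restore_external_loop flag: restore only when the trainer
    # actually has an external loop AND the checkpoint carries its
    # state. Absent either condition, this is a no-op.
    external_loop = getattr(trainer, "external_loop", None)
    if external_loop is not None and "external_loop" in checkpoint:
        external_loop.load_state_dict(checkpoint["external_loop"])
```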
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.
This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.
@tchaton will this PR make it through?
Hey @turian, I am quite unsure. I see some advantages to an ExternalLoop, but I believe it may be too steep a learning curve for new users. Best,
@tchaton where is the quickstart version of how to use this?
Hey @turian, are you interested in ExternalLoop or KFold? Best,
@tchaton KFold. What are the use-cases for an external loop?
What does this PR do?
Fixes #839
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃