Hydra configuration #2
Conversation
Any other ideas to add to the example?
One idea is to separate the dataclasses configuring core PL objects into a different module, to make it clear that those should be reused rather than copied by every single user.
```python
@dataclass
class LightningTrainerConf:
    # callbacks: Optional[List[Callback]] = None
```
I think we are ready to start tackling left-over todos.
So for this one, I feel we would need some help from the user. For example, how do we support a list of callback objects? 1) The user would need to define a structured config for their callback, 2) they would need to create a YAML list of these configs, 3) we would need to instantiate their objects and populate the list, and 4) pass that list into the Trainer.
Taking the callback from the example:

```python
class MyPrintingCallback(Callback):
    def on_init_start(self, trainer):
        print('Starting to init trainer!')

    def on_init_end(self, trainer):
        print('trainer is init now')

    def on_train_end(self, trainer, pl_module):
        print('do something when training ends')
```
This style lets you reuse the callback configs more easily:

```yaml
callbacks:
  print:
    cls: MyPrintingCallback
  s3_checkpoint:
    cls: S3Checkpoint
    params:
      bucket_name: ???

callbacks_list:
  - ${callbacks.print}
  - ${callbacks.checkpoint}
```
The code can instantiate the callbacks like:

```python
callbacks = [hydra.utils.instantiate(callback) for callback in cfg.callbacks_list]
```
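A minimal sketch of how this could fit into the training script, assuming the YAML above lives in the primary config and that the classes named under `cls` are importable (file and field names here are illustrative, not the final API):

```python
import hydra
from omegaconf import DictConfig
from pytorch_lightning import Trainer


@hydra.main(config_name="config")
def main(cfg: DictConfig) -> None:
    # Instantiate every callback listed under callbacks_list; each entry is a
    # config node carrying the callback class plus its constructor params.
    callbacks = [hydra.utils.instantiate(cb) for cb in cfg.callbacks_list]

    # Hand the instantiated callbacks to the Trainer as a plain Python list.
    trainer = Trainer(callbacks=callbacks)
    # trainer.fit(model) would follow here; model setup is omitted for brevity.


if __name__ == "__main__":
    main()
```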
Currently, config groups are mutually exclusive: you can only load one config from each config group.
Once facebookresearch/hydra#499 is done we will be able to do something better.
For now, the example can put all the callback configs in the primary config file, or break them out into a callbacks.yaml that can be added to the defaults list as `- callbacks`.
PS: I did not try this so there may be unforeseen problems.
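A hedged sketch of the second option, assuming the callback block above is moved as-is into a callbacks.yaml next to the primary config (file names and the extra trainer key are purely illustrative):

```yaml
# conf/config.yaml
defaults:
  - callbacks   # loads conf/callbacks.yaml, which holds the callback configs shown above

trainer:
  max_epochs: 10   # unrelated example key, just to show the composed result
```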
Question about interpolation: when I define `callbacks: ${callbacks.print}`, it resolves to `{'cls': 'MyPrintingCallback'}`. But when I put the callbacks into a list format, it resolves to `{'callbacks': ['${callbacks.print}']}`. Is there anything special you need to do for variable interpolation in a list?
Okay, I figured out the above issue. The solution you mentioned did not work; I was getting an error about an incorrect type when trying to use the object `cfg.callbacks_list[0]`. I put up a working solution in the most recent commit by just listing out the names of these fields.
What error? Can you be more specific?
Never mind, I fixed the example and now it works like your example.
One idea would be to place it in pytorch-lightning/pytorch_lightning/trainer/, where we can then reuse the Hydra configuration.
Also, a quick aside: do you know much about the submitit plugin? In terms of how environment setup works for running with shared environments?
Yes, but let's not use this important issue for unrelated discussions.
The leftover todos relate to Union types, which have been typed as Any or the more generic of the two types.
How would you like to handle your comment about separating this example into a part to be included and versioned with PL? Maybe I misunderstood the comment previously, but my intention here was just putting the trainer conf into the core folder for merging into PL.
I think all (most?) of the dataclasses belong in the core:
we can condition the registration with Hydra's ConfigStore on the presence of Hydra. One thing I don't like about this soft dependency is that it makes it hard to require a specific version of Hydra. I guess you could check the Hydra version at runtime, but it's not great.
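A minimal sketch of that conditional registration, assuming a trainer dataclass like the one above (the group/name strings and default field values are illustrative):

```python
from dataclasses import dataclass


@dataclass
class LightningTrainerConf:
    # Illustrative subset of Trainer arguments only.
    max_epochs: int = 1000
    gpus: int = 0


try:
    # Soft dependency: only register the structured configs when Hydra is installed.
    from hydra.core.config_store import ConfigStore

    cs = ConfigStore.instance()
    cs.store(group="trainer", name="lightning_trainer", node=LightningTrainerConf)
except ImportError:
    # Hydra is not available; PL keeps working without the registered configs.
    pass
```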
I agree with most of your comments about moving things into core. My only hesitation is that users can define multiple optimizers, each of which may use a different scheduler. At the moment, I am not sure the ConfigStore API makes it easy to reuse this optimizer config under a different group.
I am not sure I follow your association between optimizer and (LR) scheduler.
I did not explain that too well. Basically, I wanted to highlight the fact that you could have multiple optimizers/schedulers in a single configuration file. Since multiple configurations are not currently supported for a single group, this would be an issue. That said, for most cases users should be fine with one optimizer for now, and they can extend the template given here for more advanced cases.
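For the common single-optimizer case, a hedged sketch of how those groups could be selected from the defaults list (group and option names are illustrative):

```yaml
# conf/config.yaml
defaults:
  - optimizer: adam      # picks conf/optimizer/adam.yaml
  - scheduler: step_lr   # picks conf/scheduler/step_lr.yaml
```

Anyone needing several optimizers at once would currently have to list those configs outside the group mechanism, much like the callback list above, until multiple selections per group are supported.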
Can you add a second example that reuses the code here to do something useful (train MNIST, for example)?
For the sake of testing, is there a way for me to use Hydra's compositional config feature without having Hydra take control of the logger, output directory, etc.? For example, one current inconvenience is that each time I run the
You can use the Compose API, but you are forfeiting many important features. This is definitely not the recommended pattern for people to follow for something like this.
@romesc-CMU I thought I fixed the re-download issue in a recent commit. Can you pull and try again?
Ok, let me re-pull and check it out again. Maybe I was configuring it incorrectly, since I did modify a few things to simplify testing.
Regarding creating a second example: this example is already training MNIST.
I'm on the most up-to-date commit and this fixed the dataset I/O issue 👍. So far while testing
I read up on the structured configs and the Config Stores are making a lot more sense. I'm also understanding the purpose of
Does that imply this entire example is achievable with the earlier version of Hydra that doesn't include structured configs, with the main advantage of having them being the type checking? (Which I do like.)
Without structured configs you achieve that with YAML config files. Please go through the basic Hydra tutorial to get a handle on the basic functionality.
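For comparison, a hedged sketch of the YAML-only route, assuming a conf/config.yaml that defines a trainer.max_epochs field (names are illustrative); composition and overrides work the same, you just lose the type checking that Structured Configs add:

```python
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="conf", config_name="config")
def train(cfg: DictConfig) -> None:
    # With plain YAML configs the values arrive untyped, so a typo or a wrong
    # type only surfaces here, at use time, rather than at composition time.
    print(cfg.trainer.max_epochs)


if __name__ == "__main__":
    train()
```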
@romesc-CMU, by the way, did you try making some command line, config, and runtime config access errors to see what happens? :)
One problem still to resolve is that PL flattens all the hparams passed into the model and tries to log them to whatever logger is defined. A new issue is that when I pass these parameters some are missing, and this results in an exception. Specifically, since I defined `target` instead of `cls`, I get a missing-`cls` error when the logger tries to flatten all the config settings.
Also, we are having a problem after rebasing with the TensorBoard logger. This relates to this issue: Lightning-AI#2519. The problem is that TensorBoard is trying to serialize objects, but the branch used to determine whether the object is a DictConfig is never hit. The reason is that the hparams object stores a plain dict of the parameters passed into the model, so the check that the hparams object is a Container type is never true. https://github.com/PyTorchLightning/pytorch-lightning/blob/9759491940c4108ac8ef01e0b53b31f03a69b4d6/pytorch_lightning/core/saving.py#L364
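One possible workaround (untested here, and not necessarily the fix this PR will land on) is to resolve the config into primitive containers before handing it to the model as hparams, so the logger only ever sees plain dicts:

```python
from omegaconf import DictConfig, OmegaConf


def to_loggable_hparams(cfg: DictConfig) -> dict:
    # Resolve interpolations and convert nested DictConfig/ListConfig nodes into
    # plain dicts/lists that the TensorBoard logger can serialize without the
    # Container-type special case ever needing to fire.
    return OmegaConf.to_container(cfg, resolve=True)
```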
The final issue is with None objects: facebookresearch/hydra#785. I am also experiencing this, and the suggested workaround does not work.
I am going to look at this one today.
None bug fixed in the example!
Okay, I removed the double definition of `cls` after the removal of the `cls` field in the recent Hydra MR. The only issue left is the parameter saving for TensorBoard, which already has an issue and an MR fix in the works. I am going to send this MR over to the PL team for initial discussions.
MR?
Merge request, I guess?
This directory consists of an example of configuring Pytorch Lightning with [Hydra](https://hydra.cc/). Hydra is a tool that allows for the easy configuration of complex applications.
The core of this directory is a set of structured configs used for Pytorch Lightning, which are imported via `from pytorch_lightning.trainer.trainer_conf import PLConfig`. Within the PL config there are 5 configurations: 1) Trainer Configuration, 2) Profiler Configuration, 3) Early Stopping Configuration, 4) Logger Configuration and 5) Checkpoint Configuration. All of these are basically mirrors of the arguments that make up these objects. These configurations are used to instantiate the objects using Hydra's instantiation utility.
Please capitalize "Structured Configs" too, to be consistent with how it's used in the Hydra documentation.
## Hydra Pytorch Lightning Example

This directory consists of an example of configuring Pytorch Lightning with [Hydra](https://hydra.cc/). Hydra is a tool that allows for the easy configuration of complex applications.
Suggested change: replace "Hydra is a tool that allows for the easy configuration of complex applications." with "Hydra is a framework that allows for the easy configuration of complex applications."
Aside from the PyTorch Lightning configuration, we have included a few other important configurations. Optimizer and Scheduler are easy off-the-shelf configurations for configuring your optimizer and learning rate scheduler. You can add them to your config defaults list as needed and use them to configure these objects. Additionally, we provide the arch and data configurations for changing model and data hyperparameters.
If these configurations are adopted into the core of PL this should move into the PR description and out of the README.
Hi! Thanks for a wonderful example! I ran into a problem when I try to run
@rakhimovv, it would help if you show how it fails.
produces
If I understand correctly, the problem is that data split initialization should happen in `def setup(self, stage)`, not in `def prepare_data(self)`. But even if I copy-paste the code from `prepare_data` into `setup` in hydra_config_model.py, sometimes it works and sometimes it fails with the error attached below. The possible reason, I suppose, is that several processes try to download and write data to the same folder. I assume this is because value interpolation does not work in ddp mode, as the datasets are saved into experiment_folder/datasets, not into ${hydra:runtime.cwd}/datasets.
P.S.
works fine, but it uses the ddp_spawn mode, not ddp.
Thanks for reporting. You can follow along here.
@rakhimovv I have noticed the above issue before and, as you mentioned, changing to use setup is important. Additionally, I would just hardcode the data path for now, until we work through a fix that properly sorts all this out.
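As an interim alternative to hardcoding, one hedged option (untested in this ddp setup) is to resolve the data directory against the original launch directory in code rather than through interpolation, for example:

```python
from hydra.utils import to_absolute_path


def resolve_data_dir(relative_dir: str = "datasets") -> str:
    # Hydra changes the working directory per run; to_absolute_path resolves the
    # path against the original launch directory, so every ddp process downloads
    # into the same shared folder instead of its own experiment folder.
    return to_absolute_path(relative_dir)
```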
@anthonytec2 So what is the desired solution? We faced the same issue, and in the end we overrode the
Terrible hack :) If you are using PL directly, you should wait for a fix there, because it's the one that is spawning the process.