Sandbox to learn and explore the Python library hydra-zen
Francisco Camargo
This repo is a sandbox for experimenting with hydra-zen.
The goal is to learn how to use it for automated training and use of models, while also working out the workflow for manually changing and exploring a model.
- does hydra write down the config.yaml file first, and THEN it runs zen(task) on that config file?
  - if so, not clear where/when the config.yaml file gets written down
  - is there a way that I can have just a task function and a hand-made config.yaml, I do zen(task) and everything works? ie. Do I have to define a config in code?
- hydra store
  - save something to the store in one .py file, and then use it in another
  - what the heck is going on in the if __name__ == '__main__' section of example code?
    - is this how the datasets and model classes actually get saved to the store?
  - What's the deal with version_base?
  - hydra_zen.store() vs hydra_zen.ZenStore.add_to_hydra_store
    - how do I clear the store?
- Tutorial: Design a Hierarchical Interface for an Application
  - Run without decorator
  - Run using launch()
- how do I experiment
  - control which models to use
  - control which hyperparameters to use
  - control a GridSearchCV run
  - control a run over a specified list of values for a hyperparameter
  - control which data to use
  - control what data to plot
To view the same contents as what will appear in /outputs/*/*/.hydra/config.yaml, you can run
python my_app.py --info
where we have used the --info flag.
The configurable aspects of our application can be viewed using the --help flag:
python my_app.py --help
This tells us the fields that the app requires.
- Override Grammar: link and link
- CLI flags: link
- Defaults List: link
Running hydra applications, link
python my_app.py -cp outputs/2021-10-27-15-29-10/.hydra/ -cn config
-cp or --config-path allows for override of the path specified in hydra_main()
-cn or --config-name allows for override of the config name specified in hydra_main()
I am noticing that I can get this to work if, in hydra_main(), I specify config_path to be any string. The value of the string doesn't matter; config_path just needs a string value. Setting config_path=None doesn't work, nor does excluding config_path from the hydra_main() setup.
Look at the top of file src/5_experiments/my_app0.py for more details. For the moment it seems like the move is to set config_path="." in hydra_main().
I am starting to think that hydra_main() looks for a config in the global store that has the name config_name and also (simultaneously?) looks for a config in the folder specified by config_path with the name config_name. This means that, strictly speaking, I don't need all the config-building code if I happen to have a good config.yaml file that I can point to via the arguments of hydra_main(). So I could choose to build the config file by hand, write the task function code, run hydra_zen(), and I'd be good. It does make me wonder what impact, if any, the line
_target_: __main__.task
has on how the app runs; for the moment it seems like it's not needed...
How to maintain multiple configurations for an experiment, so that each experiment's config need only specify changes to some "master" config link
I think this is best suited to when the code is in a "finished" state, where you don't need to keep playing with the values that go into the config.
Still trying to get my head around how the directory situation is/needs to be handled. Here is one example that has the hydra-zen code in a subfolder, yet the artifacts from having run the code are saved to /outputs/ or /multirun/.
if __name__ == "__main__":
    from hydra.conf import HydraConf, JobConf

    # Configure Hydra to change the working dir to
    # match that of the output dir
    store(HydraConf(job=JobConf(chdir=True)), name="config", group="hydra")
Reference from Hydra
hello
- hydra_zen.make_config()
- hydra_zen.builds()
  - Implicitly via the @store decorator
- hydra_zen.make_custom_builds_fn()
There are multiple ways to end up with configs
- Without the store()
- With the store()
  - Add config to store() using a function decorator, link
    - How do I access the config so that I can use launch()? At the moment it seems like if I use the decorator, I have to run the app via the CLI
  - Add config to store() while using groups and then add "parent" config with store(make_config()), link
  - Always end with store.add_to_hydra_store() (which adds local store to global store)?
There are multiple ways to launch an application
- Execute launch(), link. Feed the function a config and a task function that takes as input a config
- Execute launch(), link. Feed the function a config and a zen() wrapped task function
- From CLI
  - Use hydra_main() method of a zen() wrapped task function to enable CLI usage, link
Following this guide.
To run all experiments via CLI:
python src/scikit_learn_howto/my_app.py "dataset=glob(*)" "classifier=glob(*)" --multirun
It seems like the --multirun option is needed to enable the glob(*) syntax.
To run the plotter code via CLI:
python src/scikit_learn_howto/plotter.py
These two scripts are meant to be run in tandem: if you run the experiment code multiple times, the plotter will likely not work. In that case, the easy fix is to delete the multirun folder and start the experiments over again.
If successful, you should see the following plot:
Here I will make changes to the scikit_learn_howto from the previous section
To run a single combination of data and classifier, run:
python src/scikit_learn_fc/my_app_fc.py "dataset=moons" "classifier=knn"
where I have chosen the moons data and the knn classifier. Note that this will put the results into an /outputs/ folder.