francisco-camargo/learn_hydra_zen

Learn hydra-zen

Sandbox to learn and explore the Python library hydra-zen

Francisco Camargo

Learn Hydra Zen

Will use this repo as a sandbox to experiment using hydra-zen

I want to learn how to use hydra-zen for automated training and use of models, while also working out a workflow for when I want to manually change and explore a model.

Open questions

  • Does Hydra write the config.yaml file first, and THEN run zen(task) on that config file?

    • If so, it is not clear where or when the config.yaml file gets written
    • Is there a way to have just a task function and a hand-made config.yaml, call zen(task), and have everything work? I.e., do I have to define a config in code?
  • hydra store

    • save something to the store in one .py file, and then use it in another
    • what the heck is going on in the if __name__ == '__main__' section of example code?
      • is this how the datasets and model classes actually get saved to the store?
      • What's the deal with version_base?
    • hydra_zen.store() vs hydra_zen.ZenStore.add_to_hydra_store
    • how do I clear the store?
  • Tutorial: Design a Hierarchical Interface for an Application

    • Run without decorator
    • Run using launch()
  • how do I experiment

    • control which models to use
      • control which hyperparameters to use
      • control a GridSearchCV run
      • control a run over a specified list of values for a hyperparameter
    • control which data to use
    • control what data to plot

View config info via CLI

To view the same contents as what will appear in /outputs/*/*/.hydra/config.yaml you can run

python my_app.py --info

where we have used the --info flag

Can view the configurable aspects of our application using the --help command

python my_app.py --help

This tells us the fields that the app requires

Hydra CLI

  • Override Grammar: link and link
  • CLI flags: link
  • Defaults List: link

Rerun run via CLI

Running hydra applications, link

python my_app.py -cp outputs/2021-10-27-15-29-10/.hydra/ -cn config

  • -cp or --config-path allows for override of the path specified in hydra_main()
  • -cn or --config-name allows for override of the config name specified in hydra_main()

Trying to understand how this works

I am noticing that I can get this to work if I give config_path a string value in hydra_main(); the actual value of the string doesn't seem to matter. Setting config_path=None doesn't work, nor does leaving config_path out of the hydra_main() setup.

Look at the top of the file src/5_experiments/my_app0.py for more details. For the moment it seems like the move is to set config_path="." in hydra_main().

I am starting to think that hydra_main() looks for a config named config_name in the global store and also (simultaneously?) looks for a config named config_name in the folder specified by config_path. This means that, strictly speaking, I don't need all the config-building code if I happen to have a good config.yaml file that I can point to via the arguments of hydra_main(). So I could choose to build the config file by hand, write the task function, wrap it with zen(), and I'd be good. It does make me wonder what impact, if any, the line

_target_: __main__.task

has on how the app runs; for the moment it seems like it's not needed...

Configure Experiments

How to maintain multiple configurations for an experiment, so that each experiment's config need only specify changes to some "master" config: link

I think this is best suited to code that is in a "finished" state, where you don't need to keep playing with the values that go into the config.

Working directory

Still trying to get my head around how the directory situation is/needs to be handled. Here is one example that has the hydra-zen code in a subfolder yet the artifacts from having run the code are saved to /outputs/ or /multirun/

from hydra_zen import store

if __name__ == "__main__":
    from hydra.conf import HydraConf, JobConf

    # Configure Hydra to change the working dir to
    # match that of the output dir
    store(HydraConf(job=JobConf(chdir=True)), name="config", group="hydra")

Reference from Hydra

Making configs


  1. hydra_zen.make_config
  2. hydra_zen.builds()
  3. Implicitly via @store decorator
  4. hydra_zen.make_custom_builds_fn()

Learning to use hydra_zen.store()

docs

There are multiple ways to end up with configs

  1. Without the store()
    1. Use make_config() which is then fed to launch(), link
    2. Use builds() on a function or object, and is then fed to launch(), link
  2. With the store(),
    1. Add config to store() using a function decorator, link
      1. How do I access the config so that I can use launch()? At the moment it seems like if I use the decorator, I have to run the app via the CLI
    2. Add config to store() while using groups and then add "parent" config with store(make_config()), link
    3. Always end with store.add_to_hydra_store() (which adds local store to global store)?

Launching application

There are multiple ways to launch an application

  1. Execute launch(), link. Feed the function a config and a task function that takes as input a config
  2. Execute launch(), link. Feed the function a config and a zen() wrapped task function
  3. From CLI
    1. Use hydra_main() method of a zen() wrapped task function to enable CLI usage, link

scikit_learn_howto

Following this guide.

To run all experiments via CLI:

python src/scikit_learn_howto/my_app.py "dataset=glob(*)" "classifier=glob(*)" --multirun

It seems like the --multirun option is needed to enable the glob(*) syntax.

To run plotter code in CLI

python src/scikit_learn_howto/plotter.py

These two scripts are meant to be run in tandem: if you run the experiment code multiple times before plotting, the plotter will likely not work. In that case, the easy fix is to delete the multirun folder and start the experiments over again.

If successful, you should see the following plot:

[Figure: plot produced by plotter.py]

scikit_learn_fc

Here I will make changes to the scikit_learn_howto from the previous section

To run a single combination of data and classifier, can do the following:

python src/scikit_learn_fc/my_app_fc.py "dataset=moons" "classifier=knn"

where I have chosen the moons data and the knn classifier. Note that this will put the results into an /outputs/ folder
