feat!: update experimentalists `grid_pool`, `random_pool`, `random_sample` to use the new `State` mechanism #33

hollandjg · 2023-07-12T14:22:01Z

Updates:

- Breaking Change, Refactor: remove the structural pooler vs. sampler split in `autora.experimentalist`
- Breaking Change, Feature: update the standard way of setting up experimentalists so that the end user wraps every function, reducing the load on contributors and hopefully making reuse easier
- Breaking Change, Refactor: gather the standard State and wrapper functions in a single submodule, allowing `from autora.state.standard import StandardState, Delta, on_state, state_fn_from_estimator`
- Breaking Change, Feature: convert the existing example experimentalists `grid_pool`, `random_pool`, and `random_sample` to use the State + Delta mechanism with the standard field names from `StandardState`
- CI: update pre-commit hooks
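The State + Delta idea behind the last two breaking changes can be sketched in a few lines. This is only an illustration using the standard library: `SketchState`, `SketchDelta`, and `sketch_on_state` are hypothetical stand-ins, not the `autora.state` implementation, and the fields here are plain lists rather than DataFrames.

```python
from dataclasses import dataclass, field, replace
import inspect
import random


@dataclass(frozen=True)
class SketchState:
    """Hypothetical stand-in for a state object; fields are plain lists here."""

    conditions: list = field(default_factory=list)
    experiment_data: list = field(default_factory=list)

    def __add__(self, delta):
        # A delta replaces only the fields it names; the rest carry over.
        return replace(self, **delta)


class SketchDelta(dict):
    """Hypothetical stand-in: a mapping of field name -> new value."""


def sketch_on_state(function):
    """Wrap `function` so it accepts a state and returns an updated state."""
    wanted = inspect.signature(function).parameters

    def wrapped(state, **kwargs):
        # Fill arguments from matching state fields, then from explicit kwargs.
        from_state = {k: getattr(state, k) for k in wanted if hasattr(state, k)}
        return state + function(**{**from_state, **kwargs})

    return wrapped


def sample(conditions, num_samples, seed=0):
    """An ordinary function a contributor might write: no state handling."""
    rng = random.Random(seed)
    return SketchDelta(conditions=rng.sample(conditions, num_samples))


# The end user, not the contributor, does the wrapping.
sample_on_state = sketch_on_state(sample)

s0 = SketchState(conditions=[1, 2, 3, 4, 5])
s1 = sample_on_state(s0, num_samples=2)
```

In this sketch the contributor's function only needs to name its inputs after state fields and return a delta; the wrapper handles reading from and writing to the state.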

Resolutions:

Discussion – moving poolers/samplers to top-level of `experimentalist`

There are several reasons for moving the poolers and samplers into files at the top level of experimentalist:

`poolers` and `samplers` are now less relevant

The strict difference between poolers and samplers now has less meaning

Conceptually: Now we just have functions which sit in a pipeline or cycle and operate on the state. Some of them still act like poolers and samplers of old, but they look more similar.

Technically: In the old setup, poolers took no positional arguments by design, whereas samplers took one (conditions). Now all of the poolers and samplers might take any number of arguments, like the experiment_data, model, conditions from a step before, and we achieve that using only named arguments in the functions which act on State.
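The difference in calling conventions can be shown with signatures alone. The three functions below are hypothetical examples written for this illustration, not the functions changed in this PR.

```python
import inspect


def old_style_pool(*, num_samples=3):
    """Old pooler shape: no positional arguments by design."""
    return list(range(num_samples))


def old_style_sample(conditions, *, num_samples=2):
    """Old sampler shape: exactly one positional argument, `conditions`."""
    return list(conditions)[:num_samples]


def new_style_experimentalist(*, conditions, experiment_data, model=None, num_samples=2):
    """New shape: any number of inputs, all passed by name."""
    seen = set(experiment_data)
    fresh = [c for c in conditions if c not in seen]
    return {"conditions": fresh[:num_samples]}


def positional_count(function):
    """Count the parameters that may be passed positionally."""
    kinds = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
    params = inspect.signature(function).parameters.values()
    return sum(p.kind in kinds for p in params)


counts = {
    f.__name__: positional_count(f)
    for f in (old_style_pool, old_style_sample, new_style_experimentalist)
}
result = new_style_experimentalist(
    conditions=[1, 2, 3, 4], experiment_data=[2], num_samples=2
)
```

Because the new-style function takes everything by name, a wrapper can supply `conditions`, `experiment_data`, or `model` from whichever state fields happen to exist.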

It makes sense to gather related functions in one file

Some of the poolers and samplers are obviously related – like the random pooler/sampler, falsification pooler/sampler.

It is simpler to have these in the same file, so that you can share functions more easily between them.

This will also remove one level of complexity for contributors – they'll just be able to make an experimentalist with full freedom to use whatever they want.

This will also remove one chunk of complexity from the cookiecutter template.

Allows for backward compatibility whilst freeing up the good names we want

There are two things we really want to achieve here:

- no breaking changes in this release
- free up names like `grid_pool`, `random_sample`, `random_pool` for our new paradigm

The proposed method was to make the existing functions into singledispatch functions which had different behavior depending on what the inputs were. However, this relies on there always being at least one positional argument to the function, an assumption which is broken for the old poolers which had no positional arguments by design.

An attempt was made to remedy this, but it became very complicated and still failed, because the old poolers take no positional argument for `singledispatch` to dispatch on.
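The limitation is easy to reproduce with the standard library. In the sketch below, `pool` is a hypothetical function and a `dict` stands in for a state object; neither is taken from this PR.

```python
from functools import singledispatch


@singledispatch
def pool(first, **kwargs):
    """Fallback: handle the first positional argument the old way."""
    return f"old behaviour for {type(first).__name__}"


@pool.register
def _(first: dict, **kwargs):
    """Chosen when the first argument is a dict (standing in for a state)."""
    return "new behaviour for a state-like input"


print(pool([1, 2]))  # dispatches on list, so the fallback runs
print(pool({"conditions": []}))  # dispatches on dict, so the new behaviour runs

dispatch_error = None
try:
    # An old-style pooler call passes keyword arguments only, so there is
    # nothing for singledispatch to inspect.
    pool(num_samples=5)
except TypeError as error:
    dispatch_error = error
    print(f"TypeError: {error}")
```

`singledispatch` picks an implementation from the type of the first positional argument, so a keyword-only call fails before any implementation is reached.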

Both in this case, and in future cases where we'll want to update all of the existing contributions to use the new setup, it's just much easier to start over in a new file than to try to make it work in the same file.


hollandjg · 2023-08-22T12:33:27Z

Hi all, esp @musslick @younesStrittmatter @benwandrew @blinodelka @chadcwilliams @benwandrew
I've updated this PR to include the core changes from the mini-hackathon. I think this might be ready to merge, so please let me know if there's something broken.

If anything is missing, I suggest we make that a new PR, because this one is pretty large and already does a lot of stuff.

I didn't include the align_dataframe_to_ivs function from https://github.com/AutoResearch/autora-core/blob/mini-hackaton-before-refactor-state/src/autora/utils/conversion.py because I don't think this should be part of the on_state function.
- The functions in autora.state.delta are by design completely agnostic about the contents of the state objects, other than assuming that they can be treated as State objects and are fundamentally dataclasses. That means that they don't do any transformations on the data beyond extending DataFrames and renaming things according to the alias rules.
- I think it's best to include the align_dataframe_to_ivs in the places where you need it in your experimentalist, for instance, or to define a new function wrapper which you can use if you need it.
- Remember that if you're using data frames you can just pass a list of column names and you've already got exactly those columns in that order, and I think that's probably more understandable to the end-user.

musslick · 2023-08-22T16:38:01Z

> Hi all, esp @musslick @younesStrittmatter @benwandrew @blinodelka @chadcwilliams @benwandrew I've updated this PR to include the core changes from the mini-hackathon. I think this might be ready to merge, so please let me know if there's something broken.
>
> If anything is missing, I suggest we make that a new PR, because this one is pretty large and already does a lot of stuff.
>
> I didn't include the align_dataframe_to_ivs function from https://github.com/AutoResearch/autora-core/blob/mini-hackaton-before-refactor-state/src/autora/utils/conversion.py because I don't think this should be part of the on_state function.
>
> - The functions in autora.state.delta are by design completely agnostic about the contents of the state objects, other than assuming that they can be treated as State objects and are fundamentally dataclasses. That means that they don't do any transformations on the data beyond extending DataFrames and renaming things according to the alias rules.
> - I think it's best to include the align_dataframe_to_ivs in the places where you need it in your experimentalist, for instance, or to define a new function wrapper which you can use if you need it.
> - Remember that if you're using data frames you can just pass a list of column names and you've already got exactly those columns in that order, and I think that's probably more understandable to the end-user.

So far, we have been using it in both experimentalists and synthetic models (autora). Any other place where we can put it in the core?

hollandjg · 2023-08-24T15:24:15Z

> So far, we have been using it in both experimentalists and synthetic models (autora). Any other place where we can put it in the core?

I can't think of any.

hollandjg · 2023-08-24T15:58:17Z

Hi all, esp. @musslick @benwandrew and @younesStrittmatter – I'd appreciate your re-review of this PR! Please let me know if there's anything else needing changing.


musslick

Looks great but docstrings need to be fixed

musslick

Looks great!

Covered in today's meeting. Thanks, Ben!


hollandjg added 10 commits July 12, 2023 10:11

refactor: update docstrings and file ordering

230dc29

refactor: reorder random_pooler file

ba57826

refactor: reorganize random_pool to use pd.DataFrame as default and h…

af49e59

…ave two separate functions

refactor: remake random sampler to use a result object to be used in …

c206cc2

…a pipeline

test: update doctests to support windows

b735ce4

revert: changes to grid_pool function

96533ed

docs: update docstrings and tests to work

e1b2c54

revert: changes to random_sample function

8b91e48

docs: add explanation on wrapper.

3d08b87

rename: update executors to use a new naming convention

00d15d8

hollandjg changed the base branch from main to feat/default-state-from-main July 12, 2023 14:23

hollandjg changed the title ~~feat/default-state-from-main-experimentalists~~ feat!: rename grid_pool, random_pool, random_sample to use the new naming convention for experimentalists Jul 12, 2023

hollandjg changed the title ~~feat!: rename grid_pool, random_pool, random_sample to use the new naming convention for experimentalists~~ feat!: update grid_pool, random_pool, random_sample to use the new State mechanism Jul 12, 2023

hollandjg changed the title ~~feat!: update grid_pool, random_pool, random_sample to use the new State mechanism~~ feat!: update grid_pool, random_pool, random_sample to use the new State mechanism Jul 12, 2023

hollandjg mentioned this pull request Jul 12, 2023

feat: new idea for the "common" experimentalist interface – just use the State #28

Closed

hollandjg requested review from younesStrittmatter, musslick, benwandrew, chadcwilliams and TheLemonPig and removed request for younesStrittmatter and musslick July 12, 2023 14:31

hollandjg self-assigned this Jul 12, 2023

hollandjg added 5 commits July 12, 2023 10:36

Merge branch 'feat/default-state-from-main' into feat/default-state-f…

46975ef

…rom-main-experimentalists

Revert "docs: remove notebook which doesn't yet work"

6a57124

This reverts commit 25e24f8.

fix: if there is no model available, return None

38a5577

docs: update notebook to use new format

4824af9

Merge branch 'feat/default-state-from-main' into feat/default-state-f…

ceccc68

…rom-main-experimentalists # Conflicts: # docs/cycle/Linear and Cyclical Workflows using Functions and States.ipynb

hollandjg requested review from younesStrittmatter, blinodelka and whyhardt August 22, 2023 12:32

hollandjg marked this pull request as ready for review August 22, 2023 12:33

hollandjg mentioned this pull request Aug 22, 2023

Feat: general wrapper #26

Closed

hollandjg added 5 commits August 23, 2023 17:04

refactor: move all standard-state code into a single state.py file

7c5783e

refactor: move all standard-state code into a single state.py file

046aae4

refactor: make _extend and _append functions private

0cab123

refactor: update imports from autora.state

495631f

refactor: update imports from autora.state

247511e

hollandjg changed the title ~~feat!: update grid_pool, random_pool, random_sample to use the new State mechanism~~ feat!: update experimentalists grid_pool, random_pool, random_sample to use the new State mechanism Aug 23, 2023

refactor: make pytest use importlib mode, allowing duplicate filenames

e48b387

musslick reviewed Aug 25, 2023

src/autora/state.py (outdated)

musslick reviewed Aug 25, 2023

src/autora/state.py (outdated)

hollandjg added 2 commits August 25, 2023 17:00

refactor: rename estimator_on_state from state_fn_from_estimator

1a1c897

refactor: rename experiment_runner_on_state from state_fn_from_x_to_x…

982a2c2

…y_fn_df

musslick reviewed Aug 25, 2023

src/autora/state.py (outdated)

musslick requested changes Aug 25, 2023


hollandjg requested a review from musslick August 25, 2023 15:49

musslick approved these changes Aug 25, 2023


hollandjg added this pull request to the merge queue Aug 25, 2023

Merged via the queue into main with commit 2636a27 Aug 25, 2023
14 checks passed

hollandjg deleted the feat/default-state-from-main-experimentalists branch August 25, 2023 17:36

hollandjg added a commit that referenced this pull request Nov 29, 2023

Merge pull request #33 from AutoResearch/fix-broken-tests-random_sample

7622917

test: fix broken tests (random_sample uses num_samples now)
