
feat: add StandardState #32

Merged
merged 41 commits into main from feat/default-state-from-main
Jul 21, 2023

Conversation

@hollandjg (Member) commented Jul 12, 2023

Description

Add a new StandardState object which can be used as the default for future experimentalists, experiment runners and theorists.

Type of change

  • fix: A bug fix
  • feat: A new feature

Features

Add a new StandardState object with

  • variables
  • experiment_data
  • conditions
  • models
  • a `model` alias which returns the last model in the list.

Make some fixes and refactors in the autora.state.delta module.
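
A minimal usage sketch (the import path, constructor arguments and defaults shown here are assumptions based on this description, not verbatim code from the PR):

import pandas as pd
from autora.variable import Variable, VariableCollection
from autora.state.bundled import StandardState  # module path assumed from this PR

# Assumed fields: variables, conditions, experiment_data, models
s = StandardState(
    variables=VariableCollection(
        independent_variables=[Variable(name="x")],
        dependent_variables=[Variable(name="y")],
    ),
    conditions=pd.DataFrame({"x": [1, 2, 3]}),
    experiment_data=pd.DataFrame({"x": [], "y": []}),
    models=[],
)
print(s.models)  # [] until a theorist adds a fitted model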

Questions (Optional)

  • Are the field names understandable?
  • Do they match the agreed names?
  • Are there better names we could use?

Details

Aliases

Aliases work like this:

s = SomeState(models=[FittedModel0(), FittedModel1()]) # `models` is the list of models
t = s + Delta(model=FittedModel2())  # !! `model` is singular and not in a list!

...and get the following back as t:

SomeState(models=[FittedModel0(), FittedModel1(), FittedModel2()])

This is required for our Theorist interface which may by default return a single model:

def a_theorist(experiment_data):
    # x_names, y_names: the independent / dependent variable column names
    X, y = experiment_data[x_names], experiment_data[y_names]
    model = ModelFitter().fit(X, y)
    return Result(model=model)  # singular `model`, mapped onto `models` by the alias

Without this feature, every theorist would have to return a list of models, which is a pain and feels wrong: return Result(models=[model])
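
For illustration only, here is a self-contained sketch of how such an alias could work. This is not the actual autora.state.delta implementation; the class, the lambda-based alias metadata and the __add__ logic are invented for this example:

from dataclasses import dataclass, field, replace
from typing import Any, Dict, List

Delta = dict  # stand-in: a Delta is just a mapping from field (or alias) names to values

@dataclass(frozen=True)
class SomeState:
    # `model` in a Delta is treated as an alias for extending `models`
    models: List[Any] = field(
        default_factory=list,
        metadata={"aliases": {"model": lambda new: [new]}},
    )

    def __add__(self, delta: Dict[str, Any]) -> "SomeState":
        updated = dict(self.__dict__)
        for key, value in delta.items():
            if key in updated:  # plural field addressed directly: extend it
                updated[key] = updated[key] + value
            else:  # otherwise look for a field which declares this key as an alias
                for name, f in self.__dataclass_fields__.items():
                    wrap = f.metadata.get("aliases", {}).get(key)
                    if wrap is not None:
                        updated[name] = updated[name] + wrap(value)
        return replace(self, **updated)

s = SomeState(models=["FittedModel0", "FittedModel1"])
t = s + Delta(model="FittedModel2")
print(t.models)  # ['FittedModel0', 'FittedModel1', 'FittedModel2']

In the real autora.state.delta module the per-field behaviour is likewise configured through field metadata (see the "delta"/"converter" metadata discussed further down), but the details differ from this sketch.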

hollandjg and others added 25 commits July 10, 2023 11:32
@hollandjg marked this pull request as ready for review July 12, 2023 21:27
@younesStrittmatter (Contributor) left a comment

Looks great!

@benwandrew (Contributor) left a comment

this is good, cool stuff! my only hesitation so far regarding names is models vs model (and, more generally, things (the full list) vs thing (the last element)); can we somehow more clearly differentiate the last model and the list?

src/autora/state/bundled.py:
>>> (s + dm1 + dm2).models
[DummyClassifier(constant=1), DummyClassifier(constant=2), DummyClassifier(constant=3)]

The last model is available under the `model` property:
(Contributor)

i wonder if we want a clearer distinction between the models and model properties? could get a little confusing for users since they're so close in name... although, admittedly, it's hard to think of an alternative that keeps single words

@musslick (Contributor)

Hmm... how about model vs. model_list or model_set? Are those better than models?

(Contributor)

@musslick i'm fine with either of those! i'm also fine with models and model if others think it's not actually going to be an issue; just trying to be maximally clear while still being Pythonic :)

@hollandjg (Member, Author)

On reflection, I feel like I still prefer models rather than anything longer – it feels a bit more natural to me. Combined with the type annotations (like BaseEstimator vs List[BaseEstimator]) and type checking, I feel like the risk of accidental confusion is sufficiently small to be unproblematic. I'm open to persuasion there though.
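
For concreteness, a sketch of the kind of annotations being referred to; this is an illustrative fragment, not the class definition from this PR:

from dataclasses import dataclass, field
from typing import List, Optional
from sklearn.base import BaseEstimator

@dataclass(frozen=True)
class StandardState:
    models: List[BaseEstimator] = field(default_factory=list)  # full history of fitted models

    @property
    def model(self) -> Optional[BaseEstimator]:
        """The last fitted model, or None if nothing has been fitted yet."""
        return self.models[-1] if self.models else None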

(Contributor)

sounds good, i'm happy with models! it was only a slight hesitation on my part, and i think it is indeed quite natural.

@benwandrew (Contributor) commented Jul 13, 2023

a few more comments:

  1. initially i was confused by some of the output displayed in the "Creating Generators With State Based Functions" section of the notebook. specifically,

     [screenshot of notebook output omitted]

     it seemed like the outputs were reversed with respect to what was being described. i think the issue is that the output is correct on the first pass through running the cells (as i verified), but if you go back up and run a cell above again it will keep cycling while the descriptive text remains the same. no substantive error, it just threw me off given what was displayed when you first open and read through the notebook.

  2. in the subsequent section — "Adding The Experimentalist" — i encountered the following when running the first cell:

     [screenshot of error output omitted]

     note, everything up to that point executed successfully.

@musslick (Contributor) left a comment

Looks great, just had a question regarding numpy <> pandas conversion.

... metadata={"delta": "replace",
... "converter": np.asarray})

Here we pass a dataframe, but expect a numpy array:
@musslick (Contributor)

Quick question: do we also allow casting from pandas to numpy? I think we should allow that, since most people will be feeding the theorists with numpy arrays (it's just simpler).

@hollandjg (Member, Author) commented Jul 17, 2023
Yes! Casting from pandas to numpy is totally possible – using the np.asarray converter like on line 170, if the Delta includes a DataFrame, it will be converted to a numpy array.

You can always put a pd.DataFrame through np.asarray to get a numpy array.

My recommendation would be to have the functions and classes which want to represent the data internally as numpy arrays accept np.typing.ArrayLike as the input type, which allows for lots of input types – lists of values, or DataFrames or np.ndarrays – and then do a np.asarray() call at the top of the function to turn the input into the array you want.

It's really hard to do the casting properly from outside these functions, as we'd have to inspect the function signatures and work out whether the type we have is compatible. It's much simpler and more efficient to do this within the function itself, especially if we have a general interchange format like pd.DataFrame as our standard.
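
For illustration, a minimal sketch of that pattern (the function name, the least-squares fit and the column names are invented for this example; np.asarray and npt.ArrayLike are the only pieces being recommended):

import numpy as np
import numpy.typing as npt
import pandas as pd

def fit_linear(X: npt.ArrayLike, y: npt.ArrayLike) -> np.ndarray:
    """Accept anything array-like and convert it up front."""
    X = np.asarray(X)  # handles lists, np.ndarray, pd.DataFrame, ...
    y = np.asarray(y)
    # work with plain numpy arrays from here on; a least-squares fit
    # is used only to make the example runnable
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# A pd.DataFrame (or pd.Series) passes straight through np.asarray:
df = pd.DataFrame({"x0": [0.0, 1.0, 2.0], "x1": [1.0, 1.0, 1.0]})
y = pd.Series([1.0, 3.0, 5.0])
print(fit_linear(df, y))  # [2. 1.]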


@hollandjg (Member, Author)

> seemed like the outputs were reversed with respect to what was being described

They were. Thanks for catching this! It should be in order now. Using generators is tricky; perhaps it's not something we should really recommend.

@hollandjg (Member, Author)

> 2. in the subsequent section — "Adding The Experimentalist" — i encountered the following when running the first cell:

This is because the new experimentalist isn't yet defined – that comes in #33. I've taken out that block from this PR but reintroduce it (in a fixed form) in #32.

@benwandrew (Contributor)

> This is because the new experimentalist isn't yet defined – that comes in #33. I've taken out that block from this PR but reintroduce it (in a fixed form) in #32.

ok, that makes sense.

@benwandrew (Contributor) left a comment

per the end of our conversation in group today, and with all previous requested changes/questions addressed, i'm happy to get this merged!


@hollandjg added this pull request to the merge queue Jul 21, 2023
Merged via the queue into main with commit 689e192 Jul 21, 2023
13 checks passed
@hollandjg deleted the feat/default-state-from-main branch July 21, 2023 15:06
hollandjg added a commit that referenced this pull request Nov 29, 2023
ci: update nb-clean hook to remove empty cells