Feature/nominal dtype checks in transform #162

TommyMatthews · 2024-01-11T08:20:55Z

Addresses issue #145 in part. I was unable to replicate the error I found in the generic testing where the fit method did not raise an appropriate error for X not being a pandas.DataFrame object.

This bug definitely exists for the transform method so I have fixed that. super.transform was always getting called, but always after the check_mappable_rows method inherited from BaseNominalTransformer, which was making checks that assumed a dataframe. This is a good example of an implementation test (super transform call) not testing for the behaviour we want (error if X not pd dataframe).

In the case of MeanResponseTransformer I just moved the call of super.transform to the start of the transform method, however in OrdinalEncoderTransformer and NominalToIntegerTransformer this led to X not passing the check_mappable_rows check, so I just called the relevant columns_check method from BaseTransformer at the start of the transform method, which raises the appropriate error. I think the fact that I have had to do it like this reflects either a lack of understanding on my part or a need to refactor BaseNominalTransformer. I have added tests for this behaviour (copied from BaseTransformer test suite).

The issue still mentions the possibility of adding nominal type checks (maybe with a warning) for nominal transformers. I think probably best to get the bug fix in, close the issue and open a new issue for this feature.

… not a dataframe, also tested

…is not a dataframe, also tested

…un X is dataframe check in transform

davidhopkinson26 · 2024-01-11T09:23:20Z

Thanks @TommyMatthews. I agree this is all quite messy and could do with a further reaching refactor. Going forward I would like to see less reliance on super init/fit/transform calls and move towards the error handling being in more specific methods to be called in child classes (like you have done with the columns_check method here)

davidhopkinson26 · 2024-01-11T09:29:18Z

I don't understand why moving the super_transform call to the start broke check_mappable_rows. Can you elaborate on why it failed please?

I'm happy with the solution proposed (i.e. direct call to columns_check) but want to understand to help with future refactoring of BaseNominalTransformer as this seems like odd behaviour.

TommyMatthews · 2024-01-11T09:46:52Z

@davidhopkinson26 I'm not 100% sure myself. If I run X = super().transform(X), or even X = super(BaseNominalTransformer, self).transform(X) (to explicitly access the BaseTransformer) then the value error BaseNominalTransformer.check_mappable_rows is raised, e.g. ValueError: OrdinalEncoderTransformer: nulls would be introduced into column b from levels not present in mapping

So I guess mappable rows is now be calculated incorrectly? (I can't see it changing X.shape[0]?)

TommyMatthews · 2024-01-11T15:17:54Z

Figured this one out: hadn't fully understood the inheritance set up of the transformer but super.transform is actually calling the transform method of BaseMappingTransformerMixin, which is actually doing the mapping and hence if you call check_mappable_rows afterwards it fails

I think the best solution is still just calling the BaseTransformer.columns_check() as BaseTransformer.transform() is called in BaseMappingTransformerMixin.transform(), so calling this at the start of transform would duplicate a lot of checks

davidhopkinson26 · 2024-01-11T15:36:14Z

Figured this one out: hadn't fully understood the inheritance set up of the transformer but super.transform is actually calling the transform method of BaseMappingTransformerMixin, which is actually doing the mapping and hence if you call check_mappable_rows afterwards it fails

I think the best solution is still just calling the BaseTransformer.columns_check() as BaseTransformer.transform() is called in BaseMappingTransformerMixin.transform(), so calling this at the start of transform would duplicate a lot of checks

That makes sense. I agree that this is the best solution.

TommyMatthews added 4 commits January 10, 2024 15:20

updates mean response transformer: transform now raises error if X is…

946d3df

… not a dataframe, also tested

updates OrdinalEncoderTransformer: transform now raises error if X is…

be03fbe

… not a dataframe, also tested

updates NominalToIntegerTransformer: transform now raises error if X …

d6f2037

…is not a dataframe, also tested

updates nominal: calls just BaseTransformer columns check method to r…

5039e6b

…un X is dataframe check in transform

davidhopkinson26 approved these changes Jan 11, 2024

View reviewed changes

Update CHANGELOG.rst

a9fe276

davidhopkinson26 merged commit 2462950 into develop Jan 11, 2024
1 check passed

davidhopkinson26 deleted the feature/nominal_dtype_checks_in_transform branch January 11, 2024 15:50

davidhopkinson26 mentioned this pull request Jan 17, 2024

Additional V1.1.1 changes into main #164

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/nominal dtype checks in transform #162

Feature/nominal dtype checks in transform #162

TommyMatthews commented Jan 11, 2024 •

edited

Loading

davidhopkinson26 commented Jan 11, 2024

davidhopkinson26 commented Jan 11, 2024

TommyMatthews commented Jan 11, 2024 •

edited

Loading

TommyMatthews commented Jan 11, 2024 •

edited

Loading

davidhopkinson26 commented Jan 11, 2024

Feature/nominal dtype checks in transform #162

Feature/nominal dtype checks in transform #162

Conversation

TommyMatthews commented Jan 11, 2024 • edited Loading

davidhopkinson26 commented Jan 11, 2024

davidhopkinson26 commented Jan 11, 2024

TommyMatthews commented Jan 11, 2024 • edited Loading

TommyMatthews commented Jan 11, 2024 • edited Loading

davidhopkinson26 commented Jan 11, 2024

TommyMatthews commented Jan 11, 2024 •

edited

Loading

TommyMatthews commented Jan 11, 2024 •

edited

Loading

TommyMatthews commented Jan 11, 2024 •

edited

Loading