merged in main
limlam96 committed May 15, 2024
2 parents 3786911 + f769001 commit 62f49eb
Showing 21 changed files with 733 additions and 820 deletions.
18 changes: 11 additions & 7 deletions CHANGELOG.rst
@@ -24,9 +24,11 @@ Added
- Inheritable tests for generic base behaviours for base transformer in `base_tests.py`, with fixtures to allow for this in `conftest.py`
- Split existing input check into two better defined checks for TwoColumnOperatorTransformer `#183 <https://github.com/lvgig/tubular/pull/183>`_
- Created unit tests for checking column type and size `#183 <https://github.com/lvgig/tubular/pull/183>`_
- Automated weights column checks through a mixin class and captured common weight tests in generic test classes for weighted transformers

Changed
^^^^^^^
- Standardised naming of weight arg across transformers
- Update DataFrameMethodTransformer tests to have inheritable init class that can be used by other test files.
- Moved BaseTransformer, DataFrameMethodTransformer, BaseMappingTransformer, BaseMappingTransformerMixin, CrossColumnMappingTransformer and MappingTransformer over to the new testing framework.
- Refactored MappingTransformer by removing redundant init method.
@@ -37,24 +39,26 @@ Changed
- Refactored ArbitraryImputer and BaseImputer tests in new format.
- Refactored MedianImputer tests in new format.
- Replaced occurrences of pd.Dataframe.drop() with del statement to speed up tubular. Note that no additional unit testing has been done for copy=False as this release is scheduled to remove copy.
- Created BaseCrossColumnNumericTransformer class. Refactored CrossColumnAddTransformer and CrossColumnMultiplyTransformer to use this class.
Moved tests for these objects to new approach.
- Created BaseCrossColumnNumericTransformer class. Refactored CrossColumnAddTransformer and CrossColumnMultiplyTransformer to use this class. Moved tests for these objects to new approach.
- Created BaseCrossColumnMappingTransformer class and integrated into CrossColumnMappingTransformer tests
- Refactored BaseNominalTransformer tests in new format & moved its logic to the transform method.
- Refactored ModeImputer tests in new format.
- Added generic init tests to base tests for transformers that take two columns as an input.
- Refactored EqualityChecker tests in new format.
- Bugfix to MeanResponseTransformer to ignore unobserved categorical levels
- Refactored dates.py to prepare for testing refactor. Edited BaseDateTransformer (and created BaseDateTwoColumnTransformer) to follow standard format, implementing validations at init/fit/transform.
To reduce complexity of file, made transformers more opinionated to insist on specific and consistent column dtypes.
- Refactored dates.py to prepare for testing refactor. Edited BaseDateTransformer (and created BaseDateTwoColumnTransformer) to follow standard format, implementing validations at init/fit/transform. To reduce complexity of file, made transformers more opinionated to insist on specific and consistent column dtypes. `#246 <https://github.com/lvgig/tubular/pull/246>`_
- Added test_BaseTwoColumnTransformer base class for columns that require a list of two columns for input
- Added BaseDropOriginalMixin to mixin transformers to handle validation and method of dropping original features, also added appropriate test classes.
- Refactored MeanImputer tests in new format `#250 <https://github.com/lvgig/tubular/pull/250>`_
- Refactored DatetimeInfoExtractor to condense and improve readability


Removed
^^^^^^^
- Functionality for BaseTransformer (and thus all transformers) to take `None` as an option for columns. This behaviour was inconsistently implemented across transformers. Rather than extending to all we decided to remove
this functionality. This required updating a lot of test files.
- The `columns_set_or_check()` method from BaseTransformer. With the above change it was no longer necessary. Subsequent updates to nominal transformers and their tests were required.
- Set pd copy_on_write to True (will become default in pandas 3.0) which allowed the functionality of the copy method of the transformers to be dropped `#197 <https://github.com/lvgig/tubular/pull/197>`
- Set pd copy_on_write to True (will become default in pandas 3.0) which allowed the functionality of the copy method of the transformers to be dropped `#197 <https://github.com/lvgig/tubular/pull/197>`_

1.2.2 (2024-02-20)
------------------
@@ -71,14 +75,14 @@ Changed
------------------
Added
^^^^^
- Updated GroupRareLevelsTransformer so that when working with category dtypes it forgets categories encoded as rare (this is wanted behaviour as these categories are no longer present in the data) `<#177 https://github.com/lvgig/tubular/pull/177>`_
- Updated GroupRareLevelsTransformer so that when working with category dtypes it forgets categories encoded as rare (this is wanted behaviour as these categories are no longer present in the data) `#177 <https://github.com/lvgig/tubular/pull/177>`_

1.2.0 (2024-02-06)
------------------
Added
^^^^^
- Update OneHotEncodingTransformer to default to returning int8 columns `#175 <https://github.com/lvgig/tubular/pull/175>`_
- Updated NullIndicator to return int8 columns `<#173 https://github.com/lvgig/tubular/pull/173>`_
- Updated NullIndicator to return int8 columns `#173 <https://github.com/lvgig/tubular/pull/173>`_
- Updated MeanResponseTransformer to coerce return to float (useful behaviour for category type features) `#174 <https://github.com/lvgig/tubular/pull/174>`_

1.1.1 (2024-01-18)
40 changes: 40 additions & 0 deletions tests/base/test_BaseTwoColumnTransformer.py
@@ -0,0 +1,40 @@
from tests.base_tests import (
    GenericFitTests,
    GenericTransformTests,
    OtherBaseBehaviourTests,
    TwoColumnListInitTests,
)


class TestInit(TwoColumnListInitTests):
    """Generic tests for transformer.init()."""

    @classmethod
    def setup_class(cls):
        cls.transformer_name = "BaseTwoColumnTransformer"


class TestFit(GenericFitTests):
    """Generic tests for transformer.fit()"""

    @classmethod
    def setup_class(cls):
        cls.transformer_name = "BaseTwoColumnTransformer"


class TestTransform(GenericTransformTests):
    @classmethod
    def setup_class(cls):
        cls.transformer_name = "BaseTwoColumnTransformer"


class TestOtherBaseBehaviour(OtherBaseBehaviourTests):
    """
    Class to run tests for BaseTransformerBehaviour outside the three standard methods.
    May need to overwrite specific tests in this class if the tested transformer modifies this behaviour.
    """

    @classmethod
    def setup_class(cls):
        cls.transformer_name = "BaseTwoColumnTransformer"
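
The file above relies on an inheritance pattern: generic test classes hold the assertions, and each concrete `Test*` class only sets `transformer_name` in `setup_class`. The following is a minimal sketch of that pattern, not the project's actual `base_tests.py`. `FakeTwoColumnTransformer`, `TRANSFORMERS` and `GenericTwoColumnInitTests` are hypothetical names invented for illustration, and the dictionary lookup is an assumption standing in for the fixtures the changelog says live in `conftest.py`.

```python
class FakeTwoColumnTransformer:
    """Hypothetical stand-in for a transformer that takes exactly two columns."""

    def __init__(self, columns):
        if not isinstance(columns, list) or len(columns) != 2:
            msg = f"{type(self).__name__}: columns must be a list of two column names"
            raise TypeError(msg)
        self.columns = columns


# Hypothetical name-to-class lookup (assumption: the real suite resolves the
# transformer from fixtures rather than from a module-level dict).
TRANSFORMERS = {"FakeTwoColumnTransformer": FakeTwoColumnTransformer}


class GenericTwoColumnInitTests:
    """Generic checks. The class name has no 'Test' prefix, so pytest's
    default discovery does not collect it on its own."""

    def test_two_columns_accepted(self):
        transformer = TRANSFORMERS[self.transformer_name](columns=["a", "b"])
        assert transformer.columns == ["a", "b"]

    def test_wrong_column_count_rejected(self):
        try:
            TRANSFORMERS[self.transformer_name](columns=["a"])
        except TypeError:
            return
        raise AssertionError("expected TypeError for a single column")


class TestInit(GenericTwoColumnInitTests):
    """Concrete test class: it only names the transformer under test."""

    @classmethod
    def setup_class(cls):
        cls.transformer_name = "FakeTwoColumnTransformer"
```

Under this arrangement, covering a new transformer means adding a small subclass per generic test class, and a transformer-specific test can still override an inherited one by redefining the method.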
61 changes: 4 additions & 57 deletions tests/base/test_DataFrameMethodTransformer.py
@@ -8,6 +8,8 @@
import tests.test_data as d
from tests.base_tests import (
    ColumnStrListInitTests,
    DropOriginalInitTests,
    DropOriginalTransformTests,
    GenericFitTests,
    GenericTransformTests,
    OtherBaseBehaviourTests,
@@ -83,25 +85,8 @@ def test_exception_raised_non_pandas_method_passed(self):
columns=["b", "c"],
)

    @pytest.mark.parametrize("not_bool", [{"a": 1}, [1, 2], 1, "True", 1.5])
    def test_exception_raised_drop_original_not_bool(self, not_bool):
        """Test an exception is raised if drop_original is not a bool."""

        with pytest.raises(
            TypeError,
            match=re.escape(
                f"DataFrameMethodTransformer: unexpected type ({type(not_bool)}) for drop_original, expecting bool",
            ),
        ):
            DataFrameMethodTransformer(
                new_column_names="a",
                pd_method_name="sum",
                columns=["b", "c"],
                drop_original=not_bool,
            )


class TestInit(DataFrameMethodTransformerInitTests):
class TestInit(DropOriginalInitTests, DataFrameMethodTransformerInitTests):
    @classmethod
    def setup_class(cls):
        cls.transformer_name = "DataFrameMethodTransformer"
@@ -115,7 +100,7 @@ def setup_class(cls):
        cls.transformer_name = "DataFrameMethodTransformer"


class TestTransform(GenericTransformTests):
class TestTransform(DropOriginalTransformTests, GenericTransformTests):
    """Tests for DataFrameMethodTransformer.transform()."""

    @classmethod
@@ -187,44 +172,6 @@ def test_expected_output_multi_columns_assignment(self, df, expected):
msg="DataFrameMethodTransformer divide by 2 columns b and c",
)

    def test_original_columns_dropped_when_specified(self):
        """Test DataFrameMethodTransformer.transform drops original columns when specified."""
        df = d.create_df_3()

        x = DataFrameMethodTransformer(
            new_column_names="a_b_sum",
            pd_method_name="sum",
            columns=["a", "b"],
            drop_original=True,
        )

        x.fit(df)

        df_transformed = x.transform(df)

        assert ("a" not in df_transformed.columns.to_numpy()) and (
            "b" not in df_transformed.columns.to_numpy()
        ), "original columns not dropped"

    def test_original_columns_kept_when_specified(self):
        """Test DataFrameMethodTransformer.transform keeps original columns when specified."""
        df = d.create_df_3()

        x = DataFrameMethodTransformer(
            new_column_names="a_b_sum",
            pd_method_name="sum",
            columns=["a", "b"],
            drop_original=False,
        )

        x.fit(df)

        df_transformed = x.transform(df)

        assert ("a" in df_transformed.columns.to_numpy()) and (
            "b" in df_transformed.columns.to_numpy()
        ), "original columns not kept"

class TestOtherBaseBehaviour(OtherBaseBehaviourTests):
"""