FIX-#7486: Add support for `.astype(pandas.CategoricalDtype(…))` by camriddell · Pull Request #7487 · modin-project/modin

camriddell · 2025-03-31T19:46:57Z

What do these changes do?

first commit message and PR title follow format outlined here

NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.
passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
signed commit with git commit -s
Resolves BUG: .astype(mpd.CategoricalDtype(…)) fails to convert via .astype #7486
tests added and passing: extended 2 existing tests for converting Categorical/Category data types
module layout described at docs/development/architecture.rst is up-to-date

…pe(…)) Signed-off-by: Cameron Riddell <cameron@dutc.io>

sfc-gh-jkew · 2025-03-31T19:59:29Z

modin/core/dataframe/pandas/dataframe/dataframe.py

        else:
            # Assume that the dtype is a scalar.
-            if not (col_dtypes == self_dtypes).all():
+            if not (self_dtypes == col_dtypes).all():


This points to an underlying issue with how binary_op "eq" is handled for dtypes.

Is it maybe dtypes.py 326?

Hmm, I can't tell in where in this code-path DtypesDescriptor.equals would be invoked as self.dtypes returns a pandas.Series and col_dtypes is a user-passed object. Definitely open to further discussion as I'm not terribly familiar with the code-base so there likely are mechanics that I am not aware of after a superficial skim.

(for now) I am still of the opinion that this is a order-induced precedence issue, where we want to use the logic defined in the modin object to see if it is equivalent to a user-passed in object.

For the case of col_dtypes == self_dtypes (the current implementation) we always delegate to col_dtypes.__eq__ to check if the two objects are equivalent. Only if this function returns NotImplemented do we invoke the self_dtypes.__eq__ method.

In the common case, col_dtypes is a built-in object (most commonly a string at this point in the code), then we can rely on the fact that strict type checking is done as part of comparison and if the types don't align the a NotImplemented is returned, thus Python then check the __eq__ method of the other object.

In the current case modin.pandas.CategoricalDtype defines an __eq__ that never returns NotImplemented (it returns False in our case) therefore the broadcasted comparison logic is never executed (as that logic exists inside of the modin.Series. In this case pandas.CategoricalDtype defines an __eq__ implementation that returns False (which is logically correct). However since we're comparing against a Series we want a Series[bool] back instead of just a single bool. This is why .all() fails as shown in the traceback.

By swapping this comparison around, we are forcing the Series.__eq__ method to be invoked which returns an object of the correct shape/results.

A brief snippet to describe my observations

import modin.pandas as mpd class T: def __eq__(self, other): return False s = mpd.Series([1, 2, 3]) print( f'{s == 1}', # Series → [True, False, False] f'{(1).__eq__(s) = }', # NotImpelemented f'{1 == s}', # Series → [True, False, False] f'{T() == s}', # False (will always be False) f'{s == T()}', # Series → [False, False, False] sep='\n{}\n'.format('-'*40) )

I don't fully understand the root cause here, but this seems like a simple fix.

sfc-gh-jkew

Makes sense to me. @devin-petersohn might want to take a look.

modin/tests/pandas/dataframe/test_map_metadata.py

sfc-gh-mvashishtha · 2025-04-02T20:05:59Z

modin/core/dataframe/pandas/dataframe/dataframe.py

        else:
            # Assume that the dtype is a scalar.
-            if not (col_dtypes == self_dtypes).all():
+            if not (self_dtypes == col_dtypes).all():


I don't fully understand the root cause here, but this seems like a simple fix.

Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>

This will be fixed in a future modin release modin-project/modin#7487

… to Enum (only in main namespace, not stable.v1) (#2192) * enh nw.Enum to accept categories - add conversion from native to pandas - add conversion from native to Polars * add tests for nw.Enum(categories) * fix enum type checking for Enum dtype * fix enum doctest * positive check for enum instance for pyright * enum use implementation specific CategoricalDtype * enum preserve v1 behavior * preserve v1 polars enum conversion * add Enum support to dask * modin to xfail on Enum dtype This will be fixed in a future modin release modin-project/modin#7487 * Enum support outside of V1 * Fix v1 enum missing argument teset * fix enum error match for py38 * add pragma: no cover to v1.Enum from aligned with DType class * parametrize api versions for dtypes tests * decouple narwhals versioned dtypes * enum types to pass tests * fix: pyright warning on incorrect line for Enum Let this be a warning, check your files after you use a linter. * test(typing): Fix redefinition * test: Add `test_enum_v1_is_enum_unstable` Last statement is failing * fix: Get more of `__eq__` working * fix: Correct subtyping for `__eq__` #2192 (comment) * chore: Coverage https://github.com/narwhals-dev/narwhals/actions/runs/14361432835/job/40263682474?pr=2192 * list -> tuple * use str in annotation but at runtime dont raise for hashable (pandas..) * improve error message * move another test to v1test * update backcompat doc * coverage * test: fix doctest https://github.com/narwhals-dev/narwhals/actions/runs/14427770420/job/40458883250?pr=2192, #2192 (comment) * test: Add `test_enum_from_series` Resolves comment (#2192 (comment)) * enum add non-string and duplicate checking * Update docs/backcompat.md Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com> * fix(typing): Resolve some `Enum.categories` issues * refactor: Simplify `__init__`, raise earlier Also adapted an error message to be closer to https://github.com/pola-rs/polars/blob/319a9a84ab573886b2a13548a8e462fee353acef/py-polars/polars/datatypes/classes.py#L694-L696 * remove enum duplication/null checks --------- Co-authored-by: dangotbanned <125183946+dangotbanned@users.noreply.github.com> Co-authored-by: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com>

FIX-modin-project#7486: Add support for .astype(pandas.CategoricalDty…

56e98c6

…pe(…)) Signed-off-by: Cameron Riddell <cameron@dutc.io>

camriddell requested review from a team, RehanSD, YarShev, anmyachev, dchigarev, devin-petersohn, mvashishtha and vnlitvinov as code owners March 31, 2025 19:46

sfc-gh-jkew reviewed Mar 31, 2025

View reviewed changes

sfc-gh-jkew approved these changes Apr 1, 2025

View reviewed changes

sfc-gh-mvashishtha reviewed Apr 2, 2025

View reviewed changes

categories should be present in the test data

03aec1f

Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>

sfc-gh-jkew merged commit d512c53 into modin-project:main Apr 3, 2025
14 checks passed

camriddell added a commit to camriddell/narwhals that referenced this pull request Apr 4, 2025

modin to xfail on Enum dtype

3e1db9e

This will be fixed in a future modin release modin-project/modin#7487

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX-#7486: Add support for `.astype(pandas.CategoricalDtype(…))`#7487

FIX-#7486: Add support for `.astype(pandas.CategoricalDtype(…))`#7487
sfc-gh-jkew merged 2 commits intomodin-project:mainfrom
camriddell:fix/support-categoricaldtype

camriddell commented Mar 31, 2025

Uh oh!

sfc-gh-jkew Mar 31, 2025

Uh oh!

camriddell Mar 31, 2025

Uh oh!

sfc-gh-mvashishtha Apr 2, 2025

Uh oh!

sfc-gh-jkew left a comment

Uh oh!

Uh oh!

Uh oh!

sfc-gh-mvashishtha Apr 2, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

camriddell commented Mar 31, 2025

What do these changes do?

Uh oh!

sfc-gh-jkew Mar 31, 2025

Choose a reason for hiding this comment

Uh oh!

camriddell Mar 31, 2025

Choose a reason for hiding this comment

Uh oh!

sfc-gh-mvashishtha Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

sfc-gh-jkew left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sfc-gh-mvashishtha Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants