Skip to content

chore: Remove unreachable hash check#2750

Merged
MarcoGorelli merged 2 commits intomainfrom
cov-check_column_names_are_unique
Jun 28, 2025
Merged

chore: Remove unreachable hash check#2750
MarcoGorelli merged 2 commits intomainfrom
cov-check_column_names_are_unique

Conversation

@dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Jun 28, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Follow-up to #2749 (comment)

Could you give an example of how we'd get columns like that in the places the function is currently used?

import pandas as pd

>>> pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=([1], [2], [3]))
TypeError: unhashable type: 'list'

AFAIK, pandas only supports Hashable column "names" - so I'm a little confused 🤔

We're checking an existing NativeFrame.columns - so I would have thought that condition is not reachable

Follow-up to #2749 (comment)

If CI can catch this, then I'll add a comment explaining where this is needed
@dangotbanned dangotbanned changed the title chore: Do we really have coverage? chore: Remove unreachable hash check Jun 28, 2025
@MarcoGorelli
Copy link
Member

ah we ended up removing it in https://github.com/narwhals-dev/narwhals/pull/2011/files

@dangotbanned dangotbanned marked this pull request as ready for review June 28, 2025 13:53
@dangotbanned dangotbanned requested a review from FBruzzesi June 28, 2025 14:46
@MarcoGorelli
Copy link
Member

Doesn't hurt to keep it, as per the linked test case? Like this it gives a clearer error message for malformed pandas dfs?

@dangotbanned
Copy link
Member Author

dangotbanned commented Jun 28, 2025

Doesn't hurt to keep it, as per the linked test case? Like this it gives a clearer error message for malformed pandas dfs?

Sorry I'm not sure I understand.

You linked a test that was removed 4 months ago that has the comment:

def test_from_non_hashable_column_name() -> None:
   # This is technically super-illegal
    # BUT, it shows up in a scikit-learn test, so...

If that test is important - we should add it back?

Personally, I can't see why we should add error handling for a test fails in scikit-learn.
Maybe I could see this as important if we ran scikit-learn's test suite or if we had a test ourselves

As-is, we just have a check that's added overhead on every .with_native 🤔

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Jun 28, 2025

I doubt the overhead is noticeable, a comment with a link to the pandas issue should be fine

@FBruzzesi
Copy link
Member

FBruzzesi commented Jun 28, 2025

@MarcoGorelli I am also having a hard time to figure out how this can be used in practice. I am able to create a dataframe, but any operation I am running is leading to an error, with the exception of accessing the columns attribute itself:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '1.1.3'

In [3]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["pizza", ["a", "b"]])

In [4]: df.columns
Out[4]: Index(['pizza', ['a', 'b']], dtype='object')

In [5]: df
Out[5]: ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File pandas/_libs/hashtable_class_helper.pxi:1709, in pandas._libs.hashtable.PyObjectHashTable.map_locations()

TypeError: unhashable type: 'list'
Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas/_libs/hashtable_class_helper.pxi", line 1709, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'list'

if that's the case, len(set(...)) would raise the same error, and we don't need to necessarily/explicitly do it as well - at least for now. If eventually scikit-learn starts developing and find issues, we can change it or add a specific check for pandas 🤔

@dangotbanned
Copy link
Member Author

I doubt the overhead is noticeable, a comment with a link to the pandas issue should be fine

It isn't necessarily the overhead per-call that I'm worried about, the issue is that this check is on most/all method calls - potentially multiple times.
I really feel like a stronger case needs to be made to justify this

@MarcoGorelli
Copy link
Member

I am able to create a dataframe, but any operation I am running, is leading to an error

hmmm ok thanks, that is a sign that such a state is so broken that we shouldn't deal with it

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks both!

@MarcoGorelli MarcoGorelli merged commit 11343f1 into main Jun 28, 2025
43 of 45 checks passed
@MarcoGorelli MarcoGorelli deleted the cov-check_column_names_are_unique branch June 28, 2025 17:43
@dangotbanned
Copy link
Member Author

thanks both!

Thanks @MarcoGorelli! 😍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants