feat: Adds nw.exclude #2122

Merged
dangotbanned merged 6 commits into narwhals-dev:main from thomasjpfan:exclude
Mar 2, 2025

Conversation

@thomasjpfan
Contributor

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

This PR adds nw.exclude and unit tests.

Member

@FBruzzesi FBruzzesi left a comment

Thanks for the contribution @thomasjpfan 🚀

I left a few comments. Those in the _arrow module apply to all the other modules as well. In general, we might try to lower some of the repetition: evaluate_output_names is always the same function for all backends, and the selection is quite close to CompliantExpr.from_column_names. Maybe we could generalize that a bit to make it reusable.
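
One way to read the suggestion above, as a minimal sketch rather than narwhals' actual code: the name resolution only needs the frame's column names, so it could live in a single backend-agnostic helper. The helper name `make_exclude_output_names`, the `HasColumns` protocol, and the `FakeFrame` stand-in are all hypothetical and exist only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol, Sequence


class HasColumns(Protocol):
    """Anything exposing column names (assumed to hold for every backend frame)."""

    @property
    def columns(self) -> Sequence[str]: ...


def make_exclude_output_names(
    *excluded: str,
) -> Callable[[HasColumns], list[str]]:
    """Hypothetical shared helper: build a callable that resolves the kept names."""
    excluded_set = frozenset(excluded)

    def evaluate_output_names(df: HasColumns) -> list[str]:
        # Keep the frame's column order and drop only the excluded names.
        return [name for name in df.columns if name not in excluded_set]

    return evaluate_output_names


@dataclass
class FakeFrame:
    """Stand-in for a backend frame, used only in this sketch."""

    columns: list[str]


resolve = make_exclude_output_names("b")
kept = resolve(FakeFrame(columns=["a", "b", "c"]))
print(kept)
```

Each backend would then only need to supply the part that differs, namely how a resolved name is turned into that backend's series object.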

Comment on lines +135 to +144
def func(df: ArrowDataFrame) -> list[ArrowSeries]:
    return [
        ArrowSeries(
            df._native_frame[column_name],
            name=column_name,
            backend_version=df._backend_version,
            version=df._version,
        )
        for column_name in evaluate_output_names(df)
    ]
Member

I wonder if we could tweak ArrowExpr.from_column_names and use it here 🤔 It might be a refactor follow up

Contributor Author

The refactor would be to accept a callable that returns the names.

@classmethod
def from_column_names(
    cls,
    get_column_names,  # callable
    function_name,
    backend_version,
    version,
):
    def func(df):
        return [... for column_name in get_column_names(df)]

    return cls(
        func,
        function_name=function_name,
        evaluate_output_names=get_column_names
    )

Then exclude and col get refactored to:

def col(self: Self, *column_names: str):
    return ArrowExpr.from_column_names(
        lambda _: column_names, function_name="col", ...
    )

def exclude(self: Self, *column_names: str):
    def get_column_names(df) -> Sequence[str]:
        exclude_names = set(column_names)
        return [
            column_name
            for column_name in df.columns
            if column_name not in exclude_names
        ]

    return ArrowExpr.from_column_names(
        get_column_names,
        function_name="exclude", ...
    )

I did not want to increase the scope of this PR by changing the signature of from_column_names and changing col. I'm okay with doing a quick follow up.
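
For reference, the proposed shape can be exercised end to end with a toy stand-in. This is only a sketch under stated assumptions: `ToyExpr` and the dict-based `Frame` are hypothetical and replace ArrowExpr and ArrowDataFrame, and the `backend_version` and `version` arguments are left out for brevity.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Toy stand-in for a backend DataFrame: column name -> column values.
Frame = dict[str, list[int]]
GetNames = Callable[[Frame], Sequence[str]]


class ToyExpr:
    """Hypothetical, heavily simplified stand-in for a backend expression class."""

    def __init__(
        self,
        call: Callable[[Frame], list[list[int]]],
        function_name: str,
        evaluate_output_names: GetNames,
    ) -> None:
        self.call = call
        self.function_name = function_name
        self.evaluate_output_names = evaluate_output_names

    @classmethod
    def from_column_names(
        cls, get_column_names: GetNames, function_name: str
    ) -> ToyExpr:
        def func(df: Frame) -> list[list[int]]:
            # Names are resolved lazily, once the frame is known.
            return [df[name] for name in get_column_names(df)]

        return cls(
            func,
            function_name=function_name,
            evaluate_output_names=get_column_names,
        )


def col(*column_names: str) -> ToyExpr:
    # Fixed names: the callable ignores the frame.
    return ToyExpr.from_column_names(
        lambda _df: list(column_names), function_name="col"
    )


def exclude(*column_names: str) -> ToyExpr:
    excluded = frozenset(column_names)

    def get_column_names(df: Frame) -> list[str]:
        # Iterating a dict yields its keys in insertion order.
        return [name for name in df if name not in excluded]

    return ToyExpr.from_column_names(get_column_names, function_name="exclude")


frame: Frame = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
expr = exclude("b")
selected_names = expr.evaluate_output_names(frame)
selected_columns = expr.call(frame)
print(selected_names, selected_columns)
```

In this sketch col and exclude differ only in the callable they pass, which is the repetition the refactor is meant to remove.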

Member

@FBruzzesi FBruzzesi Mar 2, 2025

Thanks @thomasjpfan

The refactor would be to accept a callable that returns the names.

Yes I think in the long term that's desirable and we should aim for that

I did not want to increase the scope of this PR by changing the signature of from_column_names and changing col. I'm okay with doing a quick follow up.

Happy to keep it as follow up

@FBruzzesi FBruzzesi added the enhancement New feature or request label Mar 1, 2025
@FBruzzesi FBruzzesi changed the title Adds nw.exclude feat: Adds nw.exclude Mar 1, 2025
Member

@FBruzzesi FBruzzesi left a comment

Thanks @thomasjpfan 🚀

Member

@dangotbanned dangotbanned left a comment

Thanks everyone (especially @thomasjpfan) 🎉

@dangotbanned dangotbanned merged commit 140833c into narwhals-dev:main Mar 2, 2025
28 checks passed

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

nw.exclude

4 participants