Conversation
FBruzzesi
left a comment
There was a problem hiding this comment.
Thanks for the contribution @thomasjpfan 🚀
I left a few comments - those in the _arrow module apply to all other modules. In general we might try lower some repetition: evaluate_output_names is always the same function for all backends and the selection is quite close to ComlpliantExpr.from_column_names. Maybe we could generalize that a bit to be re-usable
| def func(df: ArrowDataFrame) -> list[ArrowSeries]: | ||
| return [ | ||
| ArrowSeries( | ||
| df._native_frame[column_name], | ||
| name=column_name, | ||
| backend_version=df._backend_version, | ||
| version=df._version, | ||
| ) | ||
| for column_name in evaluate_output_names(df) | ||
| ] |
There was a problem hiding this comment.
I wonder if we could tweak ArrowExpr.from_column_names and use it here 🤔 It might be a refactor follow up
There was a problem hiding this comment.
The refactor would be to accept able callable that returns the names.
@classmethod
def from_column_names(
cls,
get_column_names, # callable
function_name,
backend_version,
version,
):
def func(df):
return [... for column_name in get_column_names(df)]
return cls(
func,
function_name=function_name,
evaluate_output_names=get_column_names
)Then exclude and col gets refactored to:
def col(self: Self, *column_names: str):
return ArrowExpr.from_column_names(
lambda _: column_names, function_name="col", ...
)
def exclude(self: Self, *column_names: str):
def get_column_names(df) -> Sequence[str]:
exclude_names = set(column_names)
return [
column_name
for column_name in df.columns
if column_name not in exclude_names
]
return ArrowExpr.from_column_names(
get_column_names,
function_name="exclude", ...
)I did not want to increase the scope of this PR by changing the signature of from_column_names and changing col. I'm okay with doing a quick follow up.
There was a problem hiding this comment.
Thanks @thomasjpfan
The refactor would be to accept able callable that returns the names.
Yes I think in the long term that's desirable and we should aim for that
I did not want to increase the scope of this PR by changing the signature of from_column_names and changing col. I'm okay with doing a quick follow up.
Happy to keep it as follow up
| def func(df: ArrowDataFrame) -> list[ArrowSeries]: | ||
| return [ | ||
| ArrowSeries( | ||
| df._native_frame[column_name], | ||
| name=column_name, | ||
| backend_version=df._backend_version, | ||
| version=df._version, | ||
| ) | ||
| for column_name in evaluate_output_names(df) | ||
| ] |
There was a problem hiding this comment.
Thanks @thomasjpfan
The refactor would be to accept able callable that returns the names.
Yes I think in the long term that's desirable and we should aim for that
I did not want to increase the scope of this PR by changing the signature of from_column_names and changing col. I'm okay with doing a quick follow up.
Happy to keep it as follow up
dangotbanned
left a comment
There was a problem hiding this comment.
Thanks everyone (especially @thomasjpfan) 🎉

What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
This PR adds
nw.excludeand unit tests.