fix: Preserve pandas column name attribute#2363
Conversation
narwhals/_dask/dataframe.py
Outdated
|
|
||
| return PandasLikeDataFrame( | ||
| result, | ||
| result.rename_axis(columns=self.native.columns.name), |
narwhals/translate.py
Outdated
| backend_version=parse_version(pd), | ||
| implementation=Implementation.PANDAS, | ||
| backend_version=parse_version(pd), |
There was a problem hiding this comment.
Mental sanity for symmetry with other pandas-like and order of the spec 😅
|
|
||
| df = nw.from_native(df_native) | ||
|
|
||
| result = df.with_columns(b=nw.col("a") + 1, c=nw.col("a") * 2).select("c", "b") |
There was a problem hiding this comment.
I wanted to concatenate two methods here, mostly for the sake of it.
Might be worth reparametrizing it
There was a problem hiding this comment.
Dask select and with_columns have no issues since we use .assign method for both!
|
thanks! i'll do a little test to check there's no accidental / surprise overhead |
|
@MarcoGorelli I tried a simple script to check import timeit
from pprint import pprint
import numpy as np
import pandas as pd
def benchmark_rename_axis_overhead(n_rows: int, n_cols: int, repetitions: int = 1000) -> float:
df = pd.DataFrame(np.random.randn(n_rows, n_cols))
return timeit.timeit(
lambda: df.rename_axis(columns="new_name", copy=False),
number=repetitions
) / repetitions
TOTAL_SIZE = 10_000_000
results = {
(int(TOTAL_SIZE/10**i), 10**i): benchmark_rename_axis_overhead(n_rows=int(TOTAL_SIZE/10**i), n_cols=10**i)
for i in range(1, 6)
}
pprint(results)Let me know what your thoughts are and if we need to check something more extensively |
Single argument + fake typing 🎉
|
|
|
thanks!
i'm still slightly concerned about there being extra copies in old versions, i'll take another look |
No rush! I keep merging main to avoid having to deal with a large conflicting diff if that happens to be the case |
narwhals/_pandas_like/dataframe.py
Outdated
| return self.__class__( | ||
| df, | ||
| rename_axis( | ||
| df, | ||
| implementation=self._implementation, | ||
| backend_version=self._backend_version, | ||
| columns=self._native_columns_name, | ||
| ), |
There was a problem hiding this comment.
🤔 i'm wondering if we could/should just set df.columns.name = self._native_columns_name
else we need to trust pandas' copy-on-write / copy=False working properly, which i'm not sure i do
There was a problem hiding this comment.
I trust your judgement here - but should I would be concerned as a pandas user?
MarcoGorelli
left a comment
There was a problem hiding this comment.
thanks @FBruzzesi !
I don't trust pandas' rename_axis, nor do I trust that copy-on-write will take effect properly, we would need more comprehensive benchmarks to be sure
but, in select and with_columns, we create new objects, so it should be safe to just use .columns.name = ... there

What type of PR is this? (check all applicable)
Related issues
selectandwith_columnsdon't always preserve column index name for pandas-like #1483Checklist