[Enh]: Add Series|Expr.replace
#1223
Comments
thanks @FBruzzesi ! I think the Polars-native solution would be: df.sort(pl.col(names).replace({x: i for i, x in enumerate(order)})) , without setting any temporary columns that would require both
Thanks for the feedback! Yes, I would imagine that Polars can get around that by using expressions as the sort key. However, we currently don't support expressions in these contexts, and I have the impression that it may not be trivial to allow that in our current framework?
yup, definitely not trivial...but I think you're right,
what would you think about doing this using a join? Polars does a join under the hood to do this anyway

Example:

@nw.narwhalify(eager_only=True)
def sort_by_custom_order(df, key, order):
    order_key = generate_unique_token(8, df.columns)
    order_df = nw.from_dict(
        {key: order, order_key: range(len(order))},
        native_namespace=nw.get_native_namespace(df),
    )
    return df.join(order_df, on=key, how="left").sort(order_key).drop(order_key)

which, in the Plotly context, you could call as

args["data_frame"] = sort_by_custom_order(df, names, order)

Demo:

import polars as pl
import pandas as pd
import narwhals.stable.v1 as nw
from narwhals.utils import generate_unique_token
import pyarrow as pa

data = {'a': ['foo', 'bar', 'foo', 'foo', 'bar', 'quox', 'foo'], 'b': [1, 3, 2, 6, 3, 3, 4]}
order = ['foo', 'quox', 'bar']

@nw.narwhalify(eager_only=True)
def sort_by_custom_order(df, key, order):
    order_key = generate_unique_token(8, df.columns)
    order_df = nw.from_dict(
        {key: order, order_key: range(len(order))},
        native_namespace=nw.get_native_namespace(df),
    )
    return df.join(order_df, on=key, how='left').sort(order_key).drop(order_key)

print(sort_by_custom_order(pd.DataFrame(data), 'a', order))
print(sort_by_custom_order(pl.DataFrame(data), 'a', order))
print(sort_by_custom_order(pa.table(data), 'a', order))

outputs
🤔 never mind, the join strategy seems to be slower than the concat strategy from the plotly PR 😳
I made a branch in which I roughly implemented

import polars as pl
import pandas as pd
import narwhals.stable.v1 as nw
from narwhals.utils import generate_unique_token
import pyarrow as pa
import numpy as np

rng = np.random.default_rng(1)
pd.set_option('future.no_silent_downcasting', True)

data = {'a': ['foo', 'bar', 'foo', 'foo', 'bar', 'quox', 'foo'], 'b': [1, 3, 2, 6, 3, 3, 4]}
order = ['foo', 'quox', 'bar']

# Strategy 1: left-join a small (key, position) frame, then sort by position.
@nw.narwhalify(eager_only=True)
def func(df, key, order):
    order_key = generate_unique_token(8, df.columns)
    order_df = nw.from_dict(
        {key: order, order_key: range(len(order))},
        native_namespace=nw.get_native_namespace(df),
    )
    return df.join(order_df, on=key, how='left').sort(order_key).drop(order_key)

# Strategy 2: filter once per value in `order`, then concatenate the pieces.
@nw.narwhalify
def func2(df, key, order):
    return nw.concat(
        [df.filter(nw.col(key) == value) for value in order], how="vertical"
    )

# Strategy 3: map each key to its position with replace_strict, then sort.
@nw.narwhalify
def func3(df, key, order):
    token = generate_unique_token(8, df.columns)
    mapping = {x: i for i, x in enumerate(order)}
    return (
        df.with_columns(
            nw.col(key).replace_strict(mapping, return_dtype=nw.UInt8).alias(token)
        )
        .sort(token)
        .drop(token)
    )

print(func(pd.DataFrame(data), 'a', order))
print(func(pl.DataFrame(data), 'a', order))
print(func2(pd.DataFrame(data), 'a', order))
print(func2(pl.DataFrame(data), 'a', order))
print(func3(pd.DataFrame(data), 'a', order))
print(func3(pl.DataFrame(data), 'a', order))

bigdata = {'a': rng.integers(0, 3, size=100_000), 'b': rng.integers(0, 3, size=100_000), 'c': rng.integers(0, 3, size=100_000)}
order = [1, 0, 2]
However, Polars doesn't rechunk when it […]
Interestingly enough, any of these approaches is faster than the original index-based solution in plotly:
This is pleasantly surprising to me; I was expecting that we would be degrading performance here - nice!
Thanks Marco, that's definitely unexpected.
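Before timing the three strategies against each other, it can help to confirm that they agree on the result. The following is a minimal pandas-only sketch of that check, not the Narwhals code from this thread: the function names and the fixed `_order` helper column are hypothetical (the snippets above use `generate_unique_token` to avoid name clashes), and a stable sort is requested so that ties keep their original order in every strategy.

```python
import pandas as pd

data = {
    "a": ["foo", "bar", "foo", "foo", "bar", "quox", "foo"],
    "b": [1, 3, 2, 6, 3, 3, 4],
}
order = ["foo", "quox", "bar"]
df = pd.DataFrame(data)


def sort_by_join(df, key, order):
    # Left-join a (key, position) lookup frame, then sort by position.
    order_df = pd.DataFrame({key: order, "_order": range(len(order))})
    return (
        df.merge(order_df, on=key, how="left")
        .sort_values("_order", kind="stable")
        .drop(columns="_order")
        .reset_index(drop=True)
    )


def sort_by_concat(df, key, order):
    # Filter once per value in `order` and stack the pieces.
    return pd.concat([df[df[key] == value] for value in order], ignore_index=True)


def sort_by_mapping(df, key, order):
    # Map each key to its position (the replace-style approach), then sort.
    rank = {value: position for position, value in enumerate(order)}
    return (
        df.assign(_order=df[key].map(rank))
        .sort_values("_order", kind="stable")
        .drop(columns="_order")
        .reset_index(drop=True)
    )


by_join = sort_by_join(df, "a", order)
by_concat = sort_by_concat(df, "a", order)
by_mapping = sort_by_mapping(df, "a", order)

# All three strategies should produce the same frame.
print(by_join.equals(by_concat) and by_concat.equals(by_mapping))
```

Once the outputs match, each function can be wrapped in `timeit` on a larger frame such as `bigdata` above; no timings are claimed here.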
We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?
This would enable plotly to do custom sorting without filtering + concatenating:
(code snippet)
Please describe the purpose of the new feature or describe the problem to solve.
Replicate Polars' Expr|Series.replace
Suggest a solution if possible.
No response
If you have tried alternatives, please describe them below.
No response
Additional information that may help us understand your needs.
No response