Skip to content

chore: Add Compliant*.from_native#2315

Merged
MarcoGorelli merged 30 commits intomainfrom
compliant-from-native
Apr 4, 2025
Merged

chore: Add Compliant*.from_native#2315
MarcoGorelli merged 30 commits intomainfrom
compliant-from-native

Conversation

@dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Mar 29, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Following roughly the same pattern as from_numpy

  • FromNative protocol
  • CompliantSeries
  • CompliantDataFrame
  • CompliantNamespace (instance method)
    • EagerNamespace
    • PolarsNamespace
    • LazyNamespace

@dangotbanned
Copy link
Member Author

dangotbanned commented Mar 29, 2025

Reference

Did some experimenting and arrived at this being the pattern I like the most:

Code block

FrameT = TypeVar("FrameT")
SeriesT = TypeVar("SeriesT")


class CompliantNamespace(Protocol[FrameT, SeriesT]):
    @property
    def _dataframe(self) -> type[FromNative[FrameT]]: ...
    @property
    def _series(self) -> type[FromNative[SeriesT]]: ...

    @overload
    def from_native(self, obj: FrameT) -> FromNative[FrameT]: ...

    @overload
    def from_native(self, obj: SeriesT) -> FromNative[SeriesT]: ...

    def from_native(
        self, obj: FrameT | SeriesT | Any
    ) -> FromNative[FrameT] | FromNative[SeriesT]:
        if self._dataframe._is_native(obj):
            return self._dataframe.from_native(obj)
        elif self._series._is_native(obj):
            return self._series.from_native(obj)
        msg = f"Unsupported type: {type(obj).__name__!r}"
        raise TypeError(msg)

Moving up a level, the narrowing looks like:

Screenshot

image

So rather than the current nw.from_native, which is roughly:

if is_polars_dataframe(native_object): ...
elif is_polars_lazyframe(native_object): ...
elif is_polars_series(native_object): ...
elif is_pandas_dataframe(native_object): ...
elif is_pandas_series(native_object): ...
elif is_modin_dataframe(native_object): ...
elif is_modin_series(native_object): ...
elif is_cudf_dataframe(native_object): ...
elif is_cudf_series(native_object): ...
# ... more

We'd aim to narrow to a Namespace first, then let that dispatch to the right from_native call 😎

if is_native_polars(native_object): ...
elif is_native_pandas_like(native_object): ...

Note

I had to partially implement that in translate.py to get PolarsNamespace.from_native coverage (9396bbc)
(#2324) takes that idea and runs with it

@dangotbanned dangotbanned mentioned this pull request Mar 28, 2025
6 tasks
@dangotbanned dangotbanned changed the title chore(DRAFT): Add Compliant*.from_native chore: Add Compliant*.from_native Mar 30, 2025
Comment on lines +61 to +62
_StoresNative[NativeFrameT],
FromNative[NativeFrameT],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been thinking about combining these as one protocol.

_StoreNative might still be useful on its own, but something like this could be nice:

from typing import Protocol, TypeVar

from narwhals._translate import FromNative
from narwhals.utils import _StoresNative

NativeT = TypeVar("NativeT")

class Compliant(_StoresNative[NativeT], FromNative[NativeT], Protocol[NativeT]): ...

Not super important though 😅

@dangotbanned
Copy link
Member Author

Throwaway Idea

We could get some more (from_|to_) methods from what is currently nested in (lazy|collect).

Collected some notes while trying to get coverage for CompliantLazyFrame

class DaskLazyFrame(CompliantLazyFrame["DaskExpr", "dd.DataFrame"]):...
# NOTE: Supports:
# Eager.lazy()
# - from_arrow (via Table.to_pandas, dd.from_pandas)
# - from_pandas
# DaskLazyFrame.collect()
# - to_arrow (via self.native.compute() -> pl.from_pandas())
# - to_pandas (via self.native.compute())
# - to_polars (via self.native.compute() ->  pa.Table.from_pandas())
class DuckDBLazyFrame(CompliantLazyFrame["DuckDBExpr", "duckdb.DuckDBPyRelation"]):...
# NOTE: Supports:
# Eager.lazy()
# - from_arrow
# - from_pandas
# DuckDBLazyFrame.collect()
# - to_arrow (via self.native.arrow())
# - to_pandas (via self.native.df())
# - to_polars (via self.native.pl())
class PolarsLazyFrame:
# NOTE: Supports:
# Eager.lazy()  # noqa: ERA001
# - from_arrow (via pl.from_arrow().lazy())
# - from_pandas (via pl.from_pandas().lazy())
class SparkLikeLazyFrame(CompliantLazyFrame["SparkLikeExpr", "SQLFrameDataFrame"]):
# NOTE: Supports:
# SparkLikeLazyFrame.collect()
# - to_arrow (via self._collect_to_arrow())
# - to_pandas (via self.native.toPandas())
# - to_polars (via self._collect_to_arrow() -> pl.from_arrow())

@dangotbanned dangotbanned marked this pull request as ready for review March 30, 2025 19:27
by_sql = f"{_input} asc nulls last"
sql = f"{func_name}() OVER (order by {by_sql})"
return when(_input.isnotnull(), SQLExpression(sql))
return when(_input.isnotnull(), SQLExpression(sql)) # type: ignore[no-any-return, unused-ignore]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the website says May

would it work for you to install the pre-release (pip install -U --pre duckdb)? that's what runs in CI

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli is there not a way to specify that in our pyproject.toml?

I'm usually pretty cautious about doing one-off installs, since I inevitably forget what I did the next time around 😄

dangotbanned added a commit that referenced this pull request Apr 1, 2025
Will support the pattern from (#2315)
@dangotbanned dangotbanned added high priority Your PR will be reviewed very quickly if you address this and removed typing labels Apr 3, 2025
Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com>
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dangotbanned

@MarcoGorelli MarcoGorelli merged commit 78a5511 into main Apr 4, 2025
28 checks passed
@MarcoGorelli MarcoGorelli deleted the compliant-from-native branch April 4, 2025 09:39
@dangotbanned
Copy link
Member Author

thanks @dangotbanned

Thanks for all the reviews @MarcoGorelli 😍

@dangotbanned dangotbanned linked an issue Apr 4, 2025 that may be closed by this pull request
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api design high priority Your PR will be reviewed very quickly if you address this internal

Projects

None yet

Development

Successfully merging this pull request may close these issues.

enh: Add Compliant.from_* constructors

2 participants