Skip to content

chore: Spec CompliantLazyFrame#2232

Merged
MarcoGorelli merged 27 commits intomainfrom
compliant-lazyframe-spec
Mar 19, 2025
Merged

chore: Spec CompliantLazyFrame#2232
MarcoGorelli merged 27 commits intomainfrom
compliant-lazyframe-spec

Conversation

@dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Mar 17, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

  • Reduce Incomplete markers (df00903)
  • Get feedback on what to do with deprecations

@dangotbanned
Copy link
Member Author

@dangotbanned dangotbanned changed the title chore(DRAFT): Spec CompliantLazyFrame chore: Spec CompliantLazyFrame Mar 17, 2025
@dangotbanned dangotbanned marked this pull request as ready for review March 17, 2025 16:15
@dangotbanned dangotbanned requested review from FBruzzesi and MarcoGorelli and removed request for MarcoGorelli March 17, 2025 16:15
@dangotbanned
Copy link
Member Author

dangotbanned commented Mar 17, 2025

@MarcoGorelli @FBruzzesi

I really feel like I need some guidance on what the intended API is for CompliantLazyFrame.
Working through the PR has left me fairly unsure - particularly the deprecated bits 🤔

From the perspective of a type checker - anything that is called on this attribute must be included:

self._compliant_frame: Any = df.__narwhals_lazyframe__()

@FBruzzesi
Copy link
Member

I really feel like I need some guidance on what the intended API is for CompliantLazyFrame. Working through the PR has left me fairly unsure - particularly the deprecated bits 🤔

@dangotbanned I fell so behind with the progress you made in the last 2-3 weeks, so I am not sure I can help 🙈

Regarding deprecation, that would not affect v1 - would it work to have CompliantLazyFrame be the spec for main, and (something like) V1CompliantLazyFrame inherit from it and add those deprecated methods?

@dangotbanned
Copy link
Member Author

@dangotbanned I fell so behind with the progress you made in the last 2-3 weeks, so I am not sure I can help 🙈

aha no worries @FBruzzesi - it might look like a lot, but really its just formally defining what was already there 🙂

Regarding deprecation, that would not affect v1 - would it work to have CompliantLazyFrame be the spec for main, and (something like) V1CompliantLazyFrame inherit from it and add those deprecated methods?

That could work.
One thing to keep in mind is we'd need a V1 and MAIN (non-prefixed) version of all the classes (I'm talking _arrow, _dask, _...).

Initially that might seem insane - but a big positive is we wouldn't need to pass around the Version for every new object.
An idea I had a while back (#1657 (comment)) may come in handy for that - if we went down that route.

Definitely is something we need to have a clear plan for before moving to V2

@MarcoGorelli
Copy link
Member

Do we need to include deprecated or v1-only methods at all? Can the compliant protocols only cover the main namespace?

@dangotbanned
Copy link
Member Author

dangotbanned commented Mar 18, 2025

Do we need to include deprecated or v1-only methods at all? Can the compliant protocols only cover the main namespace?

Good questions @MarcoGorelli!

We don't need to include anything 😉
More seriously though, I'm hoping I can tease out what role(s) we want the protocols to serve?

Example

only cover the main namespace

If we take that approach and then move to v2, all Compliant* are required to implement v2 but not v1.
That seems like it could be a footgun for 2 reasons:

  1. We might accidentally remove things from a class and break compatibility with v1.
    i. We lose the type safety that would've caught that statically
  2. v1 is not frozen when we move to v2
    i. All changes in v2 have the potential to impact a v1 implementation - unless the method depends on nothing outside of it

What does this PR describe?

Normal

We've got the main namespace covered with regular definitions like this.
The assumption would be every implementation must have that member or a not_implemented:

def drop(self, columns: Sequence[str], *, strict: bool) -> Self: ...

Deprecation (1)

Methods that were deprecated in v1 look like:

@deprecated("`LazyFrame.tail` is deprecated and will be removed in a future version.")
def tail(self, n: int) -> Self: ...

New information I've captured is deprecated functionality, that was not implemented.
My assumption is that this will never be implemented:

gather_every = not_implemented.deprecated(
"`LazyFrame.gather_every` is deprecated and will be removed in a future version."
)
tail = not_implemented.deprecated(
"`LazyFrame.tail` is deprecated and will be removed in a future version."
)

I think that's a useful distinction from functionality that is not implemented, but may be in the future:

with_row_index = not_implemented()

Deprecation (2)

There's a second kind of deprecation I'm not too sure on, like this:

@deprecated("only if version is v1, keep around for backcompat")
def to_arrow(self) -> pa.Table: ...
@deprecated("only if version is v1, keep around for backcompat")
def to_pandas(self) -> pd.DataFrame: ...

Where duckdb is the ugly duckling

def to_pandas(self: Self) -> pd.DataFrame:
# only if version is v1, keep around for backcompat
import pandas as pd # ignore-banned-import()
if parse_version(pd) >= (1, 0, 0):
return self._native_frame.df()
else: # pragma: no cover
msg = f"Conversion to pandas requires pandas>=1.0.0, found {pd.__version__}"
raise NotImplementedError(msg)
def to_arrow(self: Self) -> pa.Table:
# only if version is v1, keep around for backcompat
return self._native_frame.arrow()

ugly because it introduces

to_arrow = not_implemented.deprecated(
"only if version is v1, keep around for backcompat"
)
to_pandas = not_implemented.deprecated(
"only if version is v1, keep around for backcompat"
)

What else can we describe?

Outside of this PR, I've been using @unstable - which currently has no runtime or static behavior.

@unstable

narwhals/narwhals/utils.py

Lines 1507 to 1529 in 275c5b6

# TODO @dangotbanned: Extend with runtime behavior for `v1.*`
# See `narwhals.exceptions.NarwhalsUnstableWarning`
def unstable(fn: _Fn, /) -> _Fn:
"""Visual-only marker for unstable functionality.
Arguments:
fn: Function to decorate.
Returns:
Decorated function (unchanged).
Examples:
>>> from narwhals.utils import unstable
>>> @unstable
... def a_work_in_progress_feature(*args):
... return args
>>>
>>> a_work_in_progress_feature.__name__
'a_work_in_progress_feature'
>>> a_work_in_progress_feature(1, 2, 3)
(1, 2, 3)
"""
return fn

Usage

@unstable
def rolling_mean(
self,
window_size: int,
*,
min_samples: int,
center: bool,
) -> Self: ...

Summary

Apologies this got quite long!

Bringing it all back to:

what role(s) we want the protocols to serve?

  • We can encode as much or as little information as you like into the protocols.
  • The Perfect backwards compatibility policy seems an important concept to the project.
  • IMO, we have a better chance at avoiding future regressions by encoding deprecations into Compliant* protocols

Note

I'm very open to ideas on how to express these concepts differently (or not at all)

@MarcoGorelli
Copy link
Member

Where duckdb is the ugly duckling

same story with Ibis

Shall we start with just typing according to the main namespace?

@dangotbanned dangotbanned marked this pull request as ready for review March 18, 2025 19:36
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, looks good, just a couple of comments

`mypy` expands to an invalid type
> "CompliantLazyFrame[Any, NativeFrame]" of "Any | CompliantLazyFrame[Any, NativeFrame] | CompliantLazyFrame[Any, DataFrame[Any]] | CompliantLazyFrame[Any, LazyFrame[Any]] | CompliantLazyFrame[Any, DataFrameLike]"
subset: Sequence[str] | None,
*,
keep: Literal["any", "first", "last", "none"],
keep: Literal["any", "none"],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was overkill

The real problem to solve was duckdb using keep: str

def unique(self: Self, subset: Sequence[str] | None, keep: str) -> Self:

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dangotbanned

@MarcoGorelli MarcoGorelli merged commit 4d5fb30 into main Mar 19, 2025
28 checks passed
@MarcoGorelli MarcoGorelli deleted the compliant-lazyframe-spec branch March 19, 2025 14:56
MarcoGorelli pushed a commit that referenced this pull request Mar 19, 2025
* chore: Add `CompliantDataFrame.native` -> `NativeFrame`

Related:
- #2232
- #2230
- #2130

* chore: Coverage for `PolarsDataFrame.native`

* refactor: remove already imported imports

* chore: Coverage for `ArrowDataFrame.native`

* chore: Cov for `PandasLikeDataFrame.native`

* refactor: Clean up `ArrowDataFrame`

* unbreak `ArrowDataFrame.with_columns`

and write an essay
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants