Skip to content

refactor: adding Compliant* sub-protocols#2055

Closed
dangotbanned wants to merge 13 commits intomainfrom
reuse-sub-protocols
Closed

refactor: adding Compliant* sub-protocols#2055
dangotbanned wants to merge 13 commits intomainfrom
reuse-sub-protocols

Conversation

@dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Feb 20, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Note

Thinking about renaming Reuse* -> EagerOnly* or something like that

TODO

  • evaluate_into_exprs -> ReuseDataFrame._evaluate_into_exprs
    • mypy cannot cope with the generics
    • pyright understands it
    • The doc is already hinting that this should be a method
    • def evaluate_into_expr(
      df: CompliantFrameT_contra,
      expr: CompliantExpr[CompliantFrameT_contra, CompliantSeriesT_co],
      ) -> Sequence[CompliantSeriesT_co]:
      """Return list of raw columns.
      This is only use for eager backends (pandas, PyArrow), where we
      alias operations at each step. As a safety precaution, here we
      can check that the expected result names match those we were
      expecting from the various `evaluate_output_names` / `alias_output_names`
      calls. Note that for PySpark / DuckDB, we are less free to liberally
      set aliases whenever we want.
      """
      _, aliases = evaluate_output_names_and_aliases(expr, df, [])
      result = expr(df)
      if list(aliases) != [s.name for s in result]: # pragma: no cover
      msg = f"Safety assertion failed, expected {aliases}, got {result}"
      raise AssertionError(msg)
      return result
      def evaluate_into_exprs(
      df: CompliantFrameT_contra,
      /,
      *exprs: CompliantExpr[CompliantFrameT_contra, CompliantSeriesT_co],
      ) -> list[CompliantSeriesT_co]:
      """Evaluate each expr into Series."""
      return [
      item
      for sublist in (evaluate_into_expr(df, into_expr) for into_expr in exprs)
      for item in sublist
      ]

Comment on lines 221 to 262
def _create_compliant_series(self, value: Any) -> ReuseSeriesT: ...

# NOTE: Fully spec'd
def _create_expr_from_callable(
self,
func: Callable[[CompliantDataFrameT_co], Sequence[ReuseSeriesT]],
*,
depth: int,
function_name: str,
evaluate_output_names: Callable[[CompliantDataFrameT_co], Sequence[str]],
alias_output_names: Callable[[Sequence[str]], Sequence[str]] | None,
kwargs: dict[str, Any],
) -> ReuseExpr[ReuseSeriesT]:
return self.__narwhals_expr__()(
func,
depth=depth,
function_name=function_name,
evaluate_output_names=evaluate_output_names,
alias_output_names=alias_output_names,
implementation=self._implementation,
backend_version=self._backend_version,
version=self._version,
kwargs=kwargs,
)

# NOTE: Fully spec'd
def _create_expr_from_series(self, series: ReuseSeriesT) -> ReuseExpr[ReuseSeriesT]:
return self.__narwhals_expr__()(
lambda _df: [series],
depth=0,
function_name="series",
evaluate_output_names=lambda _df: [series.name],
alias_output_names=None,
implementation=self._implementation,
backend_version=self._backend_version,
version=self._version,
kwargs={},
)

def _create_series_from_scalar(
self, value: Any, *, reference_series: ReuseSeriesT
) -> ReuseSeriesT: ...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these all be replaced like this?

  • ReuseNamespace._create_compliant_series() -> ReuseSeries()
  • ReuseNamespace._create_expr_from_callable() -> ReuseExpr._from_callable()
  • ReuseNamespace._create_expr_from_series() -> ReuseExpr._from_series()
  • ReuseNamespace._create_series_from_scalar() -> ReuseSeries._from_scalar()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +386 to +398
def ensure_pandas_like(self: Self) -> None:
if self.is_pandas_like():
return
pandas_like = {Implementation.PANDAS, Implementation.CUDF, Implementation.MODIN}
msg = f"Expected pandas-like implementation ({pandas_like}), found {self}"
raise TypeError(msg)

# NOTE: Not sure why this differs from `pandas_like`
def ensure_pyarrow(self: Self) -> None:
if self.is_pyarrow():
return
msg = f"Expected pyarrow, got: {type(self)}" # pragma: no cover
raise AssertionError(msg)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably going to split this into another PR.

Would be helpful for representing this

# NOTE: Each side adds an `AssertionError` guard first
# Would probably make more sense to require a ClassVar, either:
# - defining the set of permitted impls
# - setting an unbound method, which also gives the error message
# - E.g. `PandasLikeSeries._ensure = Implementation.ensure_pandas_like`
# - That gets called w/ `ReuseSeries._ensure(ReuseSeries._implementation)`
def __native_namespace__(self: Self) -> ModuleType:
return self._implementation.to_native_namespace()

def __native_namespace__(self: Self) -> ModuleType:
if self._implementation in {
Implementation.PANDAS,
Implementation.MODIN,
Implementation.CUDF,
}:
return self._implementation.to_native_namespace()
msg = f"Expected pandas/modin/cudf, got: {type(self._implementation)}" # pragma: no cover
raise AssertionError(msg)

But there are lots of other places this kind of check/error message is copy/pasted at the moment

version: Version,
) -> None: ...

def __narwhals_expr__(self) -> Callable[..., ReuseExpr[ReuseSeriesT]]: ...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more permissive at this level; I've used:

  • Callable[..., ReuseExpr[ReuseSeriesT]] instead of
  • type[ReuseExpr[ReuseSeriesT]]

@dangotbanned
Copy link
Member Author

@MarcoGorelli no rush on this, just wanted to check in to see if your happy with the direction of this?

Actually implementing it shouldn't be difficult from this point, but would be a huge diff

dangotbanned added a commit that referenced this pull request Feb 23, 2025
- We need more complete generic typing to have this enabled
- We're getting there with (#2064, #2055) - but not there yet
@dangotbanned
Copy link
Member Author

Superseded by #2149

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant