Skip to content

refactor: Organizing Compliant* APIs#3045

Merged
dangotbanned merged 19 commits intomainfrom
trim-compliant
Aug 30, 2025
Merged

refactor: Organizing Compliant* APIs#3045
dangotbanned merged 19 commits intomainfrom
trim-compliant

Conversation

@dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Aug 27, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

  • Everything listed in (comment) has now been moved/already had been
  • Started on labelling (comment) what's narwhals vs polars
    • (a5f4432) is entirely to help make decisions easier

Note

All the threads are follow-up ideas

Comment on lines 95 to 106
@classmethod
def from_column_indices(
cls, *column_indices: int, context: _LimitedContext
) -> Self: ...
@classmethod
def from_column_names(
cls,
evaluate_column_names: EvalNames[CompliantFrameT],
/,
*,
context: _LimitedContext,
) -> Self: ...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat torn on these CompliantExpr constructors.

I like that all you need to do is implement these two:

  • from_column_indices
  • from_column_names
And then you get these 4 for "free"

class CompliantNamespace(Protocol[CompliantFrameT, CompliantExprT]):
# NOTE: `narwhals`
_implementation: Implementation
_version: Version
@property
def _expr(self) -> type[CompliantExprT]: ...
# NOTE: `polars`
def all(self) -> CompliantExprT:
return self._expr.from_column_names(get_column_names, context=self)
def col(self, *column_names: str) -> CompliantExprT:
return self._expr.from_column_names(
passthrough_column_names(column_names), context=self
)
def exclude(self, excluded_names: Container[str]) -> CompliantExprT:
return self._expr.from_column_names(
partial(exclude_column_names, names=excluded_names), context=self
)
def nth(self, *column_indices: int) -> CompliantExprT:
return self._expr.from_column_indices(*column_indices, context=self)

What I don't like is that _LimitedContext specifies you need to pass an object with:

  • _implementation: Implementation
  • _version: Version

Recently, this PR removed the requirement of also having _backend_version:

I'd like to continue that trend and work towards removing _implementation as well 😏

I've been wanting to drop down to 1x Implementation per class for a while:

Which would just mean defining it on the class like e.g. Arrow*

class ArrowExpr(EagerExpr["ArrowDataFrame", ArrowSeries]):
_implementation: Implementation = Implementation.PYARROW

If we do that, then:

  1. We don't need it included in any signatures
    i ArrowExpr still requires a sink in __init__ even now 😢

    implementation: Implementation | None = None,
    ) -> None:

  2. *Like classes don't need to branch on Implementation inside methods
    i. We just override things in subclasses

  3. We might be able to do something different for extensions (enh: Implementation for plugins #3042)
    i. Where Implementation.UNKNOWN is just a constant to satisfy typing atm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also then make __native_namespace__ a @classmethod 🤯

That doesn't depend on nw.Version, so we don't need to initialize a Compliant object to access it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also then make native_namespace a @classmethod 🤯

sure, if that works, why not

Comment on lines 110 to 112
def simple_select(self, *column_names: str) -> Self:
"""`select` where all args are column names."""
...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd prefer if this were named something like:

select_by_name
select_column_names
select_names
  1. simple requires explanation of what it means
  2. It is a subset of select, so a suffix means they show up together in autocomplete

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure happy to rename

Comment on lines 290 to 291
# `LazySelectorNamespace._iter_columns` depends
def _iter_columns(self) -> Iterator[Any]: ...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untangling this one will need to be it's own PR

@dangotbanned dangotbanned marked this pull request as ready for review August 29, 2025 20:16
CompliantColumn,
Protocol[NativeSeriesT],
):
# NOTE: `narwhals`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we split them by having all the narwhals ones be private, and and all the polars ones public?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we split them

I've thought about this as well in the past!

narwhals ones be private, and and all the polars ones public?

I'd prefer to define our public api using common conventions where possible 🙏

For example, enabling this pyright rule already isn't feasible for us

 reportPrivateImportUsage = "none"
 reportUnusedExpression = "none"    # handled by (https://docs.astral.sh/ruff/rules/unused-variable/)
+reportPrivateUsage = "error"
 typeCheckingMode = "basic"
>>> pyright
1004 errors, 0 warnings, 0 informations

It also means we don't leave any space for extensions to define their methods - without concern for us needing that name later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure sounds good

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The least bad idea I've had was to define our stuff with a property accessor

Buuuuuut we're running quite short on useful names - given how far we've stretched Namespace 😂

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good, thanks @dangotbanned

just left a question #3045 (comment), but feel free to merge when ready

@dangotbanned dangotbanned marked this pull request as draft August 30, 2025 10:22
@dangotbanned dangotbanned marked this pull request as ready for review August 30, 2025 11:05
@dangotbanned
Copy link
Member Author

Thanks for the review @MarcoGorelli!

@dangotbanned dangotbanned merged commit d318ad3 into main Aug 30, 2025
30 of 33 checks passed
@dangotbanned dangotbanned deleted the trim-compliant branch August 30, 2025 11:13
dangotbanned added a commit that referenced this pull request Sep 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants