-
Notifications
You must be signed in to change notification settings - Fork 171
chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
#3266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
IntoSchema typing, allow Sequence[tuple[str, IntoDType]IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]]
| # TODO @dangotbanned: fix this? | ||
| # Constructor allows tuples, but we don't support that *everywhere* yet | ||
| IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema" | ||
| IntoSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dangotbanned I hope you are proud 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳
But now to ruin your day slightly ...
We got our typing for Schema.__init__ from polars, but even this is too strict.
Here's what typeshed has for dict, which we support (except **kwargs)
Luckily we don't need to use any overloads, but we should have this in IntoSchema somewhere:
Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting timing though, I just pushed a test where I passed a non-Mapping object to nw.Schema 😂
narwhals/tests/plan/schema_test.py
Lines 18 to 20 in 1579b32
| # NOTE: Would type-check if `Schema.__init__` didn't make liskov unhappy | |
| assert schema == nw.Schema(frozen_schema) # type: ignore[arg-type] | |
| assert mapping == dict(frozen_schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dangotbanned
Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
SupportsKeysAndGetItemwhat the hell is this 😂 how's Mapping not enough anymore?- Regarding
Iterable, I always have a hard time being comfortable with it, since it can be infinite (and lazy).- How does infinite converts to a dict?
- Lazy makes it a bit more of a headache when we want to check it's content (e.g. first element is a string or a tuple), but that's alright
- This is the reason why I went with
Sequence, although polarsSchema.__init__signature is with theIterable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We got our typing for Schema.init from polars, but even this is too strict.
can we resolve that later and just get the type[Dtype] / Dtype part sorted out in this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SupportsKeysAndGetItemwhat the hell is this 😂 how's Mapping not enough anymore?
Lol, it is an object that supports those methods ofc 😉
just import it in a TYPE_CHECKING block from here
from _typeshed import SupportsKeysAndGetItemThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding Iterable, I always have a hard time being comfortable with it, since it can be infinite
If someone passes an infinite stream to an initializer - they need to learn the lesson that the call will never complete
(and lazy).
This is a desirable property for the caller and I think if we want to validate things, then that is our responsibility
Reducing scope
How about aligning just Schema.__init__ with dict.__init__?
We are only checking for None there, so the issues you have seem to not apply there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dangotbanned attempt in 9ce61fe, which now fails for mypy when the dict has mix of initialized and un-initialized dtypes, which was the initial purpose of the PR 🤔
Edit: By changing Iterable into Sequence there are no issue (see 0e0d7a8). I am personally not interested in diving deeper than this.
Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a if TYPE_CHEKING: block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Randomly removing some initialization to test the typechecking
narwhals/typing.py
Outdated
| IntoNullableSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType | None] | Sequence[tuple[str, IntoDType | None]]" | ||
| ) | ||
| """Schema specification with possible None values.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Please let's come up with a better description
| return (value,) * n_match if isinstance(value, bool) else tuple(value) | ||
|
|
||
|
|
||
| class NullableSchema(OrderedDict[str, "IntoDType | None"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason for this class is mostly two folded:
- Use as a utility to convert a
Sequence[tuple[str, DType {|None}]]into a mapping. Hence make it easy to use the same API (key, dtype in obj.items()). - Easily set a flag to know if any value passed is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name Nullable* is making me think this is related to #3176 (comment) again 🫣
I'm not opposed to having classes to make our internal API cleaner btw 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name Nullable* is making me think this is related to #3176 (comment) again 🫣
Not sure I see how it's related. Passing None seems more like to be a free card, "live and let live" kind of behavior.
I'm not opposed to having classes to make our internal API cleaner btw 👍
Any preference in having it prefixed by _?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any preference in having it prefixed by
_?
It is already in _utils, so no need for a prefix 🙂
The current _ names are an artifact from when we had everything in utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I see how it's related. Passing
Noneseems more like to be a free card, "live and let live" kind of behavior.
I'll try to give a more complete explanation later, but I was hinting at Nullable, Null, None being a bit overloaded.
Linking to (#3176 (comment)) was supposed to show that I made this mistake already 😂
|
Regarding CI failures: the only one related to this PR seem to be tubular 👀 |
IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]]IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
narwhals/typing.py
Outdated
| "IntoFrameT", | ||
| "IntoLazyFrame", | ||
| "IntoLazyFrameT", | ||
| "IntoNullableSchema", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure about making this a public type?
Is it possible to only focus on Dtype -> intodtype changes here without introducing new public types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I honestly think there might be something wrong in the PR to begin with. If I refocus on the basic changes to fix #3257, then passing a dictionary with mixed initialized and un-initialized dtypes does still complain (pyright is ok with it, mypy is not)
What type of PR is this? (check all applicable)
Related issues
typing.IntoSchematoo narrow #3257Checklist
If you have comments or can explain your changes, please do so below