@FBruzzesi
Member

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@FBruzzesi FBruzzesi added the enhancement and typing labels Oct 31, 2025
@FBruzzesi FBruzzesi mentioned this pull request Oct 31, 2025
10 tasks
@FBruzzesi FBruzzesi changed the title from "chore: Fix IntoSchema typing, allow Sequence[tuple[str, IntoDType]" to "chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]]" Oct 31, 2025
Comment on lines -306 to +307
# TODO @dangotbanned: fix this?
# Constructor allows tuples, but we don't support that *everywhere* yet
- IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema"
+ IntoSchema: TypeAlias = (
+     "Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema"
+ )
Member Author

@dangotbanned I hope you are proud 😇

Member

@dangotbanned dangotbanned Oct 31, 2025

🥳

But now to ruin your day slightly ...

We got our typing for Schema.__init__ from polars, but even this is too strict.
Here's what typeshed has for dict, which we support (except **kwargs)

https://github.com/python/typeshed/blob/bf7214784877c52638844c065360d4814fae4c65/stdlib/builtins.pyi#L1158-L1187

Luckily we don't need to use any overloads, but we should have this in IntoSchema somewhere:

Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
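
To illustrate the two shapes named above, here is a minimal sketch of what `dict` itself accepts. `FrozenSchemaLike` is a hypothetical stand-in class, and the dtypes are represented by plain strings rather than real narwhals dtypes:

```python
from __future__ import annotations

from collections.abc import Mapping


class FrozenSchemaLike:
    """Hypothetical non-Mapping object that only offers keys() and __getitem__."""

    def __init__(self, *pairs: tuple[str, str]) -> None:
        self._pairs = pairs

    def keys(self) -> list[str]:
        return [name for name, _ in self._pairs]

    def __getitem__(self, key: str) -> str:
        for name, dtype in self._pairs:
            if name == key:
                return dtype
        raise KeyError(key)


frozen = FrozenSchemaLike(("a", "Int64"), ("b", "String"))

# dict() uses keys() + __getitem__ when the argument has a keys() method ...
from_keys_and_getitem = dict(frozen)
# ... and otherwise consumes any iterable of (key, value) pairs, lazy or not.
from_pairs = dict(iter([("a", "Int64"), ("b", "String")]))
# The object is not a Mapping, so a Mapping-only annotation would reject it.
is_mapping = isinstance(frozen, Mapping)
```

Both constructions yield the same dictionary, while `is_mapping` is `False`, which is the gap a `Mapping`-only annotation leaves open.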

Member

Interesting timing though, I just pushed a test where I passed a non-Mapping object to nw.Schema 😂

# NOTE: Would type-check if `Schema.__init__` didn't make liskov unhappy
assert schema == nw.Schema(frozen_schema) # type: ignore[arg-type]
assert mapping == dict(frozen_schema)

Member Author

Thanks @dangotbanned

Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
  • SupportsKeysAndGetItem what the hell is this 😂 how's Mapping not enough anymore?
  • Regarding Iterable, I always have a hard time being comfortable with it, since it can be infinite (and lazy).
    • How does an infinite iterable convert to a dict?
    • Laziness makes it a bit more of a headache when we want to check its content (e.g. whether the first element is a string or a tuple), but that's alright
    • This is the reason why I went with Sequence, although the polars Schema.__init__ signature uses Iterable
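
The "check the first element of a lazy iterable" concern can be sketched as follows. This is only an illustration of one common approach (peek, then chain the element back); `peek` and `lazy_pairs` are hypothetical names, not narwhals or polars API:

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import chain
from typing import Any


def peek(items: Iterable[Any]) -> tuple[Any, Iterator[Any]]:
    """Return the first element plus an iterator that still yields every element.

    For an empty input, the first element is reported as None.
    """
    iterator = iter(items)
    try:
        first = next(iterator)
    except StopIteration:
        return None, iter(())
    # Put the consumed element back in front of the remaining ones.
    return first, chain([first], iterator)


def lazy_pairs() -> Iterator[tuple[str, str]]:
    # A generator: it can only be consumed once.
    yield ("a", "Int64")
    yield ("b", "String")


first, restored = peek(lazy_pairs())
looks_like_pairs = isinstance(first, tuple) and len(first) == 2
schema = dict(restored)
```

Nothing is lost by inspecting the first element, but an infinite iterable would still never finish in the final `dict(...)` call, so the peek does not address that part of the concern.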

Member

We got our typing for Schema.__init__ from polars, but even this is too strict.

Can we resolve that later and just get the type[DType] / DType part sorted out in this one?

Member

SupportsKeysAndGetItem what the hell is this 😂 how's Mapping not enough anymore?

Lol, it is an object that supports those methods ofc 😉

just import it in a TYPE_CHECKING block from here

https://github.com/python/typeshed/blob/be34e9201db75891a67d4d3ce5e5705ee6636f6f/stdlib/_typeshed/__init__.pyi#L161-L164

from _typeshed import SupportsKeysAndGetItem
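
A minimal sketch of that import pattern, assuming postponed evaluation of annotations so the name is never needed at runtime. `to_dict` is a hypothetical helper used only for illustration, and `Any` stands in for `IntoDType`:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # `_typeshed` exists only for type checkers; it cannot be imported at runtime.
    from collections.abc import Iterable

    from _typeshed import SupportsKeysAndGetItem


def to_dict(
    schema: Iterable[tuple[str, Any]] | SupportsKeysAndGetItem[str, Any],
) -> dict[str, Any]:
    """Hypothetical helper: accept what `dict` accepts, minus **kwargs."""
    return dict(schema)


from_pairs = to_dict([("a", "Int64"), ("b", "String")])
from_mapping = to_dict({"a": "Int64", "b": "String"})
```

Because the annotations are never evaluated, the module runs even though `_typeshed` is unavailable at runtime.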

Member

Regarding Iterable, I always have a hard time being comfortable with it, since it can be infinite

If someone passes an infinite stream to an initializer - they need to learn the lesson that the call will never complete

(and lazy).

This is a desirable property for the caller and I think if we want to validate things, then that is our responsibility

Reducing scope

How about aligning just Schema.__init__ with dict.__init__?

We are only checking for None there, so the issues you have seem to not apply there

Member Author

@FBruzzesi FBruzzesi Nov 1, 2025

@dangotbanned attempt in 9ce61fe, which now fails for mypy when the dict has a mix of initialized and un-initialized dtypes, which was the initial purpose of the PR 🤔

Edit: By changing Iterable into Sequence there is no issue (see 0e0d7a8). I am personally not interested in diving deeper than this.

The last thing to fix is the forward reference for tubular. I am unsure whether it's our fault or them not importing something within an if TYPE_CHECKING: block
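
For readers unfamiliar with the "mix of initialized and un-initialized dtypes" case, here is a rough sketch using hypothetical stand-in classes (`DType`, `Int64`, `String`, `normalize`) rather than the real narwhals ones. The `IntoDType` alias below is an assumed shape (an instance or a class), not a copy of the library's definition:

```python
from __future__ import annotations

from typing import Mapping, Type, Union


class DType:
    """Hypothetical stand-in for a dtype base class."""

    def __eq__(self, other: object) -> bool:
        return type(self) is type(other)

    def __hash__(self) -> int:
        return hash(type(self))


class Int64(DType): ...


class String(DType): ...


# Assumed shape of the alias: either a dtype instance or a dtype class.
IntoDType = Union[DType, Type[DType]]


def normalize(schema: Mapping[str, IntoDType]) -> dict[str, DType]:
    """Instantiate any dtype passed as a class so that all values are instances."""
    return {
        name: dtype() if isinstance(dtype, type) else dtype
        for name, dtype in schema.items()
    }


# The explicit annotation gives the type checker a declared value type for a
# literal that mixes `Int64()` (an instance) with `String` (a class).
mixed: dict[str, IntoDType] = {"a": Int64(), "b": String}
normalized = normalize(mixed)
```

The annotation on `mixed` is only one way to sidestep inference on the literal; it does not say anything about which annotation on the receiving side makes mypy accept an unannotated literal.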

Member Author

Randomly removing some initialization to test the typechecking

IntoNullableSchema: TypeAlias = (
"Mapping[str, IntoDType | None] | Sequence[tuple[str, IntoDType | None]]"
)
"""Schema specification with possible None values."""
Member Author

@FBruzzesi FBruzzesi Oct 31, 2025

TODO: Please let's come up with a better description

return (value,) * n_match if isinstance(value, bool) else tuple(value)


class NullableSchema(OrderedDict[str, "IntoDType | None"]):
Member Author

@FBruzzesi FBruzzesi Oct 31, 2025

The reason for this class is mostly twofold:

  • Use it as a utility to convert a Sequence[tuple[str, DType {|None}]] into a mapping. This makes it easy to use the same API (for key, dtype in obj.items()).
  • Easily set a flag to know if any value passed is None
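
The two goals above could look roughly like this. It is a hypothetical sketch, not the class from this PR: `NullableSchemaSketch` and `has_none` are illustrative names, and plain strings stand in for dtypes:

```python
from __future__ import annotations

from collections import OrderedDict


class NullableSchemaSketch(OrderedDict):
    """Hypothetical sketch: an ordered mapping whose values may be None."""

    @property
    def has_none(self) -> bool:
        # Computed on access, so it stays correct if the mapping is mutated.
        return any(dtype is None for dtype in self.values())


# A sequence of pairs and a mapping both end up behind the same .items() API.
from_pairs = NullableSchemaSketch([("a", "Int64"), ("b", None)])
from_mapping = NullableSchemaSketch({"a": "Int64", "b": "String"})

names = [name for name, _dtype in from_pairs.items()]
```

Subclassing `OrderedDict` means the pair-to-mapping conversion comes for free from the inherited constructor.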

Member

The name Nullable* is making me think this is related to #3176 (comment) again 🫣

I'm not opposed to having classes to make our internal API cleaner btw 👍

Member Author

@FBruzzesi FBruzzesi Nov 1, 2025

The name Nullable* is making me think this is related to #3176 (comment) again 🫣

Not sure I see how it's related. Passing None seems more like a free pass, a "live and let live" kind of behavior.

I'm not opposed to having classes to make our internal API cleaner btw 👍

Any preference in having it prefixed by _?

Member

Any preference in having it prefixed by _?

It is already in _utils, so no need for a prefix 🙂

The current _ names are an artifact from when we had everything in utils.py

Member

Not sure I see how it's related. Passing None seems more like to be a free card, "live and let live" kind of behavior.

I'll try to give a more complete explanation later, but I was hinting at Nullable, Null, None being a bit overloaded.
Linking to (#3176 (comment)) was supposed to show that I made this mistake already 😂

@FBruzzesi
Member Author

Regarding CI failures: the only one related to this PR seems to be tubular 👀

@FBruzzesi FBruzzesi changed the title from "chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]]" to "chore: Fix IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]" Nov 1, 2025
@FBruzzesi FBruzzesi changed the title from "chore: Fix IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]" to "chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]" Nov 1, 2025
"IntoFrameT",
"IntoLazyFrame",
"IntoLazyFrameT",
"IntoNullableSchema",
Member

Are we sure about making this a public type?

Is it possible to focus only on the DType -> IntoDType changes here, without introducing new public types?

Member Author

@FBruzzesi FBruzzesi Nov 2, 2025

Ok I honestly think there might be something wrong in the PR to begin with. If I refocus on the basic changes to fix #3257, then passing a dictionary with mixed initialized and un-initialized dtypes still raises a complaint (pyright is ok with it, mypy is not)

Labels

enhancement New feature or request typing

Development

Successfully merging this pull request may close these issues.

typing.IntoSchema too narrow

4 participants