Much easier to pick one to debug this way
We can safely use an unbounded `@cache`, because there can only be 16 valid pairs
Makes it much more visible which types are **really** versioned
narwhals/dtypes/_supertyping.py
SameTemporalT = TypeVar("SameTemporalT", Datetime, DatetimeV1, Duration, DurationV1)
"""Temporal data types, with a `time_unit` attribute."""

SameDatetimeT = TypeVar("SameDatetimeT", Datetime, DatetimeV1)
SameT = TypeVar(
    "SameT", Array, List, Struct, Datetime, DatetimeV1, Duration, DurationV1, Enum
)
@dangotbanned I am not a big fan of this naming, to be honest. The TypeVar already suggests that the two objects will be the same.
I think renaming as:
-SameTemporalT = TypeVar("SameTemporalT", Datetime, DatetimeV1, Duration, DurationV1)
+TemporalT = TypeVar("TemporalT", Datetime, DatetimeV1, Duration, DurationV1)
"""Temporal data types, with a `time_unit` attribute."""
-SameDatetimeT = TypeVar("SameDatetimeT", Datetime, DatetimeV1)
-SameT = TypeVar(
- "SameT", Array, List, Struct, Datetime, DatetimeV1, Duration, DurationV1, Enum
+DatetimeT = TypeVar("DatetimeT", Datetime, DatetimeV1)
+ParametricT = TypeVar(
+ "ParametricT", Array, List, Struct, Datetime, DatetimeV1, Duration, DurationV1, Enum
)
would be better
I'm trying to convey the difference between constrained and bound TypeVars.
- https://typing.python.org/en/latest/spec/generics.html#introduction
- https://typing.python.org/en/latest/spec/generics.html#type-variables-with-an-upper-bound
The only convention I've seen for constrained is like this:
AorB = TypeVar("AorB", A, B)
But it doesn't scale well to more than 2 types 😔
So I've been using Same* for this purpose.
A more verbose, but probably more accurate name would be:
EitherABCD = TypeVar("EitherABCD", A, B, C, D)
The main point is that this kind of typing will reject A | B, meaning you need to have narrowed to exactly one of the constraints.
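To make the constrained vs bound distinction concrete, here is a minimal sketch. The classes `A` and `B` and the functions `same`, `bounded` and `demo` are placeholders invented for illustration, not names from this PR. The behaviour described in the comments is what a static type checker reports; at runtime all of the calls succeed.

```python
from typing import TypeVar, Union


class A: ...


class B: ...


# Constrained: each use must solve to exactly A or exactly B
AorB = TypeVar("AorB", A, B)
# Bound: anything assignable to the bound is fine, including the union itself
BoundT = TypeVar("BoundT", bound=Union[A, B])


def same(left: AorB, right: AorB) -> AorB:
    return left


def bounded(left: BoundT, right: BoundT) -> BoundT:
    return left


def demo(x: Union[A, B]) -> None:
    bounded(x, x)  # accepted: BoundT solves to A | B
    same(x, x)  # a type checker reports an error: A | B is not a constraint
    if isinstance(x, A):
        same(x, A())  # accepted: x has been narrowed to A
```

The constrained form is the one that forces the caller to narrow before calling, which is the property the `Same*` names are trying to signal.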
- Typing needed work
- Think it should have lower priority
- The order of operands doesn't matter, they both have `.inner`
The comment can be code 😉
> and it serves a single purpose in the codebase #3396 (comment)
DEC128_MAX_PREC = 38
# Precomputing powers of 10 up to 10^38
POW10_LIST = tuple(10**i for i in range(DEC128_MAX_PREC + 1))
INT_MAX_MAP: Mapping[IntegerType, int] = {
    UInt8(): (2**8) - 1,
    UInt16(): (2**16) - 1,
    UInt32(): (2**32) - 1,
    UInt64(): (2**64) - 1,
    Int8(): (2**7) - 1,
    Int16(): (2**15) - 1,
    Int32(): (2**31) - 1,
    Int64(): (2**63) - 1,
}
A few notes on this:
I've kept this as one big comment since we're already at 77! 😳
(1) Could we be lazy-er?
I would prefer if we defer generating this until it is needed.
E.g. I'd expect _integer_supertyping and _primitive_numeric_supertyping to be more commonly used - but even they don't exist at module-import-time
(2) DType vs type[DType] keys
I think this is the only place we have instances as mapping keys, not sure why?
For example, it means each call here instantiates more DTypes, when we could just use the type itself 😅
narwhals/narwhals/dtypes/_supertyping.py
Line 355 in 548e5b8
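Points (1) and (2) could be combined roughly as follows. This is only a sketch: the classes are stand-ins rather than the real narwhals dtypes, and `int_max_map` and `int_max` are hypothetical helper names.

```python
from __future__ import annotations

from functools import cache
from typing import Mapping


# Stand-in classes for illustration; not the real narwhals dtypes.
class IntegerType: ...


class Int8(IntegerType): ...


class UInt8(IntegerType): ...


@cache
def int_max_map() -> Mapping[type[IntegerType], int]:
    """Built on the first call and reused afterwards; nothing runs at import time."""
    return {UInt8: (1 << 8) - 1, Int8: (1 << 7) - 1}


def int_max(dtype: IntegerType) -> int:
    # Look up by class, so no dtype instances are created to serve as keys
    return int_max_map()[type(dtype)]
```

With class keys, a lookup costs one `type()` call on the instance already in hand.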
(3) NumericType.max?
I had a look upstream and it seems to be heading in the direction of (#3396 (comment)) and (#3396 (comment)).
What do you think about adding these maximums to the classes, (similar to _bits)?
That way we could compare directly and probably avoid the lookup table
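One way that could look, as a sketch only: `_bits` mirrors the attribute mentioned above, while `_signed` and `max_value` are assumed names, and the classes are stand-ins rather than the real narwhals dtypes.

```python
from __future__ import annotations

from typing import ClassVar


# Stand-in classes; `_signed` and `max_value` are assumptions for this sketch.
class IntegerType:
    _bits: ClassVar[int]
    _signed: ClassVar[bool]

    @classmethod
    def max_value(cls) -> int:
        # A signed type spends one bit on the sign (True counts as 1 here)
        return (1 << (cls._bits - cls._signed)) - 1


class Int16(IntegerType):
    _bits = 16
    _signed = True


class UInt16(IntegerType):
    _bits = 16
    _signed = False
```

The maximum is then derived from data the class already carries, so there is no separate table to keep in sync.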
(4) Minor tweak
I think this is the more efficient way to do these calculations.
Note
It would be exactly the second time I've found a use-case for bitshifting operators 😂
INT_MAX_MAP: Mapping[IntegerType, int] = {
- UInt8(): (2**8) - 1,
- UInt16(): (2**16) - 1,
- UInt32(): (2**32) - 1,
- UInt64(): (2**64) - 1,
- Int8(): (2**7) - 1,
- Int16(): (2**15) - 1,
- Int32(): (2**31) - 1,
- Int64(): (2**63) - 1,
+ UInt8(): (1 << 8) - 1,
+ UInt16(): (1 << 16) - 1,
+ UInt32(): (1 << 32) - 1,
+ UInt64(): (1 << 64) - 1,
+ Int8(): (1 << 7) - 1,
+ Int16(): (1 << 15) - 1,
+ Int32(): (1 << 31) - 1,
+ Int64(): (1 << 63) - 1,
}
As much as is possible without #3396
Need to decide how many of the others to leave as todos
Main theme is needing `get_supertype` (#3396)
Everything left requires `get_supertype` (#3396)
* refactor: Replace `_same_supertype` with a custom `@singledispatch`
  This is more generally useful and a LOT easier to read from the outside
* refactor: Just use a real class
* fix(typing): Satisfy `mypy`
* fix: Oops forgot the first element
* refactor(typing): Use slightly better names
* chore: Rename `default` -> `upper_bound`
* docs: Replace debugging doc
* docs: More cleanup
* refactor: Use `__slots__`, remove a field
* docs: More, more cleanup
* docs: lil bit of `.register` progress
* cov
* test: Get full coverage for `@just_dispatch`
* chore: Give it a simple repr
* test: Oops, forgot that was an override
* revert: Keep only what is required
  See #3396 (comment)
* refactor: Simplify `@just_dispatch` signature
* fix(typing): Satisfy mypy
* test: Gotta get that coverage
  Resolves #3410 (comment)
* docs: Restore a minimal version of `@just_dispatch` doc
  Resolves #3410 (comment)
* revert: Remove `Impl` alias
  #3410 (comment)
* refactor: Rename `Passthrough` -> `PassthroughFn`
  Suggested in #3410 (review)
* docs: Add note to use only on internal
  Suggested in #3410 (review)
Description
Important
@FBruzzesi and I have been + are still iterating on this
Core functionality is there, focusing on readability, performance + shrinking the test suite
This PR implements `polars`' concept of supertyping - which more generally defines which types can be safely promoted/demoted/cast to other types.
I really like the DuckDB visualization of their version [1] of these rules, so here's that for an example:
Show Casting Operations Matrix
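As a rough illustration of the idea, here is a toy supertype function for integers only. It is a sketch, not this PR's implementation: `Int` and `int_supertype` are made-up names, and returning `None` when no integer of at most 64 bits fits is a simplification chosen for this example.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Int(NamedTuple):
    """Toy integer dtype; not a narwhals or polars class."""

    signed: bool
    bits: int


def int_supertype(left: Int, right: Int) -> Optional[Int]:
    """Return the narrowest toy integer that can hold both inputs, or None."""
    if left.signed == right.signed:
        # Same signedness: the wider of the two holds both
        return max(left, right, key=lambda dt: dt.bits)
    signed, unsigned = (left, right) if left.signed else (right, left)
    if signed.bits > unsigned.bits:
        return signed
    # Otherwise the unsigned range needs a signed type of twice the width
    bits = unsigned.bits * 2
    return Int(True, bits) if bits <= 64 else None
```

For instance, `int_supertype(Int(True, 8), Int(False, 8))` gives `Int(signed=True, bits=16)`, and the result does not depend on argument order.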
This is a preliminary step for implementing relaxed `concat` (#3386).
The aim is we own a consistent set of rules that all/most backends can participate in.
We've already dropped some supertypes that are valid in `polars`, but may prove challenging in other backends such as #121.
Some others are directly mentioned in comments (e.g. `(Struct, DType) -> Struct`)
Additional use-cases
Supertyping in `polars` is used for much more than just a subset of `concat`.
In (#2572), it is one of the larger concepts missing from the intermediate representation (see #3386 (comment)).
`polars-plan::plans::conversion::type_coercion` is full of examples of how deeply related the concept is with expressions.
My aim is not to reproduce all of that 😅 - but to be able to reason about `DType`s between `LazyFrame` operations without querying the backend for a `Schema` between every step 🤞
Related issues
- `concat(..., how={"vertical_relaxed", "diagonal_relaxed"})` #3386
- `DType.__call__` #3393
- `Decimal` DType #3377
Tasks
- `NOTSET` (244537d)
- 443d9ccd)
- `Struct` `DType` fixtures
- `(v1.Datetime, Datetime) -> None`
- `_CACHE_SIZE_TP_HIGH`
- `narwhals.dtypes.classes` -> `narwhals.dtypes._classes` (d2c96fe)
- `narwhals.stable.v1._dtypes` -> `narwhals.dtypes._classes_v1`
- `_has_intersection`
- `_first_excluding`
- `Decimal` handling #3377
- `String` downcasts (see thread) 8d9e053
- `(Nested, String)`?
- `{List, Array} -> List` 3ad1639
- `promotion-rules.md` script
- `promotion-rules.md`
- `get_supertype` #3396 (comment)
- `get_supertype` #3396 (comment)
- `_mixed_supertype` comments
- `_numeric_supertype` comments
- `Struct`
- `concat(..., how="*_relaxed"})` #3398
- `@lru_cache` on a wrapper for `DType.__eq__`
- `_struct_fields_union` and `get_supertype`
Footnotes
1. DuckDB also mentions another set of rules called Combination Casting - that is entirely implicit.
The matrix doesn't reflect these and only one cast example is given, but it would apply to `nw.concat`:
"This combination casting occurs for ..., set operations (`UNION`/`EXCEPT`/`INTERSECT`), and ..."