feat(DRAFT): Add get_supertype #3396

Draft
dangotbanned wants to merge 126 commits into main from dtypes/supertyping

dangotbanned (Member) commented Jan 10, 2026

Description

Important

@FBruzzesi and I have been + are still iterating on this
Core functionality is there, focusing on readability, performance + shrinking the test suite

This PR implements polars' concept of supertyping - which more generally defines which types can be safely promoted/demoted/cast to other types.

I really like the DuckDB visualization of their version [1] of these rules, so here's that as an example:

[Figure: DuckDB casting operations matrix (typecasting-matrix)]

This is a preliminary step for implementing relaxed concat (#3386).
The aim is we own a consistent set of rules that all/most backends can participate in.
We've already dropped some supertypes that are valid in polars, but may prove challenging in other backends such as #121.
Some others are directly mentioned in comments (e.g. (Struct, DType) -> Struct)
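
To make the idea of a supertype concrete, here is a toy sketch restricted to fixed-width integers. Everything in it is an assumption for illustration: the dtypes are plain strings, `toy_integer_supertype` is a hypothetical name, and the rule set is a simplified guess rather than what this PR or polars implements. It returns `None` where no integer in the table can hold both operands.

```python
from __future__ import annotations

# Hypothetical stand-ins: dtype name -> (is_signed, bits).
_INTS: dict[str, tuple[bool, int]] = {
    "Int8": (True, 8),
    "Int16": (True, 16),
    "Int32": (True, 32),
    "Int64": (True, 64),
    "UInt8": (False, 8),
    "UInt16": (False, 16),
    "UInt32": (False, 32),
    "UInt64": (False, 64),
}


def toy_integer_supertype(left: str, right: str) -> str | None:
    """Return the narrowest integer name in `_INTS` that can hold both operands."""
    l_signed, l_bits = _INTS[left]
    r_signed, r_bits = _INTS[right]
    if l_signed == r_signed:
        # Same signedness: the wider operand already holds the narrower one.
        return left if l_bits >= r_bits else right
    signed, unsigned = (left, right) if l_signed else (right, left)
    s_bits, u_bits = _INTS[signed][1], _INTS[unsigned][1]
    if s_bits > u_bits:
        # A strictly wider signed type covers the whole unsigned range.
        return signed
    # Otherwise a signed type with twice the unsigned width is needed.
    wider = f"Int{u_bits * 2}"
    return wider if wider in _INTS else None


print(toy_integer_supertype("UInt8", "Int8"))
print(toy_integer_supertype("Int64", "UInt64"))
```

The result does not depend on operand order, which is the property a relaxed concat would rely on when folding over several schemas.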

Additional use-cases

Supertyping in polars is used for much more than just a subset of concat.
In (#2572), it is one of the larger concepts missing from the intermediate representation (see #3386 (comment)).

polars-plan::plans::conversion::type_coercion is full of examples of how deeply related the concept is with expressions.
My aim is not to reproduce all of that 😅 - but to be able to reason about DTypes between LazyFrame operations without querying the backend for a Schema between every step 🤞

Related issues

Tasks

Footnotes

  1. DuckDB also mentions another set of rules called Combination Casting - that is entirely implicit.
    The matrix doesn't reflect these and only one cast example is given, but it would apply to nw.concat:
    "This combination casting occurs for ..., set operations (UNION / EXCEPT / INTERSECT), and ..."

Makes it much more visible which types are **really** versioned
Comment on lines +84 to +90
SameTemporalT = TypeVar("SameTemporalT", Datetime, DatetimeV1, Duration, DurationV1)
"""Temporal data types, with a `time_unit` attribute."""

SameDatetimeT = TypeVar("SameDatetimeT", Datetime, DatetimeV1)
SameT = TypeVar(
    "SameT", Array, List, Struct, Datetime, DatetimeV1, Duration, DurationV1, Enum
)
Member

@dangotbanned I am not a big fan of this naming, to be honest. The TypeVar already suggests that the two objects will be the same.

I think renaming as:

-SameTemporalT = TypeVar("SameTemporalT", Datetime, DatetimeV1, Duration, DurationV1)
+TemporalT = TypeVar("TemporalT", Datetime, DatetimeV1, Duration, DurationV1)
 """Temporal data types, with a `time_unit` attribute."""
 
-SameDatetimeT = TypeVar("SameDatetimeT", Datetime, DatetimeV1)
-SameT = TypeVar(
-    "SameT", Array, List, Struct, Datetime, DatetimeV1, Duration, DurationV1, Enum
+DatetimeT = TypeVar("DatetimeT", Datetime, DatetimeV1)
+ParametricT = TypeVar(
+    "ParametricT", Array, List, Struct, Datetime, DatetimeV1, Duration, DurationV1, Enum
 )

would be better

Member Author

I'm trying to convey the difference between constrained and bound TypeVars.

The only convention I've seen for constrained is like this:

AorB = TypeVar("AorB", A, B)

But it doesn't scale well to more than 2 types 😔

So I've been using Same* for this purpose.

A more verbose, but probably more accurate name would be:

EitherABCD = TypeVar("EitherABCD", A, B, C, D)

The main point is this kind of typing will reject A | B, meaning you need to have narrowed to exactly one of the constraints
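
A minimal sketch of the constrained-versus-bound distinction described above. `A` and `B` are placeholder classes, and `EitherAB`, `AnyAB`, `same_kind` and `any_kind` are hypothetical names, not code from this PR. The accept/reject notes in the comments describe what a type checker such as mypy is expected to report; nothing is enforced at runtime.

```python
from __future__ import annotations

from typing import TypeVar, Union


class A: ...


class B: ...


# Constrained: solves to exactly `A` or exactly `B`, never `A | B`.
EitherAB = TypeVar("EitherAB", A, B)

# Bound: solves to anything under the bound, including `A | B` itself.
AnyAB = TypeVar("AnyAB", bound=Union[A, B])


def same_kind(left: EitherAB, right: EitherAB) -> EitherAB:
    """Both operands must already be narrowed to the same constraint."""
    return left


def any_kind(left: AnyAB, right: AnyAB) -> AnyAB:
    """Operands only need to fit under the bound."""
    return left


a, b = A(), B()
same_kind(a, a)  # accepted: EitherAB solves to A
any_kind(a, b)  # accepted: AnyAB solves to A | B
# same_kind(a, b)  # expected to be rejected: no single constraint fits both
```

The difference is also visible at runtime through `EitherAB.__constraints__`, which holds `(A, B)`, while `AnyAB.__constraints__` is empty.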

- Typing needed work
- Think it should have lower priority
- The order of operands doesn't matter, they both have `.inner`
The comment can be code 😉
dangotbanned added a commit that referenced this pull request Feb 3, 2026
dangotbanned added a commit that referenced this pull request Feb 3, 2026
> and it serves a single purpose in the codebase
#3396 (comment)
Comment on lines +329 to +341
DEC128_MAX_PREC = 38
# Precomputing powers of 10 up to 10^38
POW10_LIST = tuple(10**i for i in range(DEC128_MAX_PREC + 1))
INT_MAX_MAP: Mapping[IntegerType, int] = {
    UInt8(): (2**8) - 1,
    UInt16(): (2**16) - 1,
    UInt32(): (2**32) - 1,
    UInt64(): (2**64) - 1,
    Int8(): (2**7) - 1,
    Int16(): (2**15) - 1,
    Int32(): (2**31) - 1,
    Int64(): (2**63) - 1,
}
dangotbanned (Member Author) commented Feb 6, 2026

A few notes on this:

I've kept this as one big comment since we're already at 77! 😳

(1) Could we be lazy-er?

I would prefer if we defer generating this until it is needed.

E.g. I'd expect _integer_supertyping and _primitive_numeric_supertyping to be more commonly used - but even they don't exist at module-import-time
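
One way to defer the work, sketched under the assumption that a cached zero-argument function is acceptable here. `pow10_table` is a hypothetical name; only `DEC128_MAX_PREC` comes from the diff above.

```python
from __future__ import annotations

from functools import lru_cache

DEC128_MAX_PREC = 38


@lru_cache(maxsize=None)
def pow10_table() -> tuple[int, ...]:
    """Build the powers of 10 up to 10^38 on first call, then reuse the same tuple."""
    return tuple(10**i for i in range(DEC128_MAX_PREC + 1))


# Nothing is computed at import time; the first call pays the cost once.
largest = pow10_table()[DEC128_MAX_PREC]
```

Callers that never touch decimals would then never build the table.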

(2) DType vs type[DType] keys

I think this is the only place we have instances as mapping keys, not sure why?

For example, it means each call here instantiates more DTypes, when we could just use the type itself 😅

if integer in {UInt128(), Int128()}:
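
A sketch of the type-keyed alternative. The classes below are hypothetical stand-ins so the snippet runs on its own, and `_INT_128` and `is_128_bit` are made-up names; the real dtypes may define their own equality and hashing, which this sketch does not model.

```python
from __future__ import annotations


class IntegerType:
    """Hypothetical stand-in for the real integer dtype base class."""


class Int64(IntegerType): ...


class Int128(IntegerType): ...


class UInt128(IntegerType): ...


# Built once; the members are classes, so no dtype instances are created per call.
_INT_128: frozenset[type[IntegerType]] = frozenset((Int128, UInt128))


def is_128_bit(integer: IntegerType) -> bool:
    """Check membership by exact type (subclasses would not match)."""
    return type(integer) in _INT_128
```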

(3) NumericType.max?

I had a look upstream and it seems to go in the direction of (#3396 (comment)) and (#3396 (comment)).

What do you think about adding these maximums to the classes (similar to _bits)?
That way we could compare directly and probably avoid the lookup table.
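
A rough sketch of what class-level maximums could look like. Only `_bits` is mentioned in the comment above; `_signed`, `max_value` and `max_fits_in` are hypothetical names, and the stand-in classes are not the real dtypes.

```python
from __future__ import annotations

from typing import ClassVar


class IntegerType:
    _bits: ClassVar[int]
    _signed: ClassVar[bool]  # hypothetical attribute, not in the source

    @classmethod
    def max_value(cls) -> int:
        """Largest representable value, derived from the bit width."""
        value_bits = cls._bits - 1 if cls._signed else cls._bits
        return (1 << value_bits) - 1


class UInt8(IntegerType):
    _bits = 8
    _signed = False


class Int16(IntegerType):
    _bits = 16
    _signed = True


def max_fits_in(narrow: type[IntegerType], wide: type[IntegerType]) -> bool:
    """Compare upper bounds only; negative minimums are not considered."""
    return narrow.max_value() <= wide.max_value()
```

With this shape the comparison needs neither instances nor a module-level mapping.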

(4) Minor tweak

I think this is the more efficient way to do these calculations.

Note

It would be exactly the second time I've found a use-case for bitshifting operators 😂

 INT_MAX_MAP: Mapping[IntegerType, int] = {
-    UInt8(): (2**8) - 1,
-    UInt16(): (2**16) - 1,
-    UInt32(): (2**32) - 1,
-    UInt64(): (2**64) - 1,
-    Int8(): (2**7) - 1,
-    Int16(): (2**15) - 1,
-    Int32(): (2**31) - 1,
-    Int64(): (2**63) - 1,
+    UInt8(): (1 << 8) - 1,
+    UInt16(): (1 << 16) - 1,
+    UInt32(): (1 << 32) - 1,
+    UInt64(): (1 << 64) - 1,
+    Int8(): (1 << 7) - 1,
+    Int16(): (1 << 15) - 1,
+    Int32(): (1 << 31) - 1,
+    Int64(): (1 << 63) - 1,
 }
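
A quick check that the two spellings in the suggestion produce identical values. Dtype names are plain strings here as a stand-in for the real keys, and `VALUE_BITS` is a hypothetical helper. This shows equality only; it does not measure which form is faster.

```python
# Value bits per dtype name: signed types reserve one bit for the sign.
VALUE_BITS = {
    "UInt8": 8,
    "UInt16": 16,
    "UInt32": 32,
    "UInt64": 64,
    "Int8": 7,
    "Int16": 15,
    "Int32": 31,
    "Int64": 63,
}

INT_MAX_SHIFT = {name: (1 << bits) - 1 for name, bits in VALUE_BITS.items()}
INT_MAX_POW = {name: (2**bits) - 1 for name, bits in VALUE_BITS.items()}

# Both spellings agree for every width in the table.
assert INT_MAX_SHIFT == INT_MAX_POW
```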

dangotbanned added a commit that referenced this pull request Feb 7, 2026
dangotbanned added a commit that referenced this pull request Feb 14, 2026
As much as is possible without #3396
dangotbanned added a commit that referenced this pull request Feb 16, 2026
Need to decide how many of the others to leave as todos
Main theme is needing `get_supertype` (#3396)
dangotbanned added a commit that referenced this pull request Feb 17, 2026
Everything left requires `get_supertype` (#3396)
* refactor: Replace `_same_supertype` with a custom `@singledispatch`

This is more generally useful and a LOT easier to read from the outside

* refactor: Just use a real class

* fix(typing): Satisfy `mypy`

* fix: Oops forgot the first element

* refactor(typing): Use slightly better names

* chore: Rename `default` -> `upper_bound`

* docs: Replace debugging doc

* docs: More cleanup

* refactor: Use `__slots__`, remove a field

* docs: More, more cleanup

* docs: lil bit of `.register` progress

* cov

* test: Get full coverage for `@just_dispatch`

* chore: Give it a simple repr

* test: Oops, forgot that was an override

* revert: Keep only what is required

See #3396 (comment)

* refactor: Simplify `@just_dispatch` signature

* fix(typing): Satisfy mypy

* test: Gotta get that coverage

Resolves #3410 (comment)

* docs: Restore a minimal version of `@just_dispatch` doc

Resolves #3410 (comment)

* revert: Remove `Impl` alias

#3410 (comment)

* refactor: Rename `Passthrough` -> `PassthroughFn`

Suggested in #3410 (review)

* docs: Add note to use only on internal

Suggested in #3410 (review)
Labels

dtypes, enhancement (New feature or request), internal

3 participants