-
Notifications
You must be signed in to change notification settings - Fork 172
feat: Add DType.base_type
#2969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Either VSCode or `iPython` is showing me the wrong repr for classes π€¦ββοΈ
narwhals/dtypes.py
Outdated
| >>> import narwhals as nw | ||
| >>> nw.Datetime("us").base_type() | ||
| narwhals.dtypes.Datetime | ||
| <class 'narwhals.dtypes.Datetime'> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, we could have the nicer repr from polars - but would need to add a metaclass for DType.
I think this is more user friendly, given that we tret the types and objects and interchangeable in most cases
@classmethod
def base_type(cls) -> DataTypeClass:
"""
Return this DataType's fundamental/root type class.
Examples
--------
>>> pl.Datetime("ns").base_type()
Datetime
>>> pl.List(pl.Int32).base_type()
List
>>> pl.Struct([pl.Field("a", pl.Int64), pl.Field("b", pl.Boolean)]).base_type()
Struct
"""
return clsFair enough though if not having this was an intentional choice
- https://github.com/pola-rs/polars/blob/97fb4c8a49f259b3d614a6f1f3c58994bda5d7eb/py-polars/polars/datatypes/classes.py#L145-L159
- https://github.com/pola-rs/polars/blob/97fb4c8a49f259b3d614a6f1f3c58994bda5d7eb/py-polars/polars/datatypes/classes.py#L51-L58
A tweak that comes to mind is this:
import narwhals as nw
>>> nw.List
nw.List
>>> nw.List(nw.Int32)
nw.List(nw.Int32)And then we can qualify the class name a bit more for v1's Datetime, Duration and Enum only
| dtypes = Version.MAIN.dtypes | ||
| NW_TO_IBIS_DTYPES: Mapping[type[DType], IbisDataType] = { | ||
| dtypes.Float64: ibis_dtypes.Float64(), | ||
| dtypes.Float32: ibis_dtypes.Float32(), | ||
| dtypes.Binary: ibis_dtypes.Binary(), | ||
| dtypes.String: ibis_dtypes.String(), | ||
| dtypes.Boolean: ibis_dtypes.Boolean(), | ||
| dtypes.Date: ibis_dtypes.Date(), | ||
| dtypes.Time: ibis_dtypes.Time(), | ||
| dtypes.Int8: ibis_dtypes.Int8(), | ||
| dtypes.Int16: ibis_dtypes.Int16(), | ||
| dtypes.Int32: ibis_dtypes.Int32(), | ||
| dtypes.Int64: ibis_dtypes.Int64(), | ||
| dtypes.UInt8: ibis_dtypes.UInt8(), | ||
| dtypes.UInt16: ibis_dtypes.UInt16(), | ||
| dtypes.UInt32: ibis_dtypes.UInt32(), | ||
| dtypes.UInt64: ibis_dtypes.UInt64(), | ||
| dtypes.Decimal: ibis_dtypes.Decimal(), | ||
| } | ||
|
|
||
|
|
||
| def narwhals_to_native_dtype( # noqa: C901 | ||
| dtype: IntoDType, version: Version | ||
| ) -> IbisDataType: | ||
| dtypes = version.dtypes | ||
|
|
||
| if isinstance_or_issubclass(dtype, dtypes.Decimal): # pragma: no cover | ||
| return ibis_dtypes.Decimal() | ||
| if isinstance_or_issubclass(dtype, dtypes.Float64): | ||
| return ibis_dtypes.Float64() | ||
| if isinstance_or_issubclass(dtype, dtypes.Float32): | ||
| return ibis_dtypes.Float32() | ||
| if isinstance_or_issubclass(dtype, dtypes.Int128): # pragma: no cover | ||
| msg = "Int128 not supported by Ibis" | ||
| raise NotImplementedError(msg) | ||
| if isinstance_or_issubclass(dtype, dtypes.Int64): | ||
| return ibis_dtypes.Int64() | ||
| if isinstance_or_issubclass(dtype, dtypes.Int32): | ||
| return ibis_dtypes.Int32() | ||
| if isinstance_or_issubclass(dtype, dtypes.Int16): | ||
| return ibis_dtypes.Int16() | ||
| if isinstance_or_issubclass(dtype, dtypes.Int8): | ||
| return ibis_dtypes.Int8() | ||
| if isinstance_or_issubclass(dtype, dtypes.UInt128): # pragma: no cover | ||
| msg = "UInt128 not supported by Ibis" | ||
| raise NotImplementedError(msg) | ||
| if isinstance_or_issubclass(dtype, dtypes.UInt64): | ||
| return ibis_dtypes.UInt64() | ||
| if isinstance_or_issubclass(dtype, dtypes.UInt32): | ||
| return ibis_dtypes.UInt32() | ||
| if isinstance_or_issubclass(dtype, dtypes.UInt16): | ||
| return ibis_dtypes.UInt16() | ||
| if isinstance_or_issubclass(dtype, dtypes.UInt8): | ||
| return ibis_dtypes.UInt8() | ||
| if isinstance_or_issubclass(dtype, dtypes.String): | ||
| return ibis_dtypes.String() | ||
| if isinstance_or_issubclass(dtype, dtypes.Boolean): | ||
| return ibis_dtypes.Boolean() | ||
| if isinstance_or_issubclass(dtype, dtypes.Categorical): | ||
| msg = "Categorical not supported by Ibis" | ||
| raise NotImplementedError(msg) | ||
| if ibis_type := NW_TO_IBIS_DTYPES.get(dtype.base_type()): | ||
| return ibis_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the diff paints a clear enough picture
Example (String() | String)
On main, we need to go through 13x failed isinstance_or_issubclass calls, before finally reaching the correct one.
Inside that, every time 2x isinstance calls are made
_ibis.utils.narwhals_to_native_dtype(nw.String(), ...)
_ibis.utils.narwhals_to_native_dtype(nw.String, ...)Lines 843 to 848 in 9e994f3
| def isinstance_or_issubclass(obj_or_cls: Any, cls_or_tuple: Any) -> bool: | |
| from narwhals.dtypes import DType | |
| if isinstance(obj_or_cls, DType): | |
| return isinstance(obj_or_cls, cls_or_tuple) | |
| return isinstance(obj_or_cls, cls_or_tuple) or ( |
So either variation of nw.String currently always requires 26x failed isinstance calls, +2 once we get to the correct case
Optimization
Now most cases are now handled in a single dict lookup (16 in total for ibis) π₯³
We can do this for at least polars, pyarrow, ibis, duckdb - and probably with modifications in the other backends
- Unclear why very little of the *Types API* was being utilized before - Most cases can use a constant - https://duckdb.org/docs/stable/clients/python/types
- Converts previous `AttributeError` (on issubclass) into and unknown dtype - See (#2654 (comment)) - avoid using `Any` - focus on getting the dtype first
All of this is temporary until #2486
All other backends make an attempt to retrieve their time units
Hoping this makes things more visible
MarcoGorelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @dangotbanned , really nice
|
Thanks @MarcoGorelli π also π at (2967c2d) |

What type of PR is this? (check all applicable)
Related issues
DType.to_<backend>()Β #2486)Checklist
If you have comments or can explain your changes, please do so below
Incredibly simple feature, but made a big refactor possible (starting from 3070470)
Based on
polars.DataType.base_typeSee (#2969 (comment)) for details