
@dangotbanned
Member

@dangotbanned dangotbanned commented Aug 10, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Incredibly simple feature, but it made a big refactor possible (starting from 3070470)

Based on `polars.DataType.base_type`

See (#2969 (comment)) for details
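As a rough illustration of what the feature does, here is a minimal sketch using toy classes rather than narwhals' real `DType` hierarchy (the class bodies are assumptions). Because `base_type` is a `classmethod`, calling it on a parametrized instance or on the class itself returns the same class object, mirroring the polars behaviour it is based on.

```python
from __future__ import annotations


class DType:
    """Toy base class standing in for narwhals' DType (not the real one)."""

    @classmethod
    def base_type(cls) -> type[DType]:
        # A classmethod always receives the class, so calling this on an
        # instance ignores its parameters and returns the bare class.
        return cls


class String(DType):
    pass


class Datetime(DType):
    def __init__(self, time_unit: str = "us") -> None:
        self.time_unit = time_unit


# Instances and classes collapse to the same class object.
print(String().base_type() is String)  # True
print(String.base_type() is String)  # True
print(Datetime("ns").base_type() is Datetime)  # True
```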

@dangotbanned dangotbanned added enhancement New feature or request dtypes labels Aug 10, 2025
Either VSCode or `iPython` is showing me the wrong repr for classes 🤦‍♂️
Comment on lines 74 to 76
 >>> import narwhals as nw
 >>> nw.Datetime("us").base_type()
-narwhals.dtypes.Datetime
+<class 'narwhals.dtypes.Datetime'>
Member Author

@dangotbanned dangotbanned Aug 11, 2025


For reference, we could have the nicer repr from polars, but we would need to add a metaclass for `DType`.
I think this is more user-friendly, given that we treat the types and objects as interchangeable in most cases.

    @classmethod
    def base_type(cls) -> DataTypeClass:
        """
        Return this DataType's fundamental/root type class.

        Examples
        --------
        >>> pl.Datetime("ns").base_type()
        Datetime
        >>> pl.List(pl.Int32).base_type()
        List
        >>> pl.Struct([pl.Field("a", pl.Int64), pl.Field("b", pl.Boolean)]).base_type()
        Struct
        """
        return cls

Fair enough, though, if not having this was an intentional choice.
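For comparison, a metaclass-based repr like the one mentioned above might look roughly like this. It is a sketch with hypothetical names (`DTypeMeta`, toy `DType` and `Datetime`), not the actual narwhals or polars code: `__repr__` on the metaclass controls how the class object prints, while `__repr__` on the class still controls instances.

```python
from __future__ import annotations


class DTypeMeta(type):
    # Controls the repr of the classes themselves, e.g. `Datetime`
    # instead of "<class '__main__.Datetime'>".
    def __repr__(cls) -> str:
        return cls.__name__


class DType(metaclass=DTypeMeta):
    @classmethod
    def base_type(cls) -> type[DType]:
        return cls

    def __repr__(self) -> str:
        # Instances keep a call-style repr.
        return f"{type(self).__name__}()"


class Datetime(DType):
    def __init__(self, time_unit: str = "us") -> None:
        self.time_unit = time_unit

    def __repr__(self) -> str:
        return f"Datetime(time_unit={self.time_unit!r})"


print(repr(Datetime("ns").base_type()))  # Datetime
print(repr(Datetime("ns")))  # Datetime(time_unit='ns')
```

One cost of this design is that every `DType` subclass then carries a custom metaclass, which can conflict with any other base class that defines its own metaclass.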


A tweak that comes to mind is this:

>>> import narwhals as nw

>>> nw.List
nw.List

>>> nw.List(nw.Int32)
nw.List(nw.Int32)

And then we can qualify the class name a bit more, for v1's `Datetime`, `Duration` and `Enum` only.
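The suggested reprs could be sketched on top of the same kind of metaclass. All names here are hypothetical: the `_repr_prefix` attribute, the toy classes, and the `nw.v1` prefix used to show the extra qualification for a v1 `Datetime` are placeholders, since the comment does not specify the exact format.

```python
from __future__ import annotations


class DTypeMeta(type):
    def __repr__(cls) -> str:
        # Class repr: qualifying prefix plus the class name, e.g. "nw.List".
        return f"{cls._repr_prefix}.{cls.__name__}"


class DType(metaclass=DTypeMeta):
    # Hypothetical attribute controlling how the name is qualified.
    _repr_prefix = "nw"

    def __repr__(self) -> str:
        # Unparametrized instances print exactly like their class.
        return repr(type(self))


class Int32(DType):
    pass


class Datetime(DType):
    pass


class List(DType):
    def __init__(self, inner: DType | type[DType]) -> None:
        self.inner = inner

    def __repr__(self) -> str:
        # Parametrized instances show their inner type, e.g. "nw.List(nw.Int32)".
        return f"{type(self)!r}({self.inner!r})"


# Hypothetical v1 variant: same class name, more qualified prefix.
V1Datetime = DTypeMeta("Datetime", (Datetime,), {"_repr_prefix": "nw.v1"})

print(repr(List))  # nw.List
print(repr(List(Int32)))  # nw.List(nw.Int32)
print(repr(V1Datetime))  # nw.v1.Datetime
```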

Comment on lines 176 to 202
dtypes = Version.MAIN.dtypes
NW_TO_IBIS_DTYPES: Mapping[type[DType], IbisDataType] = {
    dtypes.Float64: ibis_dtypes.Float64(),
    dtypes.Float32: ibis_dtypes.Float32(),
    dtypes.Binary: ibis_dtypes.Binary(),
    dtypes.String: ibis_dtypes.String(),
    dtypes.Boolean: ibis_dtypes.Boolean(),
    dtypes.Date: ibis_dtypes.Date(),
    dtypes.Time: ibis_dtypes.Time(),
    dtypes.Int8: ibis_dtypes.Int8(),
    dtypes.Int16: ibis_dtypes.Int16(),
    dtypes.Int32: ibis_dtypes.Int32(),
    dtypes.Int64: ibis_dtypes.Int64(),
    dtypes.UInt8: ibis_dtypes.UInt8(),
    dtypes.UInt16: ibis_dtypes.UInt16(),
    dtypes.UInt32: ibis_dtypes.UInt32(),
    dtypes.UInt64: ibis_dtypes.UInt64(),
    dtypes.Decimal: ibis_dtypes.Decimal(),
}


def narwhals_to_native_dtype(  # noqa: C901
    dtype: IntoDType, version: Version
) -> IbisDataType:
    dtypes = version.dtypes

    if isinstance_or_issubclass(dtype, dtypes.Decimal):  # pragma: no cover
        return ibis_dtypes.Decimal()
    if isinstance_or_issubclass(dtype, dtypes.Float64):
        return ibis_dtypes.Float64()
    if isinstance_or_issubclass(dtype, dtypes.Float32):
        return ibis_dtypes.Float32()
    if isinstance_or_issubclass(dtype, dtypes.Int128):  # pragma: no cover
        msg = "Int128 not supported by Ibis"
        raise NotImplementedError(msg)
    if isinstance_or_issubclass(dtype, dtypes.Int64):
        return ibis_dtypes.Int64()
    if isinstance_or_issubclass(dtype, dtypes.Int32):
        return ibis_dtypes.Int32()
    if isinstance_or_issubclass(dtype, dtypes.Int16):
        return ibis_dtypes.Int16()
    if isinstance_or_issubclass(dtype, dtypes.Int8):
        return ibis_dtypes.Int8()
    if isinstance_or_issubclass(dtype, dtypes.UInt128):  # pragma: no cover
        msg = "UInt128 not supported by Ibis"
        raise NotImplementedError(msg)
    if isinstance_or_issubclass(dtype, dtypes.UInt64):
        return ibis_dtypes.UInt64()
    if isinstance_or_issubclass(dtype, dtypes.UInt32):
        return ibis_dtypes.UInt32()
    if isinstance_or_issubclass(dtype, dtypes.UInt16):
        return ibis_dtypes.UInt16()
    if isinstance_or_issubclass(dtype, dtypes.UInt8):
        return ibis_dtypes.UInt8()
    if isinstance_or_issubclass(dtype, dtypes.String):
        return ibis_dtypes.String()
    if isinstance_or_issubclass(dtype, dtypes.Boolean):
        return ibis_dtypes.Boolean()
    if isinstance_or_issubclass(dtype, dtypes.Categorical):
        msg = "Categorical not supported by Ibis"
        raise NotImplementedError(msg)
    if ibis_type := NW_TO_IBIS_DTYPES.get(dtype.base_type()):
        return ibis_type
Member Author

@dangotbanned dangotbanned Aug 11, 2025


Not sure if the diff paints a clear enough picture

Example (`String()` | `String`)

On `main`, we need to go through 13x failed `isinstance_or_issubclass` calls before finally reaching the correct one.

Inside that function, 2x `isinstance` calls are made every time

_ibis.utils.narwhals_to_native_dtype(nw.String(), ...)
_ibis.utils.narwhals_to_native_dtype(nw.String, ...)

narwhals/narwhals/_utils.py

Lines 843 to 848 in 9e994f3

def isinstance_or_issubclass(obj_or_cls: Any, cls_or_tuple: Any) -> bool:
    from narwhals.dtypes import DType
    if isinstance(obj_or_cls, DType):
        return isinstance(obj_or_cls, cls_or_tuple)
    return isinstance(obj_or_cls, cls_or_tuple) or (

So either variation of `nw.String` currently always requires 26x failed `isinstance` calls, +2 once we get to the correct case.
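The call count above can be reproduced with a small sketch. It uses toy classes named after the ibis chain and a hypothetical `CallCounter` that wraps `isinstance`; only the instance branch (`nw.String()`) of `isinstance_or_issubclass` is modelled, because the quoted helper is truncated before the class branch.

```python
from __future__ import annotations


class DType:
    """Toy stand-in for narwhals' DType."""


# Same order as the isinstance_or_issubclass chain in the ibis diff:
# String is only reached after 13 misses.
CHAIN = [
    "Decimal", "Float64", "Float32", "Int128", "Int64", "Int32", "Int16",
    "Int8", "UInt128", "UInt64", "UInt32", "UInt16", "UInt8", "String",
]
TOY_DTYPES = {name: type(name, (DType,), {}) for name in CHAIN}


class CallCounter:
    def __init__(self) -> None:
        self.isinstance_calls = 0

    def _isinstance(self, obj: object, cls: type) -> bool:
        self.isinstance_calls += 1
        return isinstance(obj, cls)

    def isinstance_or_issubclass(self, obj: object, cls: type) -> bool:
        # Instance branch of the quoted helper: one isinstance call to
        # detect a DType instance, then one against the target class.
        if self._isinstance(obj, DType):
            return self._isinstance(obj, cls)
        raise TypeError("this sketch only models DType instances")


def chain_lookup(dtype: DType, counter: CallCounter) -> str:
    # Linear scan, like the if-chain in narwhals_to_native_dtype.
    for name in CHAIN:
        if counter.isinstance_or_issubclass(dtype, TOY_DTYPES[name]):
            return name
    raise NotImplementedError(type(dtype).__name__)


counter = CallCounter()
print(chain_lookup(TOY_DTYPES["String"](), counter))  # String
print(counter.isinstance_calls)  # 28, i.e. 13 misses x 2 + 2 for the hit
```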

Optimization

Now most cases are handled in a single `dict` lookup (16 in total for ibis) 🥳

We can do this for at least polars, pyarrow, ibis and duckdb, and probably, with modifications, for the other backends.
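A minimal sketch of the dispatch pattern, with toy dtypes and plain strings standing in for the ibis types (the real mapping, `NW_TO_IBIS_DTYPES`, maps to `ibis_dtypes` instances; `to_native` and `NW_TO_NATIVE` are hypothetical names). Non-parametrized dtypes resolve with one `base_type()` call plus one `dict.get`; anything missing from the mapping falls through to the remaining special cases.

```python
from __future__ import annotations

from collections.abc import Mapping


class DType:
    @classmethod
    def base_type(cls) -> type[DType]:
        return cls


class String(DType): ...


class Int64(DType): ...


class Categorical(DType): ...


class List(DType):
    def __init__(self, inner: DType | type[DType]) -> None:
        self.inner = inner


# Toy stand-in for NW_TO_IBIS_DTYPES: keyed by class, so instances and
# classes both hit the same entry via base_type().
NW_TO_NATIVE: Mapping[type[DType], str] = {
    String: "string",
    Int64: "int64",
}


def to_native(dtype: DType | type[DType]) -> str:
    # Fast path: a single dict lookup replaces the linear if-chain.
    if (native := NW_TO_NATIVE.get(dtype.base_type())) is not None:
        return native
    # Slow path: cases where the base type alone is not enough.
    if isinstance(dtype, List):
        return f"array<{to_native(dtype.inner)}>"
    if dtype.base_type() is Categorical:
        raise NotImplementedError("Categorical not supported")
    raise TypeError(f"unknown dtype: {dtype!r}")


print(to_native(String()))  # string
print(to_native(String))  # string
print(to_native(List(Int64)))  # array<int64>
```

Keying on the class rather than the instance is what lets `String()` and `String` share one entry; parametrized types such as `List` still need their own branch because the native type depends on the parameters.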

@dangotbanned dangotbanned marked this pull request as ready for review August 13, 2025 17:26
@dangotbanned dangotbanned marked this pull request as draft August 13, 2025 17:29
Hoping this makes things more visible
@dangotbanned dangotbanned marked this pull request as ready for review August 13, 2025 17:54
Member

@MarcoGorelli MarcoGorelli left a comment


Thanks @dangotbanned, really nice

@dangotbanned
Member Author

Thanks @MarcoGorelli 😍

also 😂 at (2967c2d)

@dangotbanned dangotbanned merged commit 64472d2 into main Aug 13, 2025
32 of 33 checks passed
@dangotbanned dangotbanned deleted the dtype-base-type branch August 13, 2025 18:47
dangotbanned added a commit that referenced this pull request Sep 18, 2025
- Full revert of #2731
- Not needed since #2969

Now, we only have 1 case which uses the 2x `@overload`(s) but I'm keeping the 3x as well for now 🙂