feat(python,rust): Consistent Decimal dtype parameters between Python and Rust, conversions, and inferred vs fixed scale (#13650)
Conversation
You correctly signal a lot of issues in the description of this PR, but I don't see how allowing […]. Rather, I'd argue that […]. I concede that this is not my area of expertise, but that's what I'm thinking from what I've seen of it so far.
I'd agree that requiring […]. That would be a significant breaking change, in that any cast to Decimal, and any specified Decimal dtype, would require a scale. On the other hand, decimal support is experimental, and the code it would break would also be code doing scale inference that seems rather unsafe to begin with. The change would also create the awkwardness that the […].

A problem/frustration here for Python is that Python's `Decimal`s are an entirely different decimal representation and don't have a fixed scale (they're also arguably a more useful representation, at least for scientific uses, but that's not that relevant here, and I think there are speed disadvantages). I think there isn't really a good way of converting lists of Python Decimals to Polars Decimals without specifying a scale.
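The per-value-exponent point can be seen directly with the standard library: each Python `Decimal` carries its own exponent, so converting a list of them to a single fixed-scale column means either inferring a scale or having the user supply one. A small stdlib-only illustration (the inference rule shown here is just one plausible choice, not necessarily what Polars does):

```python
from decimal import Decimal

# Python Decimals carry a per-value exponent rather than a
# column-wide fixed scale, so a list can mix scales freely:
values = [Decimal("1.5"), Decimal("2.25"), Decimal("3")]
print([v.as_tuple().exponent for v in values])  # [-1, -2, 0]

# To store these in a fixed-scale column, a scale must be chosen.
# One option: infer the smallest scale that loses no digits.
inferred_scale = max(-v.as_tuple().exponent for v in values)
print(inferred_scale)  # 2

# Quantizing every value to that scale gives a uniform representation:
quantized = [v.quantize(Decimal(1).scaleb(-inferred_scale)) for v in values]
print(quantized)  # [Decimal('1.50'), Decimal('2.25'), Decimal('3.00')]
```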
Superseded by #24542. |
This is an implementation of the suggestions in #13572. I realize this is rather premature, but as I had been playing around with it, I thought I may as well make a draft PR to show what the changes would be like.
The core idea here is to replace the current `pl.Decimal(precision: int | None = None, scale: int = 0)` in Python with `pl.Decimal(precision: int | None = None, scale: int | None = None)`, which is consistent with Rust's `Decimal(Option<usize>, Option<usize>)`. Doing this then allows several changes that I think are improvements in the handling of conversions and inferred vs. fixed scale, so that the user can more reliably choose to set a fixed scale, or have the scale inferred in a conversion. In this change:
- Currently, `pl.Series([D("1.5")], dtype=pl.Decimal(scale=5))` has a dtype of `pl.Decimal(scale=1)`. After, it has `pl.Decimal(scale=5)`.
- Currently, `pl.Series(["1.5"]).cast(pl.Decimal)` has a scale of 1, but `pl.Series(["1.5"]).cast(pl.Decimal(scale=0))` fails, even though intuitively `pl.Decimal` would seem equivalent to `pl.Decimal(precision=None, scale=0)`. There is no way to cast to a precision but infer scale. After, `scale=None` (the default) causes inference, and `scale=0` fails.
- Currently, `pl.Series([1.5]).cast(pl.Decimal)` silently just uses `scale=0` for the cast, creating a `Decimal(scale=0)` series of `[1]`. After, it fails, because there is no safe cast between a float and a decimal unless scale is specified, ensuring that the user realizes this: it's unlikely they want `scale=0` in most cases.
- Currently, `pl.Series([1], dtype=pl.Decimal(scale=5))` works (subject to the above disregard of scale), but `pl.Series([1.5], dtype=pl.Decimal(scale=5))` does not, even though both involve type conversions and the latter is not ambiguous (use the float at scale 5). I'm actually not as sure this is a good idea.
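The inferred-vs-fixed rule described above can be sketched in plain Python. This is not the actual Rust implementation, and the function name `to_fixed_scale` is made up for illustration; it just mirrors the proposed semantics: `scale=None` infers, while an explicit scale that would lose digits fails loudly instead of silently truncating.

```python
from decimal import Decimal

def to_fixed_scale(strings, scale=None):
    """Parse decimal strings to a common fixed scale.

    scale=None infers the smallest scale that preserves all digits;
    an explicit scale that would lose digits raises instead of
    silently truncating (mirroring the behaviour proposed above).
    """
    values = [Decimal(s) for s in strings]
    # Smallest scale that keeps every fractional digit:
    needed = max(max(-v.as_tuple().exponent, 0) for v in values)
    if scale is None:
        scale = needed  # inferred, like the proposed pl.Decimal(scale=None)
    elif scale < needed:
        raise ValueError(f"scale={scale} would lose digits (need {needed})")
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=5 -> Decimal('0.00001')
    return [v.quantize(quantum) for v in values], scale

vals, s = to_fixed_scale(["1.5"])             # scale inferred as 1
vals5, s5 = to_fixed_scale(["1.5"], scale=5)  # fixed scale 5 -> Decimal('1.50000')
```

With this shape, `to_fixed_scale(["1.5"], scale=0)` raises rather than producing `1`, which is the analogue of the "it fails, because there is no safe cast" behaviour in the list above.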