
feat(python,rust): Consistent Decimal dtype parameters between Python and Rust, conversions, and inferred vs fixed scale #13650

Closed
cgevans wants to merge 2 commits into pola-rs:main from cgevans:decimal-scale-none-python

Conversation

@cgevans
Contributor

@cgevans cgevans commented Jan 11, 2024

This is an implementation of the suggestions in #13572. I realize this is rather premature, but as I had been playing around with it, I thought I may as well make a draft PR to show what the changes would be like.

The core idea here is to replace the current pl.Decimal(precision: int | None = None, scale: int = 0) in Python with pl.Decimal(precision: int | None = None, scale: int | None = None), which is consistent with Rust's Decimal(Option<usize>, Option<usize>). Doing this then enables several changes that I think improve the handling of conversions and inferred vs. fixed scale, so that the user can more reliably choose either to set a fixed scale or to have the scale inferred during a conversion.

In this change:

  • Before, pl.Series([D("1.5")], dtype=pl.Decimal(scale=5)) has a dtype of pl.Decimal(scale=1). After, it has pl.Decimal(scale=5).
  • Before, pl.Series(["1.5"]).cast(pl.Decimal) has a scale of 1, but pl.Series(["1.5"]).cast(pl.Decimal(scale=0)) fails, even though intuitively pl.Decimal would seem equivalent to pl.Decimal(precision=None, scale=0). There is no way to cast to a precision but infer scale. After, scale=None (the default) causes inference, and scale=0 fails.
  • Before, pl.Series([1.5]).cast(pl.Decimal) silently just uses scale=0 for the cast, creating a Decimal(scale=0) series of [1]. After, it fails, because there is no safe cast between a float and decimal unless scale is specified, ensuring that the user realizes this: it's unlikely they want scale=0 in most cases.
  • Before, pl.Series([1], dtype=pl.Decimal(scale=5)) works (subject to the above disregard of scale), but pl.Series([1.5], dtype=pl.Decimal(scale=5)) does not, even though both involve type conversions and the latter is not ambiguous (use the float value at scale 5). I'm actually not so sure this last one is a good idea.
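
The scale inference the bullets above refer to can be sketched without Polars: for string inputs, the inferred scale is the maximum number of fractional digits across the values (a simplified model for illustration, not the actual Polars implementation):

```python
from decimal import Decimal as D

def infer_scale(values: list[str]) -> int:
    # Scale = max digits after the decimal point across all values.
    # For a finite value, Decimal.as_tuple().exponent is the negated
    # count of fractional digits (e.g. "1.5" -> exponent -1).
    return max(0, *(-D(v).as_tuple().exponent for v in values))

infer_scale(["1.5"])          # 1, as in the "scale of 1" example above
infer_scale(["1.5", "2.25"])  # 2
infer_scale(["10", "3"])      # 0
```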

@github-actions github-actions Bot added the enhancement New feature or an improvement of an existing feature label Jan 11, 2024
@cgevans cgevans marked this pull request as ready for review January 20, 2024 00:52
@stinodego
Contributor

stinodego commented Jan 20, 2024

You correctly identify a lot of issues in the description of this PR, but I don't see how allowing scale=None fixes them. Many of them can be addressed simply by fixing bugs.

Rather, I'd argue that scale should not be optional on the Rust side either. A decimal without a scale makes no sense as you wouldn't know what number you're looking at. In fact, you cannot do any operations on a Decimal without a known scale in Rust because you cannot convert it from Int128Chunked back into a Decimal without setting a scale.
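The point about Int128Chunked can be illustrated with plain Python integers: the same physical (backing) value denotes different numbers depending on the scale, so the raw integers are uninterpretable without one (an illustrative model of the representation, not Polars code):

```python
from decimal import Decimal as D

def physical_to_decimal(raw: int, scale: int) -> D:
    # A Polars Decimal stores a (up to 128-bit) backing integer; the
    # logical value is raw * 10**-scale, so a scale is required before
    # the integer can be read back as a decimal number.
    return D(raw).scaleb(-scale)

physical_to_decimal(15, 1)  # Decimal('1.5')
physical_to_decimal(15, 3)  # Decimal('0.015')
```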

I concede, though, that this is not my area of expertise; that's just my thinking from what I've seen of it so far.

@cgevans
Contributor Author

cgevans commented Jan 20, 2024

I'd agree that requiring scale in both Python and Rust would make more sense. An alternative arrangement might be to make scale required (with no default) in both Python and Rust, and never do scale inference except when it is explicitly requested via a to_decimal method. I'm not really sure that automatic scale inference is a good idea: it's the source of many unexpected behaviours.

That would be a significant breaking change, in that any cast to Decimal, and any specified Decimal dtype, would require a scale. On the other hand, decimal support is experimental, and the code it would break would also be code doing scale inference that seems rather unsafe to begin with.

The change would also create the awkwardness that the precision parameter comes before scale, but doesn't appear to be meaningful or useful. The underlying precision is always 38, and the parameter is mostly ignored, except in some cases where it causes failures in binary operations when the specified precisions don't match, even though the actual precisions do. So precision should have a default if it is kept at all (and I'd suggest 38 might make more sense than None), but that would then put a required parameter after an optional one... and if I recall correctly, the precision, scale order is there because any other order would cause confusion with other non-Polars implementations.
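
The fixed underlying precision of 38 follows from the 128-bit backing integer: every 38-digit number fits in an i128, but not every 39-digit one does. A quick check of that arithmetic:

```python
# i128 range check: why a Decimal backed by a 128-bit integer has a
# maximum reliable precision of 38 digits.
I128_MAX = 2**127 - 1

len(str(I128_MAX))        # 39: i128 reaches into 39-digit territory...
10**38 - 1 <= I128_MAX    # True: ...every 38-digit value fits,
10**39 - 1 <= I128_MAX    # False: but not every 39-digit value does.
```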

A problem/frustration here for Python is that Python's Decimals are an entirely different decimal representation and don't have a fixed scale (they're also arguably a more useful representation, at least for scientific uses, but that's not very relevant here, and I think there are speed disadvantages). I don't think there is really a good way of converting lists of Python Decimals to Polars Decimals without specifying a scale.
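
This is standard-library behaviour: each decimal.Decimal carries its own exponent rather than sharing one fixed scale, which is why converting a list of them requires choosing a target scale (and quantizing to it):

```python
from decimal import Decimal as D

# Each value has its own exponent (its negated per-value "scale"):
[D("1.5").as_tuple().exponent, D("1.50").as_tuple().exponent]  # [-1, -2]

# Emulating a fixed scale means quantizing every value to a chosen one:
target_scale = 3
[v.quantize(D(1).scaleb(-target_scale)) for v in (D("1.5"), D("2"))]
# [Decimal('1.500'), Decimal('2.000')]
```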

@alexander-beedie alexander-beedie added the A-dtype-decimal Area: decimal data type label Jan 23, 2024
@stinodego stinodego requested a review from reswqa as a code owner June 28, 2024 17:53
@orlp
Member

orlp commented Sep 19, 2025

Superseded by #24542.

@orlp orlp closed this Sep 19, 2025
