Regression: scientific notation in `BigDecimal::to_string()` #135

findepi · 2024-09-25T15:41:57Z

to_scientific_notation() is supposed to produce scientific notation, to_string() used to (and IMO should continue) to produce decimal form without scientific notation.

The following test

#[test]
fn test() {
    let big_decimal = BigDecimal::from_str("0.00000000000000000000000000000000000001").unwrap()
        .normalized();
    assert_eq!(big_decimal.to_string(), "0.00000000000000000000000000000000000001");
}

Used to pass with v 0.4.2 and no longer passes with 0.4.3:

assertion `left == right` failed
  left: "1E-38"
 right: "0.00000000000000000000000000000000000001"

The text was updated successfully, but these errors were encountered:

akubera · 2024-09-26T17:35:50Z

Always using full decimal representation in the default formatter is actually not what users typically want, and can lead to attacks on memory (e.g. parsing and printing the short string "1e-1000000000" would be amplified to 1GB worth of zeros, and increases exponentially per exponential digit).

So there has to be a threshold somewhere.

The default is 5 leading zeros, which is the same as Python's decimal.Decimal:

>>> from decimal import Decimal
>>> Decimal("0.000001")  # five leading zeros is decimal formatted
Decimal('0.000001')
>>> Decimal("0.0000001")  # six+ leading zeros switches to scientific notation
Decimal('1E-7')
>>> Decimal("0.00000000000000000000000000000000000001")
Decimal('1E-38')

But you can set this at compile time with environment variables

# fractional leading zeros (default 5)
RUST_BIGDECIMAL_FMT_EXPONENTIAL_LOWER_THRESHOLD=10000

# integer trailing zeros (default 15)
RUST_BIGDECIMAL_FMT_EXPONENTIAL_UPPER_THRESHOLD=10000

findepi · 2024-09-26T18:42:00Z

Thank you @akubera for quick answer! I didn't think about DOS attack vector.
My use-case is in the SQL context where the value is known to be have at most 38 digits, so I am not concerned about to_string blowing up memory.

Thanks for the tip about the env variable. I am, however, worried a bit what happens if the library is used in my code directly and also as a transitive dependency, and portion of the code that I do not control becomes dependent on the out-of-the-box behavior. Is there an option to restore original exact to_string behavior?

FWIW Java behavior is following

var tiny = new java.math.BigDecimal("0.00000000000000000000000000000000000001");
System.out.println(tiny.toString());
System.out.println(tiny.toPlainString());

prints

1E-38
0.00000000000000000000000000000000000001

akubera · 2024-09-26T20:03:50Z

I could expose the fn format_full_scale() function as the methods:

fn to_plain_string(&self) -> String
fn write_plain_string<W: fmt::Write>(&self, wtr: &mut W) -> fmt::Result;

if the name "plain_string" makes sense (which is fine by me, alternative off the top of my head would be to_decimal_string()).

Python doesn't look like it has an equivalent, but numpy format_float_positional (and format_float_scientific).
Ruby has (deprecated) method to_digits (which I don't like; they're all "digits", right?)

I'm not sure how the transitive dependencies work, but I have a feeling there's no way to "pass" an environment variable from one dependency to another. The top level library would need to know to set the RUST_DECIMAL_* vars.

One option would be add a feature to disable the safe formatting:

bigdecimal  = { ... , features = [ "full_decimal_format" ] }   // or scarier name like "unprotected_decimal_format"

But even then I'm not sure how transitive dependencies work: if the top level project includes bigdecimal without the feature, but a dependency does, would it be compiled twice and the numbers are formatted differently depending on their source?

I've been told the environment variables are bad for reproducible builds, with transitive dependencies, but I'm not sure what my options are.

akubera · 2024-09-26T20:08:15Z

🤔

Maybe you could put a check in your build.rs which requires the environment variable to be present and >=38 to correctly build?

akubera · 2024-09-26T20:24:27Z

That's a big hoop for users to jump through.

Maybe a new method where you explicitly set the leading/leading zeros?
Are you SURE there's no potential for attack?

Something like:

use bigdecimal::{BigDecimal, BigDecimalRef};

struct MyBigDecimal {
   value: BigDecimal,
}

impl Display for MyBigDecimal {
   fn fmt(&self, f: ...) -> ... {
       let max_leading_zeros = 38;  // or 100 or 1000 or whatever.
       let max_trailing_zeros = 100;

       // new method overriding to_string()
       self.value.fmt_plain_decimal(f, max_leading_zeros, max_trailing_zeros);
   }
}

// This should make casting your objects to a form that most functions 
// in bigdecimal accept, if you're doing any math.
impl<'a> From<&'a MyBigDecimal> for BigDecimalRef<'a> {
  fn from(d: &MyBigDecimal) -> Self {
    d.value.to_ref()
  }
}

findepi · 2024-09-27T06:55:57Z

I could expose the fn format_full_scale() function as the methods:
fn to_plain_string(&self) -> String
fn write_plain_string<W: fmt::Write>(&self, wtr: &mut W) -> fmt::Result;

That would be awesome, thank you!

akubera · 2024-09-29T18:40:55Z

I should mention I’m not able to work on it this weekend. Maybe next weekend or the following. Don't think I have forgotten.

akubera · 2024-10-29T12:57:17Z

I merged these functions into trunk last night.

00a97d6

I'm not sure if I want fmt::Write or io::Write, or both.

findepi · 2024-10-31T10:34:01Z

I merged these functions into trunk last night.

00a97d6

Looks great, thank you!

akubera · 2024-12-09T00:10:26Z

Finally released 0.4.7 today, which includes these two methods.

findepi mentioned this issue Dec 2, 2024

chore(deps): update bigdecimal from 0.4.1 to 0.4.6 apache/datafusion#13569

Merged

akubera closed this as completed Dec 9, 2024

findepi mentioned this issue Dec 12, 2024

Update to bigdecimal 0.4.7 apache/datafusion#13747

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression: scientific notation in `BigDecimal::to_string()` #135

Regression: scientific notation in `BigDecimal::to_string()` #135

findepi commented Sep 25, 2024

akubera commented Sep 26, 2024 •

edited

Loading

findepi commented Sep 26, 2024

akubera commented Sep 26, 2024

akubera commented Sep 26, 2024

akubera commented Sep 26, 2024

findepi commented Sep 27, 2024

akubera commented Sep 29, 2024

akubera commented Oct 29, 2024

findepi commented Oct 31, 2024

akubera commented Dec 9, 2024

Regression: scientific notation in BigDecimal::to_string() #135

Regression: scientific notation in BigDecimal::to_string() #135

Comments

findepi commented Sep 25, 2024

akubera commented Sep 26, 2024 • edited Loading

findepi commented Sep 26, 2024

akubera commented Sep 26, 2024

akubera commented Sep 26, 2024

akubera commented Sep 26, 2024

findepi commented Sep 27, 2024

akubera commented Sep 29, 2024

akubera commented Oct 29, 2024

findepi commented Oct 31, 2024

akubera commented Dec 9, 2024

Regression: scientific notation in `BigDecimal::to_string()` #135

Regression: scientific notation in `BigDecimal::to_string()` #135

akubera commented Sep 26, 2024 •

edited

Loading