Consider adding a section on README elaborating on what roundtrip precisely means #25

rpopescu · 2021-12-10T17:00:27Z

Hi @jk-jeon, and thank you for your amazing work on this!
This isn't exactly an issue, but rather a question or a request for clarification of the documentation.
The documentation says the algorithm is round-trip.
To check this my understand of this, I have written a simple loop whose index "i" iterates all uint64_t values and:

obtains a double "orig" by bit_cast-ing the index
creates decimal "dec" from "orig" via dragonbox;
obtains another double called "back" via decimal.significand * pow(10.0, decimal.exponent), and then inverting it if negative;
compares the two doubles, and prints an error message if they're different;
the error message also includes a bit_cast-ed uint64_t from the "back" double, called "back_i" in the output below.

Here's a sample of its output; formatting is done with an old version of fmtlib (5.3.0):

i=4716101740380815687 orig=1.98801e+07 dec=(1988012400000122, -8, +) back=1.98801e+07 back_i=4716101740380815688
i=4716101740380815688 orig=1.98801e+07 dec=(19880124000001222, -9, +) back=1.98801e+07 back_i=4716101740380815689
i=4716101740380815699 orig=1.98801e+07 dec=(19880124000001263, -9, +) back=1.98801e+07 back_i=4716101740380815700
i=4716101740380815700 orig=1.98801e+07 dec=(19880124000001267, -9, +) back=1.98801e+07 back_i=4716101740380815701
i=4716101740380815703 orig=1.98801e+07 dec=(19880124000001278, -9, +) back=1.98801e+07 back_i=4716101740380815704
i=4716101740380815713 orig=1.98801e+07 dec=(19880124000001315, -9, +) back=1.98801e+07 back_i=4716101740380815714

I haven't verified this programatically but I notice that the back integer representation is one off the original integer from which the double was created. Note that the exact code computing "back" from "dec" is:

double back = static_cast<double>(dec.significand) * pow(10.0, static_cast<double>(decimal.exponent));

My questions are:

what am I doing wrong, if anything, in converting from the decimal to a double? I have a feeling that the answer is "everything"...
would the rounding policy have anything to do with off-by-one observation above?

Thank you kindly in advance for your time.

The text was updated successfully, but these errors were encountered:

Alcaro · 2021-12-10T17:13:36Z

Converting double to shortest accurate string is a difficult task.

Converting string to double is just as difficult. 19880124000001263 is not exactly representable as double, and neither is 10^-9; 19880124000001263.0 * 1.0e-9 rounds to 19880124.000001267, not 19880124.000001263.

You'll get it a bit more accurate if you divide by 10^9 rather than multiply by 10^-9, but significand accuracy will still be a problem for some inputs.

If you're on Linux, glibc strtod/sscanf is perfectly accurate for every input; if on Windows, try MSVC std::from_chars.

rpopescu · 2021-12-10T17:16:53Z

Thanks for your reply @Alcaro .
Note that my question doesn't involve strings; instead it's about going from the decimal struct returned by jkj::dragonbox::to_decimal to a double.
I'll try the division.
(later edit):
I have, computing an additional double called back2 and printing it out (and its integer bit_cast value) if neither back nor back2 equal orig. The code is
double back2 = static_cast<double>(dec.significand) / pow(10.0, static_cast<double>(-dec.exponent));
It looks like this:

i=4716101740380817013 orig=1.98801e+07 dec=(19880124000006158, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817014 back2_i=4716101740380817014
i=4716101740380817028 orig=1.98801e+07 dec=(19880124000006214, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817029 back2_i=4716101740380817029
i=4716101740380817056 orig=1.98801e+07 dec=(19880124000006318, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817057 back2_i=4716101740380817057
i=4716101740380817058 orig=1.98801e+07 dec=(19880124000006326, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817059 back2_i=4716101740380817059

Alcaro · 2021-12-10T17:41:00Z

Your question doesn't involve character strings, but it does involve converting digits+exponent to double, which is the tricky part of string conversion. This correction doesn't affect the rest of my answer.

For those values, your issue is significand accuracy. 19880124000006158 is not representable as double; if you try, you get 19880124000006160.0, and 19880124000006160.0 / 1e+9 is indeed 19880124.00000616 and not 19880124.000006158.

rpopescu · 2021-12-10T17:54:34Z

Unfortunately, the part concerning division as an improvement of accuracy does not seem to be accurate.

While I get the point about accuracy, the question of achieving roundtrip hasn't been answered. Roundtrip is mentioned in the documentation, and there are policies regarding rounding for in either direction, yet I have been unable to understand how to convert back to a double in a way that is roundtrip.
That's the gist of the question here.

Alcaro · 2021-12-10T18:24:33Z

I may have been unclear about division; it will give better results for negative exponents, but worse results for positive exponents. For best results, you'll need a branch.

And you'll need a third branch, containing a slower algorithm, if the exponent is outside the range ±23, and/or if the significand is >= 2**53, such that one or both of the mul/div inputs is not exactly representable as double. Prepare for several hundred or thousand lines of numerically tricky code.

Alternatively, you can use someone else's digits to double function. If you're on Linux, I recommend glibc strtod; if on Windows, try MSVC std::from_chars. If you're on something else, David Gay's dtoa.c is quite popular.

Floating point math is complicated.

rpopescu · 2021-12-10T18:47:08Z

Looking at the strtod source code, I see that it uses a multi-precision number for the integer part, and multiplies that with the relevant power of ten; below the relevant bits of code and comments:

 /* Read the integer part as a multi-precision number to NUM.  */
startp = str_to_mpn (startp, int_no, num, &numsize, &exponent ...
// ...
/* We now multiply the gained number by the given power of ten.  */

jk-jeon · 2021-12-10T18:53:27Z

Thanks @rpopescu for your interest on Dragonbox. What @Alcaro is saying is correct.

The problem is that decimal.significand * pow(10.0, decimal.exponent) doesn't give you the correct result, because

decimal.significand may be too large to be accurately converted to double,
pow(10.0, decimal.exponent) may be too large to be accurately converted to double,
Multiplication of these two may introduce yet another error, and
You said you inverted this when you have a negative exponent, which is another source of error.

As @Alcaro suggests, you should use either a correct string-to-double parser or a big number library to compute the result with no loss of precision.

Or, you can manually check the results, for example by asking Wolfram Alpha to compute the correct results, like (1) typing something like binary(1988012400000122 * 10^-8), (2) obtain the binary significand and the binary exponent, and (3) compare them to your original number.

Or, you could just write 1988012400000122e-8 to your source code to let your compiler to do the conversion. If the compiler is not buggy, it will give you the correct double.

rpopescu · 2021-12-10T19:08:41Z

Right, I got it finally - as I noted above, the precision is lost sometimes (because the reasons you've enumerated), but this can be avoided by using e.g. gmplib, as strtod does.
Thank you everyone.
@jk-jeon perhaps a note of caution regarding achieving the roundtrip in the documentation would be useful, to keep others as naive as me from assuming the basic multiplication would work.

jk-jeon · 2021-12-10T19:09:35Z

See this for example:

https://godbolt.org/z/heoGqaGrT

jk-jeon · 2021-12-10T19:10:11Z

@jk-jeon perhaps a note of caution regarding achieving the roundtrip in the documentation would be useful, to keep other from assuming the basic multiplication would work.

Okay I'll consider adding it.

rpopescu · 2021-12-10T19:11:45Z

Thanks again.
Actually, one more question @jk-jeon - what is the source of error when inverting the result when the exponent is negative? From what I can see, this results in packed xor; below is the code for the function double f(double d) { return -d; }

.LCPI0_0:
        .quad   0x8000000000000000              # double -0
        .quad   0x8000000000000000              # double -0
f(double):                                  # @f(double)
        xorps   xmm0, xmmword ptr [rip + .LCPI0_0]
        ret

jk-jeon · 2021-12-10T19:48:01Z

I thought by "inverting" you mean taking the reciprocal. Did you mean inverting the sign?

rpopescu · 2021-12-10T19:52:08Z

Yes, that’s what I was doing, negating the value if is_negative, which from my understanding of the representation would simply flip a bit, right? Anyway thanks again for the clarification!

Alcaro · 2021-12-10T19:53:53Z

Yes, as you can see in that assembly snippet, negating a float or double is just a bit flip.

rpopescu · 2021-12-10T20:00:37Z

Yes, as I said it’s a bit flip and hence my confusion about the possible precision loss, but it’s been clarified.
Thanks for the heads up - replied by email and signature got added :-)

jk-jeon · 2022-01-12T14:15:11Z

Addressed in 1c7596b.

9c26179a Fix jk-jeon/dragonbox#27 02059bde Fix a typo 57d9b60c Fix a bug ba65f0ef Add missing copyright notice fcc6cc0e Improve comment 29aa8a23 Improve comment fa139ad9 Help MSVC to not generate horrible assembly 478edcc4 Fix ABI overhead issue d4f164bd Improve comment a84c11e1 A small optimization turning mul into lea 6bf01402 Add the old paper along with the new one 955d7387 Rerun Milo's benchmark 77aeee25 Refactoring + some improvement on 9-digits case b75d296c Optimize digit printing algorithm 8864215b Merge branch 'master' of https://github.com/jk-jeon/dragonbox dd52aa72 Workaround clang-format & correctly attribute jeaiii 71684f08 Update README.md 21400aab Rewrite paper 3da7d9d8 Change <= epsilon to < epsilon, as that's what we need 4b1f309d Update README.md fefd03b4 Rerun Milo's benchmark 5ca7e3d2 Change the order of condition evaluations 5ddb8bcf Add more const; reformat b4d5f705 Add test for compressed cache fb970736 Fix incorrect analysis; now it is correct. fe2e5c1e Rename beta_minus_1 to beta to reflect the new paper 2b673c0c Rerun benchmark 3f33d4dc Run MATLAB even when no actual benchmark has been done 4d82dfe4 Fix a bug that shaded regions are not correctly shown in the exported pdf e52fe78c Use better exportgraphics rather than print f1e59661 Merge branch 'master' of https://github.com/jk-jeon/dragonbox 2e6eefa5 Update README.md 0377186d Update README.md 38e11bc4 Rerun verify_cache_precision 07a450ed Ignore pdf files generated from plot_required_bits.m ef4ac121 Required bits plot MATLAB script d4f0f783 Reflect the new paper & add file output eff5914f Reflect the new paper bba1a8b7 Merge branch 'master' of https://github.com/jk-jeon/dragonbox a37c3c28 Add missing test cases. No problem found. 96dc15c8 Update README.md d41e062b Fix a bug 83d5df3e Reflect the new paper, simplify things 692f67f4 Implement jeaiii-style printing algorithm cd533cde Both binary32 and binary64 remove trailing zeros af08b920 Fix an error (has no effect in fact) 133f0aaf Match new paper f5ff2c4e Shall not feed 0 to dragonbox 2537b740 Further simplify check_divisibility_and_divide_by_pow10 4cece761 Even more simplified version of check_divisibility_and_divide_by_pow10 274d4634 Use newly implemented find_all_good_rational 990b2a4f Add file 028083c8 Rename 92001c41 Implement find_all_good_rational_approx_from_below_denoms 2ce3f894 Add some utilities to unsigned_rational 15e6a56d Add a little more utility functions 997a4fda Fix typos and improve some sentences deabce23 Fix typos, improve some sentences, correct some small errors 2ee87104 Improve comments 812646d3 Improve comments 8c484269 Include missed cases 72e22a33 Include possibily missed cases 66de8584 Add template keyword here and there 1c1ca394 More precise test dc41a327 Refine false positive condition 3e1083a2 Remove generate_compressed_cache_error_table 938a3ed6 Implement compressed cache verification 259787da Simplify compressed cache policy 569980d5 Justify why we don't need to add the error back. d7e8e307 Fix errors 23c2c20b Carefully re-done f75e50af Reformat 1f452fa8 Add is_even() 4a3402db Fix error 369cd0bc Improve comments and other conventions 30d9e4eb Simplify check_divisibility_and_divide_by_pow10. Apply the integer check idea from Schubfach. 4150375b Right-shift for signed integers is implementation-defined See fmtlib/fmt#2709 1c7596b1 Addressing jk-jeon/dragonbox#25 2e3f58c2 Adapt the change in dragonbox.h 8a1c13d7 Change header guards convention ec688b1a Remove now-unused stuffs b9f01c18 Reformat 1bfe1176 #if block for ease of testing 3c0f1c6f Eliminate integer checks dc4557df Reformat ac5788da Adapt new table generation scheme 6251ad9a Reformatting 346f7538 Use new bignum aa5c699a Regenerate cache c0776eec Add dragonbox::common as a dependency to sandbox bb47e99e Rewrite to match the new method daa0c81e Fix some trivial errors a29a8efb Modify copyright 9b11d788 Rename a file a5316aa0 Rename bc3d0eaf Modify according to the new scheme 74a58cb8 Remove old files b4a66924 Big int & continued fractions implementation 6281dc3e Add dragonbox::common to dependency 9868af72 Fix typo in comment 0cdb1548 Add dragonbox_to_chars as a dependency to sandbox 9b460165 Merge pull request #24 from jwakely/patch-1 4f3715d6 Fix typos and grammar in README cf8b12cc Rerun milo's benchmark 1a964604 Rename main.cpp into benchmark.cpp ab23f7aa Simplify remove_trailing_zeros further bf8442be Rename main.cpp into benchmark.cpp ba521e30 It seems the new version of readtable now reads hex integers directly 5ecd1534 Fix copyright notice 106f7b94 Replace stof/d into strtof/d 12310e74 Remove unused param name 11aae947 Improve comments on ROR 99fd14ba Fix the bug introduced by the commit e26c8d7363eb0ad9753b56a9699dea194784eb5e 648591a4 Fix a bug introduced by commit 2a33d79e1ce6f2045edacd79c8f087ffd10e1323 aad99d52 Silence MSVC warning by defining JKJ_DRAGONBOX_HAS_BUILTIN 39d32f1d Fix typo 2db8656d Fix typo e26c8d73 Optimize/simplify remove_trailing_zeros 2a33d79e Optimize break_rounding_tie Now the policy classes do not actually round, rather, it returns a boolean flag for if they may want to round d9b2664d Optimize check_divisibility_and_divide_by_pow10 5a699783 Reduced usages of __int128 __int128 seems to generate worse code 2ac7fa6a Add minmax Euclid to the paper db39c05c Fix typo f3764d5a Change divtable from SOA-style into AOS-style Because it makes more sense 63d9dc1b Update paper according to jk-jeon/dragonbox#22 8323f379 Make README.md clearer a54232b9 Merge pull request #22 from yotann/master fc909b8d fix bugs in compute_right_closed_directed 16473a7d Merge branch 'master' of https://github.com/jk-jeon/dragonbox 538d7d32 Revised some writings 90face27 Merge pull request #21 from Alexhuszagh/master e9509370 Add CI support for OSes and other architectures. git-subtree-dir: third_party/dragonbox git-subtree-split: 9c26179a9c4368db9f7703879e3b2e34aa0e29c2

rpopescu closed this as completed Dec 10, 2021

jk-jeon changed the title ~~Decimal to double~~ Consider adding a section on README elaborating on what roundtrip precisely means Dec 11, 2021

jk-jeon reopened this Dec 11, 2021

jk-jeon added a commit that referenced this issue Jan 12, 2022

Addressing #25

1c7596b

jk-jeon closed this as completed Jan 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider adding a section on README elaborating on what roundtrip precisely means #25

Consider adding a section on README elaborating on what roundtrip precisely means #25

rpopescu commented Dec 10, 2021 •

edited

Loading

Alcaro commented Dec 10, 2021

rpopescu commented Dec 10, 2021 •

edited

Loading

Alcaro commented Dec 10, 2021 •

edited

Loading

rpopescu commented Dec 10, 2021

Alcaro commented Dec 10, 2021

rpopescu commented Dec 10, 2021

jk-jeon commented Dec 10, 2021 •

edited

Loading

rpopescu commented Dec 10, 2021 •

edited

Loading

jk-jeon commented Dec 10, 2021

jk-jeon commented Dec 10, 2021

rpopescu commented Dec 10, 2021 •

edited

Loading

jk-jeon commented Dec 10, 2021

rpopescu commented Dec 10, 2021 via email •

edited

Loading

Alcaro commented Dec 10, 2021 •

edited

Loading

rpopescu commented Dec 10, 2021

jk-jeon commented Jan 12, 2022

Consider adding a section on README elaborating on what roundtrip precisely means #25

Consider adding a section on README elaborating on what roundtrip precisely means #25

Comments

rpopescu commented Dec 10, 2021 • edited Loading

Alcaro commented Dec 10, 2021

rpopescu commented Dec 10, 2021 • edited Loading

Alcaro commented Dec 10, 2021 • edited Loading

rpopescu commented Dec 10, 2021

Alcaro commented Dec 10, 2021

rpopescu commented Dec 10, 2021

jk-jeon commented Dec 10, 2021 • edited Loading

rpopescu commented Dec 10, 2021 • edited Loading

jk-jeon commented Dec 10, 2021

jk-jeon commented Dec 10, 2021

rpopescu commented Dec 10, 2021 • edited Loading

jk-jeon commented Dec 10, 2021

rpopescu commented Dec 10, 2021 via email • edited Loading

Alcaro commented Dec 10, 2021 • edited Loading

rpopescu commented Dec 10, 2021

jk-jeon commented Jan 12, 2022

rpopescu commented Dec 10, 2021 •

edited

Loading

rpopescu commented Dec 10, 2021 •

edited

Loading

Alcaro commented Dec 10, 2021 •

edited

Loading

jk-jeon commented Dec 10, 2021 •

edited

Loading

rpopescu commented Dec 10, 2021 •

edited

Loading

rpopescu commented Dec 10, 2021 •

edited

Loading

rpopescu commented Dec 10, 2021 via email •

edited

Loading

Alcaro commented Dec 10, 2021 •

edited

Loading