Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding a section on README elaborating on what roundtrip precisely means #25

Closed
rpopescu opened this issue Dec 10, 2021 · 16 comments

Comments

@rpopescu
Copy link

rpopescu commented Dec 10, 2021

Hi @jk-jeon, and thank you for your amazing work on this!
This isn't exactly an issue, but rather a question or a request for clarification of the documentation.
The documentation says the algorithm is round-trip.
To check this my understand of this, I have written a simple loop whose index "i" iterates all uint64_t values and:

  • obtains a double "orig" by bit_cast-ing the index
  • creates decimal "dec" from "orig" via dragonbox;
  • obtains another double called "back" via decimal.significand * pow(10.0, decimal.exponent), and then inverting it if negative;
  • compares the two doubles, and prints an error message if they're different;
  • the error message also includes a bit_cast-ed uint64_t from the "back" double, called "back_i" in the output below.

Here's a sample of its output; formatting is done with an old version of fmtlib (5.3.0):

i=4716101740380815687 orig=1.98801e+07 dec=(1988012400000122, -8, +) back=1.98801e+07 back_i=4716101740380815688
i=4716101740380815688 orig=1.98801e+07 dec=(19880124000001222, -9, +) back=1.98801e+07 back_i=4716101740380815689
i=4716101740380815699 orig=1.98801e+07 dec=(19880124000001263, -9, +) back=1.98801e+07 back_i=4716101740380815700
i=4716101740380815700 orig=1.98801e+07 dec=(19880124000001267, -9, +) back=1.98801e+07 back_i=4716101740380815701
i=4716101740380815703 orig=1.98801e+07 dec=(19880124000001278, -9, +) back=1.98801e+07 back_i=4716101740380815704
i=4716101740380815713 orig=1.98801e+07 dec=(19880124000001315, -9, +) back=1.98801e+07 back_i=4716101740380815714

I haven't verified this programatically but I notice that the back integer representation is one off the original integer from which the double was created. Note that the exact code computing "back" from "dec" is:

double back = static_cast<double>(dec.significand) * pow(10.0, static_cast<double>(decimal.exponent));

My questions are:

  • what am I doing wrong, if anything, in converting from the decimal to a double? I have a feeling that the answer is "everything"...
  • would the rounding policy have anything to do with off-by-one observation above?

Thank you kindly in advance for your time.

@Alcaro
Copy link

Alcaro commented Dec 10, 2021

Converting double to shortest accurate string is a difficult task.

Converting string to double is just as difficult. 19880124000001263 is not exactly representable as double, and neither is 10^-9; 19880124000001263.0 * 1.0e-9 rounds to 19880124.000001267, not 19880124.000001263.

You'll get it a bit more accurate if you divide by 10^9 rather than multiply by 10^-9, but significand accuracy will still be a problem for some inputs.

If you're on Linux, glibc strtod/sscanf is perfectly accurate for every input; if on Windows, try MSVC std::from_chars.

@rpopescu
Copy link
Author

rpopescu commented Dec 10, 2021

Thanks for your reply @Alcaro .
Note that my question doesn't involve strings; instead it's about going from the decimal struct returned by jkj::dragonbox::to_decimal to a double.
I'll try the division.
(later edit):
I have, computing an additional double called back2 and printing it out (and its integer bit_cast value) if neither back nor back2 equal orig. The code is
double back2 = static_cast<double>(dec.significand) / pow(10.0, static_cast<double>(-dec.exponent));
It looks like this:

i=4716101740380817013 orig=1.98801e+07 dec=(19880124000006158, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817014 back2_i=4716101740380817014
i=4716101740380817028 orig=1.98801e+07 dec=(19880124000006214, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817029 back2_i=4716101740380817029
i=4716101740380817056 orig=1.98801e+07 dec=(19880124000006318, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817057 back2_i=4716101740380817057
i=4716101740380817058 orig=1.98801e+07 dec=(19880124000006326, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817059 back2_i=4716101740380817059

@Alcaro
Copy link

Alcaro commented Dec 10, 2021

Your question doesn't involve character strings, but it does involve converting digits+exponent to double, which is the tricky part of string conversion. This correction doesn't affect the rest of my answer.

For those values, your issue is significand accuracy. 19880124000006158 is not representable as double; if you try, you get 19880124000006160.0, and 19880124000006160.0 / 1e+9 is indeed 19880124.00000616 and not 19880124.000006158.

@rpopescu
Copy link
Author

Unfortunately, the part concerning division as an improvement of accuracy does not seem to be accurate.

While I get the point about accuracy, the question of achieving roundtrip hasn't been answered. Roundtrip is mentioned in the documentation, and there are policies regarding rounding for in either direction, yet I have been unable to understand how to convert back to a double in a way that is roundtrip.
That's the gist of the question here.

@Alcaro
Copy link

Alcaro commented Dec 10, 2021

I may have been unclear about division; it will give better results for negative exponents, but worse results for positive exponents. For best results, you'll need a branch.

And you'll need a third branch, containing a slower algorithm, if the exponent is outside the range ±23, and/or if the significand is >= 2**53, such that one or both of the mul/div inputs is not exactly representable as double. Prepare for several hundred or thousand lines of numerically tricky code.

Alternatively, you can use someone else's digits to double function. If you're on Linux, I recommend glibc strtod; if on Windows, try MSVC std::from_chars. If you're on something else, David Gay's dtoa.c is quite popular.

Floating point math is complicated.

@rpopescu
Copy link
Author

Looking at the strtod source code, I see that it uses a multi-precision number for the integer part, and multiplies that with the relevant power of ten; below the relevant bits of code and comments:

 /* Read the integer part as a multi-precision number to NUM.  */
startp = str_to_mpn (startp, int_no, num, &numsize, &exponent ...
// ...
/* We now multiply the gained number by the given power of ten.  */

@jk-jeon
Copy link
Owner

jk-jeon commented Dec 10, 2021

Thanks @rpopescu for your interest on Dragonbox. What @Alcaro is saying is correct.

The problem is that decimal.significand * pow(10.0, decimal.exponent) doesn't give you the correct result, because

  • decimal.significand may be too large to be accurately converted to double,
  • pow(10.0, decimal.exponent) may be too large to be accurately converted to double,
  • Multiplication of these two may introduce yet another error, and
  • You said you inverted this when you have a negative exponent, which is another source of error.

As @Alcaro suggests, you should use either a correct string-to-double parser or a big number library to compute the result with no loss of precision.

Or, you can manually check the results, for example by asking Wolfram Alpha to compute the correct results, like (1) typing something like binary(1988012400000122 * 10^-8), (2) obtain the binary significand and the binary exponent, and (3) compare them to your original number.

Or, you could just write 1988012400000122e-8 to your source code to let your compiler to do the conversion. If the compiler is not buggy, it will give you the correct double.

@rpopescu
Copy link
Author

rpopescu commented Dec 10, 2021

Right, I got it finally - as I noted above, the precision is lost sometimes (because the reasons you've enumerated), but this can be avoided by using e.g. gmplib, as strtod does.
Thank you everyone.
@jk-jeon perhaps a note of caution regarding achieving the roundtrip in the documentation would be useful, to keep others as naive as me from assuming the basic multiplication would work.

@jk-jeon
Copy link
Owner

jk-jeon commented Dec 10, 2021

See this for example:

https://godbolt.org/z/heoGqaGrT

@jk-jeon
Copy link
Owner

jk-jeon commented Dec 10, 2021

@jk-jeon perhaps a note of caution regarding achieving the roundtrip in the documentation would be useful, to keep other from assuming the basic multiplication would work.

Okay I'll consider adding it.

@rpopescu
Copy link
Author

rpopescu commented Dec 10, 2021

Thanks again.
Actually, one more question @jk-jeon - what is the source of error when inverting the result when the exponent is negative? From what I can see, this results in packed xor; below is the code for the function double f(double d) { return -d; }

.LCPI0_0:
        .quad   0x8000000000000000              # double -0
        .quad   0x8000000000000000              # double -0
f(double):                                  # @f(double)
        xorps   xmm0, xmmword ptr [rip + .LCPI0_0]
        ret

@jk-jeon
Copy link
Owner

jk-jeon commented Dec 10, 2021

I thought by "inverting" you mean taking the reciprocal. Did you mean inverting the sign?

@rpopescu
Copy link
Author

rpopescu commented Dec 10, 2021 via email

@Alcaro
Copy link

Alcaro commented Dec 10, 2021

Yes, as you can see in that assembly snippet, negating a float or double is just a bit flip.

@rpopescu
Copy link
Author

Yes, as I said it’s a bit flip and hence my confusion about the possible precision loss, but it’s been clarified.
Thanks for the heads up - replied by email and signature got added :-)

@jk-jeon jk-jeon changed the title Decimal to double Consider adding a section on README elaborating on what roundtrip precisely means Dec 11, 2021
@jk-jeon jk-jeon reopened this Dec 11, 2021
jk-jeon added a commit that referenced this issue Jan 12, 2022
@jk-jeon
Copy link
Owner

jk-jeon commented Jan 12, 2022

Addressed in 1c7596b.

@jk-jeon jk-jeon closed this as completed Jan 12, 2022
yotann added a commit to yotann/bcdb-private that referenced this issue Mar 9, 2022
9c26179a Fix jk-jeon/dragonbox#27
02059bde Fix a typo
57d9b60c Fix a bug
ba65f0ef Add missing copyright notice
fcc6cc0e Improve comment
29aa8a23 Improve comment
fa139ad9 Help MSVC to not generate horrible assembly
478edcc4 Fix ABI overhead issue
d4f164bd Improve comment
a84c11e1 A small optimization turning mul into lea
6bf01402 Add the old paper along with the new one
955d7387 Rerun Milo's benchmark
77aeee25 Refactoring + some improvement on 9-digits case
b75d296c Optimize digit printing algorithm
8864215b Merge branch 'master' of https://github.com/jk-jeon/dragonbox
dd52aa72 Workaround clang-format & correctly attribute jeaiii
71684f08 Update README.md
21400aab Rewrite paper
3da7d9d8 Change <= epsilon to < epsilon, as that's what we need
4b1f309d Update README.md
fefd03b4 Rerun Milo's benchmark
5ca7e3d2 Change the order of condition evaluations
5ddb8bcf Add more const; reformat
b4d5f705 Add test for compressed cache
fb970736 Fix incorrect analysis; now it is correct.
fe2e5c1e Rename beta_minus_1 to beta to reflect the  new paper
2b673c0c Rerun benchmark
3f33d4dc Run MATLAB even when no actual benchmark has been done
4d82dfe4 Fix a bug that shaded regions are not correctly shown in the exported pdf
e52fe78c Use better exportgraphics rather than print
f1e59661 Merge branch 'master' of https://github.com/jk-jeon/dragonbox
2e6eefa5 Update README.md
0377186d Update README.md
38e11bc4 Rerun verify_cache_precision
07a450ed Ignore pdf files generated from plot_required_bits.m
ef4ac121 Required bits plot MATLAB script
d4f0f783 Reflect the new paper & add file output
eff5914f Reflect the new paper
bba1a8b7 Merge branch 'master' of https://github.com/jk-jeon/dragonbox
a37c3c28 Add missing test cases. No problem found.
96dc15c8 Update README.md
d41e062b Fix a bug
83d5df3e Reflect the new paper, simplify things
692f67f4 Implement jeaiii-style printing algorithm
cd533cde Both binary32 and binary64 remove trailing zeros
af08b920 Fix an error (has no effect in fact)
133f0aaf Match new paper
f5ff2c4e Shall not feed 0 to dragonbox
2537b740 Further simplify check_divisibility_and_divide_by_pow10
4cece761 Even more simplified version of check_divisibility_and_divide_by_pow10
274d4634 Use newly implemented find_all_good_rational
990b2a4f Add file
028083c8 Rename
92001c41 Implement find_all_good_rational_approx_from_below_denoms
2ce3f894 Add some utilities to unsigned_rational
15e6a56d Add a little more utility functions
997a4fda Fix typos and improve some sentences
deabce23 Fix typos, improve some sentences, correct some small errors
2ee87104 Improve comments
812646d3 Improve comments
8c484269 Include missed cases
72e22a33 Include possibily missed cases
66de8584 Add template keyword here and there
1c1ca394 More precise test
dc41a327 Refine false positive condition
3e1083a2 Remove generate_compressed_cache_error_table
938a3ed6 Implement compressed cache verification
259787da Simplify compressed cache policy
569980d5 Justify why we don't need to add the error back.
d7e8e307 Fix errors
23c2c20b Carefully re-done
f75e50af Reformat
1f452fa8 Add is_even()
4a3402db Fix error
369cd0bc Improve comments and other conventions
30d9e4eb Simplify check_divisibility_and_divide_by_pow10. Apply the integer check idea from Schubfach.
4150375b Right-shift for signed integers is implementation-defined See fmtlib/fmt#2709
1c7596b1 Addressing jk-jeon/dragonbox#25
2e3f58c2 Adapt the change in dragonbox.h
8a1c13d7 Change header guards convention
ec688b1a Remove now-unused stuffs
b9f01c18 Reformat
1bfe1176 #if block for ease of testing
3c0f1c6f Eliminate integer checks
dc4557df Reformat
ac5788da Adapt new table generation scheme
6251ad9a Reformatting
346f7538 Use new bignum
aa5c699a Regenerate cache
c0776eec Add dragonbox::common as a dependency to sandbox
bb47e99e Rewrite to match the new method
daa0c81e Fix some trivial errors
a29a8efb Modify copyright
9b11d788 Rename a file
a5316aa0 Rename
bc3d0eaf Modify according to the new scheme
74a58cb8 Remove old files
b4a66924 Big int & continued fractions implementation
6281dc3e Add dragonbox::common to dependency
9868af72 Fix typo in comment
0cdb1548 Add dragonbox_to_chars as a dependency to sandbox
9b460165 Merge pull request #24 from jwakely/patch-1
4f3715d6 Fix typos and grammar in README
cf8b12cc Rerun milo's benchmark
1a964604 Rename main.cpp into benchmark.cpp
ab23f7aa Simplify remove_trailing_zeros further
bf8442be Rename main.cpp into benchmark.cpp
ba521e30 It seems the new version of readtable now reads hex integers directly
5ecd1534 Fix copyright notice
106f7b94 Replace stof/d into strtof/d
12310e74 Remove unused param name
11aae947 Improve comments on ROR
99fd14ba Fix the bug introduced by the commit e26c8d7363eb0ad9753b56a9699dea194784eb5e
648591a4 Fix a bug introduced by commit 2a33d79e1ce6f2045edacd79c8f087ffd10e1323
aad99d52 Silence MSVC warning by defining JKJ_DRAGONBOX_HAS_BUILTIN
39d32f1d Fix typo
2db8656d Fix typo
e26c8d73 Optimize/simplify remove_trailing_zeros
2a33d79e Optimize break_rounding_tie Now the policy classes do not actually round, rather, it returns a boolean flag for if they may want to round
d9b2664d Optimize check_divisibility_and_divide_by_pow10
5a699783 Reduced usages of __int128 __int128 seems to generate worse code
2ac7fa6a Add minmax Euclid to the paper
db39c05c Fix typo
f3764d5a Change divtable from SOA-style into AOS-style Because it makes more sense
63d9dc1b Update paper according to jk-jeon/dragonbox#22
8323f379 Make README.md clearer
a54232b9 Merge pull request #22 from yotann/master
fc909b8d fix bugs in compute_right_closed_directed
16473a7d Merge branch 'master' of https://github.com/jk-jeon/dragonbox
538d7d32 Revised some writings
90face27 Merge pull request #21 from Alexhuszagh/master
e9509370 Add CI support for OSes and other architectures.

git-subtree-dir: third_party/dragonbox
git-subtree-split: 9c26179a9c4368db9f7703879e3b2e34aa0e29c2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants