Implement a new formatting algorithm for small given precision #3269

jk-jeon · 2023-01-12T02:12:50Z

Implement the algorithm discussed in #3262 and #2750.

Didn't touch the constexpr-route; it still goes into the Grisu branch. We can eliminate this branch (and other Grisu-related stuffs) completely by constexpr-ifying several more functions after this PR.
Moved several stuffs from format-inl.h into format.h to satisfy the compiler.
Appended several entries into the Dragonbox table, 15 into the full table, and 1 into the compressed table. Modified float_info<double>::max_k accordingly, and adjusted the corresponding test case from format-impl-test.
Added uint64_t version of countl_zero.
The implementation of the algorithm is in the added else branch of format_float.
Currently all tests seem to be passing, but I suggest to add some more test cases. I can do it in a separate PR if you think it's needed.

Contrary to what we have worried, the performance even seems a bit better than the original one. Here is a benchmark result I've done:

Red: after this PR.
Green: after this PR, with #define FMT_USE_FULL_CACHE_DRAGONBOX 1.
Blue: original implementation.

Probably the main reason of this boost is that now digits are generated in pair rather than individually, so the number of multiplications in digit generation is halved (or maybe less than that). To be fair, we could do the same thing for the original implementation and maybe that will give an even better performance, though the rounding logic then might become a bit more complicated.

Also, it's worth mentioning that the new algorithm just gives up doing anything for the 19-digits case while the original implementation occasionally succeeds with Grisu for this case. That's probably the reason why we have the regression for this case.

…d in #3262 and #2750

vitaut

Thanks for the PR!

include/fmt/format-inl.h

vitaut · 2023-01-13T19:23:44Z

include/fmt/format.h

+  int lz = 0;
+  constexpr uint64_t msb_mask = 1ull << (num_bits<uint64_t>() - 1);
+  for (; (n & msb_mask) == 0; n <<= 1) lz++;
+  return lz;


I would suggest moving this logic into a separate function (e.g. countl_zero_fallback) and reusing here and in 32-bit overload.

You mean having a template countl_zero_fallback that is called from both of the overloads?

Anyway, it can be done separately so I'll merge this as is.

jk-jeon · 2023-01-14T03:39:05Z

By the way, I think in previous discussions I said we will be able to reliably generate 18 digits with this new algorithm, but that's not the case. The number of digits that can be reliably generated without Dragon4 fallback is 17, and for the 18 digits case it depends on whether the multiplication by 10^k gives us 18 or 19 digits number. We have to have a one-digit margin for the rounding, so if we initially got a 19-digits number then we can print 18 digits out of it, and if we got an 18-digit number then we can print 17 digits out of it.

Actually, since we check (has_more_segments) if there are further digits other than those contained in the number we got from said multiplication, so if it turns out that there is no further digit (i.e., if has_more_segments == false), then we can print the last digit we spared for the rounding. But I considered it more harm than good, as doing so will introduce several more branches. I think 17 digits should be enough for all the common use cases.

vitaut · 2023-01-14T19:36:57Z

I think 17 digits should be enough for all the common use cases.

I agree. Precision of 17 can be used for roundtrip with g and e, larger values are pretty uncommon.

vitaut · 2023-01-14T19:37:04Z

Merged, thanks!

jk-jeon added 7 commits January 11, 2023 17:29

Implement the formatting algorithm for small given precision discusse…

cc4a496

…d in #3262 and #2750

Fix signed to unsigned implicit conversion warning

ab9ef0e

Fix a warning abou unsafe mix of uint64_t and bool

ab34e7d

Fix linkage error

16d2ee6

static variables are not allowed in constexpr functions

43d9381

Fix linkage error and static variable in constexpr function

54eb941

Fix linkage error

aa8f769

vitaut reviewed Jan 13, 2023

View reviewed changes

vitaut merged commit 0f42c17 into fmtlib:master Jan 14, 2023

jk-jeon mentioned this pull request Jan 24, 2023

look into using Ryu instead of Errol3 for floating point printing ziglang/zig#1299

Closed

vitaut mentioned this pull request Sep 18, 2023

Unexpected output from intel fmt::fprint #3645

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement a new formatting algorithm for small given precision #3269

Implement a new formatting algorithm for small given precision #3269

jk-jeon commented Jan 12, 2023

vitaut left a comment

vitaut Jan 13, 2023

jk-jeon Jan 13, 2023

vitaut Jan 14, 2023

vitaut Jan 14, 2023

jk-jeon commented Jan 14, 2023 •

edited

Loading

vitaut commented Jan 14, 2023

vitaut commented Jan 14, 2023

Implement a new formatting algorithm for small given precision #3269

Implement a new formatting algorithm for small given precision #3269

Conversation

jk-jeon commented Jan 12, 2023

vitaut left a comment

Choose a reason for hiding this comment

vitaut Jan 13, 2023

Choose a reason for hiding this comment

jk-jeon Jan 13, 2023

Choose a reason for hiding this comment

vitaut Jan 14, 2023

Choose a reason for hiding this comment

vitaut Jan 14, 2023

Choose a reason for hiding this comment

jk-jeon commented Jan 14, 2023 • edited Loading

vitaut commented Jan 14, 2023

vitaut commented Jan 14, 2023

jk-jeon commented Jan 14, 2023 •

edited

Loading