Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a new formatting algorithm for small given precision #3269

Merged
merged 7 commits into from
Jan 14, 2023
Merged

Implement a new formatting algorithm for small given precision #3269

merged 7 commits into from
Jan 14, 2023

Conversation

jk-jeon
Copy link
Contributor

@jk-jeon jk-jeon commented Jan 12, 2023

Implement the algorithm discussed in #3262 and #2750.

  • Didn't touch the constexpr-route; it still goes into the Grisu branch. We can eliminate this branch (and other Grisu-related stuffs) completely by constexpr-ifying several more functions after this PR.
  • Moved several stuffs from format-inl.h into format.h to satisfy the compiler.
  • Appended several entries into the Dragonbox table, 15 into the full table, and 1 into the compressed table. Modified float_info<double>::max_k accordingly, and adjusted the corresponding test case from format-impl-test.
  • Added uint64_t version of countl_zero.
  • The implementation of the algorithm is in the added else branch of format_float.
  • Currently all tests seem to be passing, but I suggest to add some more test cases. I can do it in a separate PR if you think it's needed.

Contrary to what we have worried, the performance even seems a bit better than the original one. Here is a benchmark result I've done:

image

  • Red: after this PR.
  • Green: after this PR, with #define FMT_USE_FULL_CACHE_DRAGONBOX 1.
  • Blue: original implementation.

Probably the main reason of this boost is that now digits are generated in pair rather than individually, so the number of multiplications in digit generation is halved (or maybe less than that). To be fair, we could do the same thing for the original implementation and maybe that will give an even better performance, though the rounding logic then might become a bit more complicated.

Also, it's worth mentioning that the new algorithm just gives up doing anything for the 19-digits case while the original implementation occasionally succeeds with Grisu for this case. That's probably the reason why we have the regression for this case.

Copy link
Contributor

@vitaut vitaut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

include/fmt/format-inl.h Show resolved Hide resolved
Comment on lines +502 to +505
int lz = 0;
constexpr uint64_t msb_mask = 1ull << (num_bits<uint64_t>() - 1);
for (; (n & msb_mask) == 0; n <<= 1) lz++;
return lz;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest moving this logic into a separate function (e.g. countl_zero_fallback) and reusing here and in 32-bit overload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean having a template countl_zero_fallback that is called from both of the overloads?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, it can be done separately so I'll merge this as is.

@jk-jeon
Copy link
Contributor Author

jk-jeon commented Jan 14, 2023

By the way, I think in previous discussions I said we will be able to reliably generate 18 digits with this new algorithm, but that's not the case. The number of digits that can be reliably generated without Dragon4 fallback is 17, and for the 18 digits case it depends on whether the multiplication by 10^k gives us 18 or 19 digits number. We have to have a one-digit margin for the rounding, so if we initially got a 19-digits number then we can print 18 digits out of it, and if we got an 18-digit number then we can print 17 digits out of it.

Actually, since we check (has_more_segments) if there are further digits other than those contained in the number we got from said multiplication, so if it turns out that there is no further digit (i.e., if has_more_segments == false), then we can print the last digit we spared for the rounding. But I considered it more harm than good, as doing so will introduce several more branches. I think 17 digits should be enough for all the common use cases.

@vitaut vitaut merged commit 0f42c17 into fmtlib:master Jan 14, 2023
@vitaut
Copy link
Contributor

vitaut commented Jan 14, 2023

I think 17 digits should be enough for all the common use cases.

I agree. Precision of 17 can be used for roundtrip with g and e, larger values are pretty uncommon.

@vitaut
Copy link
Contributor

vitaut commented Jan 14, 2023

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants