Implement a new formatting algorithm for small given precision #3269
Thanks for the PR!
```cpp
int lz = 0;
constexpr uint64_t msb_mask = 1ull << (num_bits<uint64_t>() - 1);
// Shift until the MSB is set (requires n != 0; for n == 0 this loops forever).
for (; (n & msb_mask) == 0; n <<= 1) lz++;
return lz;
```
I would suggest moving this logic into a separate function (e.g. `countl_zero_fallback`) and reusing it here and in the 32-bit overload.
You mean having a template `countl_zero_fallback` that is called from both of the overloads?
Yes
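A shared fallback along the lines the reviewer suggests could look roughly like this. It is a minimal sketch, not the code that ended up in fmt: the name `countl_zero_fallback` comes from the review comment, `std::numeric_limits<UInt>::digits` stands in for fmt's internal `num_bits`, and it assumes C++14 (loops in `constexpr` functions) and a nonzero argument.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical shared fallback (name taken from the review suggestion).
// Counts leading zero bits by shifting left until the most significant bit
// is set. Precondition: n != 0 (for n == 0 the loop would never terminate).
template <typename UInt>
constexpr int countl_zero_fallback(UInt n) {
  static_assert(std::numeric_limits<UInt>::is_integer &&
                    !std::numeric_limits<UInt>::is_signed,
                "countl_zero_fallback requires an unsigned integer type");
  int lz = 0;
  // For unsigned types, numeric_limits<UInt>::digits is the bit width.
  constexpr UInt msb_mask = UInt(1) << (std::numeric_limits<UInt>::digits - 1);
  for (; (n & msb_mask) == 0; n <<= 1) ++lz;
  return lz;
}

// Both overloads reuse the same template instead of duplicating the loop.
constexpr int countl_zero(std::uint32_t n) { return countl_zero_fallback(n); }
constexpr int countl_zero(std::uint64_t n) { return countl_zero_fallback(n); }
```

Because the template is instantiated per width, the 32-bit and 64-bit overloads share one loop while each still gets its own correctly sized mask.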
Anyway, it can be done separately, so I'll merge this as is.
By the way, I think in previous discussions I said we would be able to reliably generate 18 digits with this new algorithm, but that's not the case. The number of digits that can be reliably generated without the Dragon4 fallback is 17; for 18 digits, it depends on whether the multiplication by 10^k gives us an 18-digit or a 19-digit number. We need a one-digit margin for rounding, so if we initially get a 19-digit number we can print 18 digits from it, and if we get an 18-digit number we can print only 17 digits from it. Actually, since we check (
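To make the one-digit-margin arithmetic concrete, here is a small sketch. It is not fmt code: `count_decimal_digits` and `max_reliable_digits` are hypothetical helpers, and the argument stands for the integer obtained from the multiplication by 10^k. One digit of that integer is reserved for rounding, so at most one fewer digit can be printed reliably.

```cpp
#include <cstdint>

// Hypothetical helper: number of decimal digits in n (returns 1 for n == 0).
constexpr int count_decimal_digits(std::uint64_t n) {
  int digits = 1;
  while (n >= 10) {
    n /= 10;
    ++digits;
  }
  return digits;
}

// Hypothetical helper: keeping one digit as a rounding margin, an integer
// with d decimal digits lets us print at most d - 1 digits reliably.
constexpr int max_reliable_digits(std::uint64_t scaled) {
  return count_decimal_digits(scaled) - 1;
}
```

With a 19-digit product this gives 18, and with an 18-digit product it gives 17, which is why only 17 digits are always safe.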
I agree. A precision of 17 can be used for round-trip with
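The 17-digit figure matches the standard guarantee for an IEEE 754 binary64 `double`: `std::numeric_limits<double>::max_digits10` is 17, so printing 17 significant digits and parsing the result back recovers the same value. The comment is cut off, so what it goes on to compare against is unknown; this sketch only checks the general property with the C standard library (`snprintf` with `%.*g` and `strtod`), not with fmt, and assumes a binary64 `double`. `round_trips` is a hypothetical helper.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>

// Hypothetical helper: formats `value` with `precision` significant digits
// ("%.*g"), parses it back with strtod, and reports whether the same value
// came back. Not meaningful for NaN, which never compares equal.
bool round_trips(double value, int precision) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.*g", precision, value);
  return std::strtod(buf, nullptr) == value;
}

// For binary64, 17 significant digits always suffice to recover the value.
static_assert(std::numeric_limits<double>::max_digits10 == 17,
              "expects an IEEE 754 binary64 double");
```

For example, `0.1 + 0.2` prints as `0.3` with 16 digits, which parses to a different double, while 17 digits (`0.30000000000000004`) round-trip.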
Merged, thanks!
Implement the algorithm discussed in #3262 and #2750.
- … `constexpr`-route; it still goes into the Grisu branch. We can eliminate this branch (and other Grisu-related stuff) completely by `constexpr`-ifying several more functions after this PR.
- … `format-inl.h` into `format.h` to satisfy the compiler.
- … `float_info<double>::max_k` accordingly, and adjusted the corresponding test case from `format-impl-test`.
- … `uint64_t` version of `countl_zero`.
- … `else` branch of `format_float`.

Contrary to what we had feared, the performance even seems a bit better than the original. Here is a benchmark result I got:
… `#define FMT_USE_FULL_CACHE_DRAGONBOX 1`.

Probably the main reason for this boost is that digits are now generated in pairs rather than individually, so the number of multiplications in digit generation is halved (or maybe less than that). To be fair, we could do the same for the original implementation, which might give even better performance, though the rounding logic might then become a bit more complicated.
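The effect of generating digits in pairs can be illustrated with a toy fixed-point model. This is a simplified sketch, not the PR's implementation: it assumes a 32-bit binary fraction `frac / 2^32` held in a 64-bit integer, ignores rounding entirely, and packs the first `count` digits into an integer. The function names are hypothetical. Multiplying by 100 instead of 10 yields two digits per multiplication, and both versions produce identical digits because 100 * f = 10 * (10 * f).

```cpp
#include <cstdint>

// Toy model: x = frac / 2^32 with 0 <= x < 1. Both functions return the first
// `count` decimal digits of x (truncated, no rounding) packed into an integer,
// e.g. x = 0.25, count = 4 gives 2500. `count` must be at most 19.

// One multiplication per digit.
constexpr std::uint64_t digits_one_at_a_time(std::uint32_t frac, int count) {
  std::uint64_t f = frac, result = 0;
  for (int i = 0; i < count; ++i) {
    f *= 10;                           // integer part (bits 32 and up) is the next digit
    result = result * 10 + (f >> 32);
    f &= 0xFFFFFFFFu;                  // keep only the fractional bits
  }
  return result;
}

// One multiplication per pair of digits: the integer part of 100 * x is in [0, 99].
constexpr std::uint64_t digits_in_pairs(std::uint32_t frac, int count) {
  std::uint64_t f = frac, result = 0;
  int i = 0;
  for (; i + 2 <= count; i += 2) {
    f *= 100;
    result = result * 100 + (f >> 32);
    f &= 0xFFFFFFFFu;
  }
  if (i < count) {  // odd count: finish with one single digit
    f *= 10;
    result = result * 10 + (f >> 32);
  }
  return result;
}
```

The sketch deliberately leaves out rounding, which is the part the comment expects to become more complicated when digits come out two at a time.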
Also, it's worth mentioning that the new algorithm simply gives up on the 19-digit case, while the original implementation occasionally succeeds with Grisu there. That's probably why we see a regression for this case.