
@statementreply (Contributor)

  • Modify the behavior of _Assemble_floating_point_value_no_shift (renamed to avoid a potential ODR issue) to gracefully handle the special cases below, and remove the special-case handling code from _Assemble_floating_point_value.

    • When the significand carries over to a higher bit after rounding up, we need to renormalize the significand and increase the exponent to keep the significand within the normalized range.
      (example: 0x1.fffffffffffff8p+0 rounds to 0x1.0000000000000p+1)
    • In some cases, the new exponent becomes greater than the maximum exponent of the floating point format, so the result overflows.
      (example: 0x1.fffffffffffff8p+1023 overflows to ∞)
    • In some cases, the mantissa of a subnormal value becomes normalized after rounding up, so the result becomes a normal value.
      (example: 0x0.fffffffffffff8p-1022 rounds to 0x1.0000000000000p-1022)
  • Optimize _Right_shift_with_rounding with the branchless rounding technique used in hex to_chars.

This commit modifies the behavior of _Assemble_floating_point_value_t
(and renames it to _Assemble_floating_point_value_no_shift to avoid
potential ODR issues) to gracefully handle the cases above, and removes
special case handling code from _Assemble_floating_point_value.
It also adopts the branchless rounding technique from hex to_chars, with
minor modifications to handle input tail bits.
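A branchless round-to-nearest, ties-to-even right shift can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the MSVC STL implementation: the function name `shift_right_round_even` is hypothetical, and the `has_input_tail` flag is an assumed interpretation of "input tail bits" (nonzero bits already discarded before the value reached this function).

```cpp
#include <cstdint>

// Hypothetical sketch: shift `value` right by `shift` bits (1 <= shift <= 63),
// rounding to nearest, ties to even, without any branches.
// `has_input_tail` (assumed semantics) is true if nonzero bits were already
// dropped before `value` was formed; such bits act like extra sticky bits.
std::uint64_t shift_right_round_even(std::uint64_t value, unsigned shift, bool has_input_tail) {
    const std::uint64_t kept      = value >> shift;
    const std::uint64_t lsb       = kept & 1;                    // last kept bit
    const std::uint64_t round_bit = (value >> (shift - 1)) & 1;  // first dropped bit
    const std::uint64_t tail_mask = (std::uint64_t{1} << (shift - 1)) - 1;

    // Sticky: 1 if any dropped bit below the round bit, or any input tail bit, is set.
    const std::uint64_t sticky = static_cast<std::uint64_t>((value & tail_mask) != 0)
                               | static_cast<std::uint64_t>(has_input_tail);

    // Round up iff the round bit is set AND (sticky OR the kept value is odd).
    return kept + (round_bit & (sticky | lsb));
}
```

For example, shifting 0x1fffffffffffff8 (the 57-bit significand of 0x1.fffffffffffff8) right by 4 yields exactly 2^53, which is the carry-out case the assembly step has to absorb.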
@statementreply requested a review from a team as a code owner August 21, 2020 16:36
@StephanTLavavej added the performance (Must go faster) label Aug 22, 2020
@StephanTLavavej (Member) left a comment

Thanks! I think I understand the overall approach here, and the new assembly process makes more sense than the previous control flow. I'm marking this as Request Changes for the "tail bits" bug that I believe I found (plus testing). There's an additional question about space/time improvements for the rounding technique.

@StephanTLavavej removed their assignment Aug 29, 2020
@statementreply (Contributor, Author) commented Aug 29, 2020

Here are my benchmark results (Intel Core i5-8400, fixed CPU clock speed at 2.7 GHz, VS 2019 16.8 Preview 2, Clang/LLVM 11.0.0-rc1). The measured times are average nanoseconds per floating-point string.

With the default dynamic CPU clock speed setting, times are around 0.7x the values below.

| Scenario | MSVC Baseline | MSVC +Assemble | MSVC +Rounding | MSVC +Both | LLVM Baseline | LLVM +Assemble | LLVM +Rounding | LLVM +Both |
|---|---|---|---|---|---|---|---|---|
| x64 float hex 5 (exact) | 49.0 | 49.4 | 48.7 | 49.2 | 51.9 | 51.7 | 51.5 | 51.4 |
| x64 float hex 6 (rounding) | 61.2 | 61.2 | 52.2 | 💚 53.1 | 55.5 | 56.4 | 54.7 | 55.1 |
| x64 double hex 13 (exact) | 64.6 | 62.3 | 64.4 | ✔️ 62.4 | 66.8 | 66.9 | 66.7 | 66.9 |
| x64 double hex 14 (rounding) | 75.4 | 74.2 | 67.8 | 💚 66.7 | 70.9 | 71.9 | 70.3 | 70.9 |
| x64 float plain shortest roundtrip | 159.9 | 162.3 | 151.2 | ✔️ 153.4 | 144.4 | 146.4 | 145.8 | 143.9 |
| x64 double plain shortest roundtrip | 247.2 | 244.9 | 237.3 | ✔️ 232.8 | 222.4 | 225.6 | 224.5 | 224.0 |
| x86 float hex 5 (exact) | 62.0 | 61.1 | 62.0 | 61.1 | 61.3 | 60.4 | 61.5 | 60.3 |
| x86 float hex 6 (rounding) | 86.3 | 85.4 | 71.6 | 💚 71.0 | 70.8 | 71.8 | 68.2 | 68.8 |
| x86 double hex 13 (exact) | 80.6 | 83.6 | 80.7 | ❌ 83.6 | 78.2 | 78.7 | 78.9 | 79.8 |
| x86 double hex 14 (rounding) | 106.7 | 106.9 | 91.2 | 💚 91.2 | 91.6 | 89.6 | 87.9 | 89.5 |
| x86 float plain shortest roundtrip | 205.2 | 204.5 | 186.5 | 💚 185.9 | 175.0 | 175.4 | 169.9 | ✔️ 169.9 |
| x86 double plain shortest roundtrip | 311.0 | 316.4 | 293.2 | ✔️ 297.3 | 267.5 | 271.2 | 265.7 | 266.2 |

@StephanTLavavej (Member) left a comment

Thanks for the detailed perf data - nice improvements for MSVC! I'll push a one-line change to use logical AND with bool. FYI @cbezault as you had previously approved.

@StephanTLavavej merged commit 974582f into microsoft:master Oct 3, 2020
@StephanTLavavej (Member)

Thanks again for improving <charconv>! 🚀 😸

@statementreply deleted the simplify_assemble_float branch April 17, 2021 10:57

3 participants