Correct rounding for integer/decimal->FP conversion #112474

@huoyaoyuan

#98643 (comment)

Currently, some conversions to floating-point types are done in two steps: first converting to a larger floating-point type, then narrowing to the target type. Both steps perform a rounding. This creates a double-rounding problem: the result of the first rounding can change the decision made by the second.

For example, 0x01010001 has 25 significant bits. When rounding directly to bfloat16 precision (8 significant bits), the trailing bits are 0x10001, which is above the midpoint 0x10000, so the value rounds up to 0x01020000. When rounding in two steps as int->float->bfloat16, it first rounds to 0x01010000 in float precision (24 significant bits). The trailing bits 0x10000 are then exactly a midpoint, so the second step ties to even and rounds down to 0x01000000.
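The arithmetic in the example above can be checked with a small sketch. It models only the significand rounding (round to nearest, ties to even) on non-negative integers and ignores exponent range. `round_to_bits` is a hypothetical helper written for illustration, not a runtime API.

```python
def round_to_bits(n: int, bits: int) -> int:
    """Round a non-negative integer to `bits` significant bits, ties to even."""
    drop = n.bit_length() - bits
    if drop <= 0:
        return n  # already fits exactly
    kept = n >> drop                  # the bits that survive
    rem = n & ((1 << drop) - 1)       # the trailing bits being discarded
    half = 1 << (drop - 1)            # midpoint of the discarded range
    if rem > half or (rem == half and kept & 1):
        kept += 1                     # round up, or break a tie toward even
    return kept << drop

n = 0x01010001
direct = round_to_bits(n, 8)                        # int -> bfloat16
two_step = round_to_bits(round_to_bits(n, 24), 8)   # int -> float -> bfloat16
print(hex(direct), hex(two_step))
```

The two results differ by one bfloat16 ULP at this magnitude, which is the double-rounding error described above.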

The problematic cases include:

Integer to Half

All integer to Half conversions are currently done in two steps. Note that the double-rounding error only applies to integer types that can't be converted to float exactly ((U)Int32 or larger). The reverse direction (Half->Int32) isn't affected because Half can be converted to `float` exactly.

Decimal to binary floating points

All decimal to binary floating-point conversions are done in two steps, converting to double first. This is long-standing behavior, so changing it may need a breaking change announcement.
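As a rough illustration of how the intermediate double can change the result, the sketch below uses Python's `decimal` and `fractions` modules as stand-ins for the .NET types. The input value is an assumption chosen for the demonstration: it sits 1e-28 above the midpoint between the floats 1.0 and 1 + 2^-23. `to_float32` is a hypothetical helper that narrows a binary64 value to binary32 through `struct`.

```python
import struct
from decimal import Decimal
from fractions import Fraction

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to binary32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 1 + 2**-24 is the midpoint between the floats 1.0 and 1 + 2**-23.
# The trailing ...0001 places the decimal value just above that midpoint.
d = Decimal("1.0000000596046447753906250001")
exact = Fraction(d)                      # exact rational value of d

# Two steps: decimal -> double -> float. The double lands on the midpoint.
via_double = to_float32(float(exact))

# One step: pick the nearer of the two adjacent float values exactly.
lo, hi = Fraction(1), Fraction(1) + Fraction(1, 2**23)
direct = float(hi if exact - lo > hi - exact else lo)

print(via_double, direct)
```

The first rounding discards the tiny excess over the midpoint, so the second rounding sees an exact tie and resolves it to even (1.0), while a single correctly rounded conversion goes up.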

BigInteger to float/Half/BFloat16

All BigInteger to binary floating-point conversions go through double first. This is also long-standing behavior from .NET Framework.
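The same effect can be sketched for large integers, with Python's arbitrary-precision `int` standing in for BigInteger. The value below is an assumption picked for illustration, and `to_float32` is again a hypothetical helper rather than a runtime API.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to binary32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

n = 2**60 + 2**36 + 1            # 61 significant bits

# Two steps: the double keeps 53 bits, so the low 1 is dropped and the
# result 2**60 + 2**36 is an exact tie at float precision.
via_double = to_float32(float(n))

# One step: compare n against its two adjacent binary32 values.
lo, hi = 2**60, 2**60 + 2**37
direct = float(hi if n - lo > hi - n else lo)

print(via_double == float(lo), direct == float(hi))
```

Here the two-step path returns 2^60 while direct rounding returns 2^60 + 2^37.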

Decimal32/64/128 to float/Half/BFloat16

These conversions are not currently implemented, but we should keep this issue in mind when they are added.
