Description
Currently, some conversions to floating-point types are done in two steps: first converting to a larger floating-point type, then downcasting to the target type. Both steps perform rounding, which creates a double-rounding problem: the result of the first rounding can change the decision made in the second step.
For example, 0x01010001 has 25 significant bits. When rounding directly to bfloat16 precision (8 significant bits), the trailing bits are 0x10001, which is above the midpoint 0x10000, so the value rounds up. When rounding in two steps (int -> float -> bfloat16), the value is first rounded to 0x01010000 at float precision (24 significant bits). The trailing bits 0x10000 are then exactly the midpoint, which ties to even and rounds down.
The problematic cases include:
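The arithmetic above can be checked with a small sketch in Python using exact integer math. The helper `round_int_to_bits` is a hypothetical name introduced for illustration only; it models round-to-nearest, ties-to-even at a given significand width and ignores exponent range, so it is not the runtime's actual conversion code.

```python
def round_int_to_bits(n: int, p: int) -> int:
    """Round a non-negative integer to p significant bits, ties to even.

    Illustrative helper (not a .NET API); exponent range is ignored.
    """
    shift = n.bit_length() - p
    if shift <= 0:
        return n  # already fits in p significant bits
    q = n >> shift                    # the p bits that are kept
    rem = n & ((1 << shift) - 1)      # the trailing bits that are dropped
    half = 1 << (shift - 1)           # midpoint of the dropped range
    if rem > half or (rem == half and (q & 1)):
        q += 1                        # round up, or break a tie toward even
    return q << shift


value = 0x01010001

# One step: int -> 8 significant bits (bfloat16 precision).
direct = round_int_to_bits(value, 8)

# Two steps: int -> 24 significant bits (float), then -> 8 significant bits.
via_float = round_int_to_bits(value, 24)
two_step = round_int_to_bits(via_float, 8)

print(hex(direct), hex(via_float), hex(two_step))
```

The direct result is 0x01020000, while the two-step result is 0x01000000, which differs by one unit in the last place at bfloat16 precision.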
Integer to Half
All integer to Half conversions are currently done in two steps. Note that the double-rounding error only applies to integer types that can't be converted to float exactly ((U)Int32 or larger). The reverse direction (Half->Int32) isn't affected because Half can be converted to float exactly.
Decimal to binary floating points
All decimal to binary floating-point conversions are done in two steps, converting to double first. This is long-standing behavior, so changing it may need a breaking change announcement.
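As a rough illustration of how going through double can change a decimal-to-float result, here is a sketch in Python using exact rational arithmetic. The helper `round_to_bits` and the chosen input are assumptions made for this example: the input sits just above the midpoint between two adjacent float values and has 29 significant digits. The helper models round-to-nearest, ties-to-even with an unbounded exponent, not the actual .NET conversion routines.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round a positive rational to p significant bits, ties to even.

    Illustrative helper (not a .NET API); exponent range is ignored.
    """
    if x <= 0:
        raise ValueError("sketch only handles positive values")
    # Pick e so that 2**(p-1) <= x / 2**e < 2**p, then round to an integer.
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    scaled = x / Fraction(2) ** e
    while scaled >= 2 ** p:
        e += 1
        scaled /= 2
    while scaled < 2 ** (p - 1):
        e -= 1
        scaled *= 2
    q = round(scaled)  # Fraction rounds halves to the nearest even integer
    return q * Fraction(2) ** e


# 1 + 2**-24 + 1e-28: slightly above the midpoint of 1 and 1 + 2**-23.
x = Fraction("1.0000000596046447753906250001")

direct = round_to_bits(x, 24)        # decimal -> float in one rounding
as_double = round_to_bits(x, 53)     # decimal -> double
two_step = round_to_bits(as_double, 24)  # double -> float

print(float(direct), float(two_step))
```

Rounding to double discards the small excess over the midpoint, so the second rounding sees an exact tie and resolves it to 1.0, whereas a single rounding gives 1 + 2^-23.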
BigInteger to float/Half/BFloat16
All BigInteger to binary floating-point conversions convert to double first. This is also long-standing behavior from .NET Framework.
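The same effect can be reproduced with real IEEE 754 types. The sketch below uses Python's arbitrary-precision `int` as a stand-in for BigInteger and `struct` to narrow a double to single precision; the input value and the helper name `to_float32` are chosen for illustration, and the snippet assumes the platform's default round-to-nearest-even mode.

```python
import struct


def to_float32(x: float) -> float:
    """Narrow a double to single precision and widen it back (illustrative)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Just above the midpoint between two adjacent float values at this magnitude.
n = 2**100 + 2**76 + 1
lower, upper = 2**100, 2**100 + 2**77

# Two steps: big integer -> double -> float.
via_double = to_float32(float(n))

# Correctly rounded result: whichever neighbour is closer to n.
nearest = lower if n - lower < upper - n else upper

print(int(via_double) == lower, nearest == upper)
```

Here the conversion through double lands on the lower neighbour even though the upper neighbour is closer to the original integer.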
Decimal32/64/128 to float/Half/BFloat16
Not currently implemented, but we should pay attention in the future.