Skip to content

Make double/real->decimal cast compatible with cast to varchar#9595

Merged
losipiuk merged 2 commits intoprestodb:masterfrom
losipiuk:lo/double-to-decimal
Dec 23, 2017
Merged

Make double/real->decimal cast compatible with cast to varchar#9595
losipiuk merged 2 commits intoprestodb:masterfrom
losipiuk:lo/double-to-decimal

Conversation

@losipiuk
Copy link
Contributor

@losipiuk losipiuk commented Dec 20, 2017

Make double/real->decimal cast compatible with cast to varchar

Casting approximate numerics (DOUBLE and REAL) to DECIMAL used
encoding which exposed all digits from decimal representation of
values stored by floating point number (for digits before decimal point).

This resulted in somewhat unexpected results.

E.g.

cast (double '100000000000000000000000000000000' as decimal(38))
returned 100000000000000005366162204393472.

If you look at BigDecimal(double) constructor (which is discourage to be
used), it has the same behaviour.

After this change, the encoding of double will be compatible with
casting double to varchar, and also how it naturally is printed out in CLI.

The double '100000000000000000000000000000000' is printed as '1.0E32'
and cast to decimal(38) the value will match that.

fixes #9575

cc: @martint

Copy link
Contributor

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmt msg

Make double/real->decimal cast compatible with cast to varchar ...

s/discourage/&d
s/as s/as the s
s/change/&,
s/naturally prints out/is printed
s/out/as
s/when/and &

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intScale(scale), HALF_UP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is catch (ArithmeticException below still reachable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - encodeScaledValue can throw it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intScale(scale), HALF_UP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is catch (ArithmeticException below still reachable?

@sopel39 sopel39 self-requested a review December 21, 2017 10:00
Copy link
Contributor

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that we are complying with standard after this change (especially truncating mantissa bits)?

Should we give up on optimized method such easily?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a valid double value? Does it fit? You could try using hex floating point literals to represent test doubles: https://stackoverflow.com/questions/25712348/hexadecimal-floating-point-literals.

According to standard we want to make sure that Mantissa is not truncated right? What is the implicit precision of double?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After offline discussion. It would be nice to have round trip tests to ensure mantissa information is not truncated.

Copy link
Contributor Author

@losipiuk losipiuk Dec 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is string which can be interpreted as a valid double (this one: 1234567890123456824475648). By valid hear I mean the value which is exactly representable by IEEE 754-1985 double precision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we should give up such easily on optimized conversion. Is there a code we could reuse to implement conversion correctly? What does Hive use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not use this implementation (as results are different). I don't know if their implementation uses intermediate String object.

@sopel39
Copy link
Contributor

sopel39 commented Dec 21, 2017

@martint regarding your comment: #9422 (comment)

The 2003 standard says:

An approximate numeric value consists of a mantissa and an exponent.
The mantissa is a signed numeric value, and the exponent is a signed integer that specifies the
magnitude of the mantissa. An approximate numeric value has a precision. The precision is a
positive integer that specifies the number of significant binary digits in the mantissa.
The value of an approximate numeric value is the mantissa multiplied by a factor determined
by the exponent.
An <approximate numeric literal> ANL consists of an <exact numeric literal> (called the
<mantissa>), the letter 'E' or 'e', and a <signed integer> (called the <exponent>).
If M is the value of the <mantissa> and E is the value of the <exponent>, then M * 10E is
the apparent value of ANL. The actual value of ANL is approximately the apparent value of
ANL, according to implementation-defined rules.

...

Whenever an exact or approximate numeric value is assigned to an exact numeric value site,
an approximation of its value that preserves leading significant digits after rounding or truncating
is represented in the declared type of the target. The value is converted to have the precision
and scale of the target. The choice of whether to truncate or round is implementation-defined.

So it would appear that we shouldn't truncate significant binary digits of an approximate numeric mantissa. Assuming that implicitly double has infinite precision, then our previous implementation is actually conforming to the standard although it is less intuitive.

@losipiuk could you check what other databases do?

EDIT: it could be that both implementations are correct as they choose different textual representation of double, but both translate to the same double value.

@losipiuk
Copy link
Contributor Author

@losipiuk could you check what other databases do?

I checked PSQL and it behaves as code from this PR:

postgres=# select 
100000000000000005366162204393472::float8 "just double",
100000000000000005366162204393472::float8::varchar "cast to varchar",
100000000000000005366162204393472::float8::decimal(38) "cast to decimal(38)",
100000000000000005366162204393472::float8::decimal(38)::varchar "cast to decimal(38) to varchar"

 just double | cast to varchar |        cast to decimal(38)        |  cast to decimal(38) to varchar
-------------+-----------------+-----------------------------------+-----------------------------------
       1e+32 | 1e+32           | 100000000000000000000000000000000 | 100000000000000000000000000000000

Oracle does some weird sh***

SQL> SELECT
TO_BINARY_DOUBLE('100000000000000005366162204393472') as "just double",
cast(TO_BINARY_DOUBLE('100000000000000005366162204393472') as varchar(200)) as "cast to varchar",
CAST(TO_BINARY_DOUBLE('100000000000000005366162204393472') as NUMBER(38)) as "cast to number(38)",
CAST(CAST(TO_BINARY_DOUBLE('100000000000000005366162204393472') as NUMBER(38)) as varchar(200)) as "cast to number(38) to varchar"

in SQLPLUS result is:

just double    cast to varchar             cast to number(38)       cast to number(38) to varchar
-----------    -----------------------     -------------------      -----------------------------------
1.0E+032       1.0000000000000001E+032     1.0000E+32               100000000000000010000000000000000

IN SQLDEVELOPER

just double                          cast to varchar             cast to number(38)                     cast to number(38) to varchar
-----------                          -----------------------     ---------------------------------      -----------------------------------
100000000000000000000000000000000    1.0000000000000001E+032     100000000000000010000000000000000      100000000000000010000000000000000

EDIT: it could be that both implementations are correct as they choose different textual representation of double, but both translate to the same double value.

I think so. I did not read it anywhere but my guess is that the double to decimal convertion Hive/PSQL uses (which is compatible with String.valueOf(double)) works in such way
that it builds decimal with as many as possible rightmost digits turned to 0, as long as built value still represents source double.

@losipiuk losipiuk force-pushed the lo/double-to-decimal branch from e1b4771 to 3f174a8 Compare December 21, 2017 15:16
@losipiuk
Copy link
Contributor Author

AC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add TODO for faster implementation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this be revisited unless benchmarks show we should do it. We could place such TODOs in quite a few places in the code, would that be useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added them - they are no harm. Maybe it will motivate sb to do the benchmarks in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, add TODO

@losipiuk losipiuk force-pushed the lo/double-to-decimal branch from 23be055 to a3edf71 Compare December 22, 2017 21:22
Casting approximate numerics (DOUBLE and REAL) to DECIMAL used
encoding which exposed all digits from decimal representation of
values stored by floating point number (for digits before decimal point).

This resulted in somewhat unexpected results.

E.g.

cast (double '100000000000000000000000000000000' as decimal(38))
returned 100000000000000005366162204393472.

If you look at BigDecimal(double) constructor (which is discourage to be
used), it has the same behaviour.

After this change, the encoding of double will be compatible with
casting double to varchar, and also how it naturally is printed out in CLI.

The double '100000000000000000000000000000000' is printed as '1.0E32'
and cast to decimal(38) the value will match that.
@losipiuk losipiuk force-pushed the lo/double-to-decimal branch from a3edf71 to 7422703 Compare December 22, 2017 23:53
@losipiuk losipiuk merged commit 26f3b45 into prestodb:master Dec 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wrong behaviour of double to decimal cast

4 participants