Make double/real->decimal cast compatible with cast to varchar by losipiuk · Pull Request #9595 · prestodb/presto

losipiuk · 2017-12-20T23:49:35Z

Make double/real->decimal cast compatible with cast to varchar

Casting approximate numerics (DOUBLE and REAL) to DECIMAL used
encoding which exposed all digits from decimal representation of
values stored by floating point number (for digits before decimal point).

This resulted in somewhat unexpected results.

E.g.

cast (double '100000000000000000000000000000000' as decimal(38))
returned 100000000000000005366162204393472.

If you look at BigDecimal(double) constructor (which is discourage to be
used), it has the same behaviour.

After this change, the encoding of double will be compatible with
casting double to varchar, and also how it naturally is printed out in CLI.

The double '100000000000000000000000000000000' is printed as '1.0E32'
and cast to decimal(38) the value will match that.

fixes #9575

cc: @martint

findepi

LGTM, minor comments

findepi · 2017-12-21T09:24:47Z

presto-main/src/main/java/com/facebook/presto/type/DecimalCasts.java

cmt msg

Make double/real->decimal cast compatible with cast to varchar ...

s/discourage/&d
s/as s/as the s
s/change/&,
s/naturally prints out/is printed
s/out/as
s/when/and &

findepi · 2017-12-21T09:31:58Z

presto-main/src/main/java/com/facebook/presto/type/DecimalCasts.java

intScale(scale), HALF_UP

findepi · 2017-12-21T09:33:16Z

presto-main/src/main/java/com/facebook/presto/type/DecimalCasts.java

Is catch (ArithmeticException below still reachable?

Yeah - encodeScaledValue can throw it.

findepi · 2017-12-21T09:33:39Z

presto-main/src/main/java/com/facebook/presto/type/DecimalCasts.java

intScale(scale), HALF_UP

findepi · 2017-12-21T09:33:59Z

presto-main/src/main/java/com/facebook/presto/type/DecimalCasts.java

Is catch (ArithmeticException below still reachable?

sopel39

Are we sure that we are complying with standard after this change (especially truncating mantissa bits)?

Should we give up on optimized method such easily?

sopel39 · 2017-12-21T10:05:40Z

presto-main/src/test/java/com/facebook/presto/type/TestDecimalCasts.java

Is this a valid double value? Does it fit? You could try using hex floating point literals to represent test doubles: https://stackoverflow.com/questions/25712348/hexadecimal-floating-point-literals.

According to standard we want to make sure that Mantissa is not truncated right? What is the implicit precision of double?

After offline discussion. It would be nice to have round trip tests to ensure mantissa information is not truncated.

It is string which can be interpreted as a valid double (this one: 1234567890123456824475648). By valid hear I mean the value which is exactly representable by IEEE 754-1985 double precision.

sopel39 · 2017-12-21T10:06:43Z

presto-spi/src/main/java/com/facebook/presto/spi/type/UnscaledDecimal128Arithmetic.java

I am not sure we should give up such easily on optimized conversion. Is there a code we could reuse to implement conversion correctly? What does Hive use?

It does not use this implementation (as results are different). I don't know if their implementation uses intermediate String object.

sopel39 · 2017-12-21T10:46:55Z

@martint regarding your comment: #9422 (comment)

The 2003 standard says:

An approximate numeric value consists of a mantissa and an exponent.
The mantissa is a signed numeric value, and the exponent is a signed integer that specifies the
magnitude of the mantissa. An approximate numeric value has a precision. The precision is a
positive integer that specifies the number of significant binary digits in the mantissa.
The value of an approximate numeric value is the mantissa multiplied by a factor determined
by the exponent.
An <approximate numeric literal> ANL consists of an <exact numeric literal> (called the
<mantissa>), the letter 'E' or 'e', and a <signed integer> (called the <exponent>).
If M is the value of the <mantissa> and E is the value of the <exponent>, then M * 10E is
the apparent value of ANL. The actual value of ANL is approximately the apparent value of
ANL, according to implementation-defined rules.

...

Whenever an exact or approximate numeric value is assigned to an exact numeric value site,
an approximation of its value that preserves leading significant digits after rounding or truncating
is represented in the declared type of the target. The value is converted to have the precision
and scale of the target. The choice of whether to truncate or round is implementation-defined.

So it would appear that we shouldn't truncate significant binary digits of an approximate numeric mantissa. Assuming that implicitly double has infinite precision, then our previous implementation is actually conforming to the standard although it is less intuitive.

@losipiuk could you check what other databases do?

EDIT: it could be that both implementations are correct as they choose different textual representation of double, but both translate to the same double value.

losipiuk · 2017-12-21T15:13:57Z

@losipiuk could you check what other databases do?

I checked PSQL and it behaves as code from this PR:

postgres=# select 
100000000000000005366162204393472::float8 "just double",
100000000000000005366162204393472::float8::varchar "cast to varchar",
100000000000000005366162204393472::float8::decimal(38) "cast to decimal(38)",
100000000000000005366162204393472::float8::decimal(38)::varchar "cast to decimal(38) to varchar"

 just double | cast to varchar |        cast to decimal(38)        |  cast to decimal(38) to varchar
-------------+-----------------+-----------------------------------+-----------------------------------
       1e+32 | 1e+32           | 100000000000000000000000000000000 | 100000000000000000000000000000000

Oracle does some weird sh***

SQL> SELECT
TO_BINARY_DOUBLE('100000000000000005366162204393472') as "just double",
cast(TO_BINARY_DOUBLE('100000000000000005366162204393472') as varchar(200)) as "cast to varchar",
CAST(TO_BINARY_DOUBLE('100000000000000005366162204393472') as NUMBER(38)) as "cast to number(38)",
CAST(CAST(TO_BINARY_DOUBLE('100000000000000005366162204393472') as NUMBER(38)) as varchar(200)) as "cast to number(38) to varchar"

in SQLPLUS result is:

just double    cast to varchar             cast to number(38)       cast to number(38) to varchar
-----------    -----------------------     -------------------      -----------------------------------
1.0E+032       1.0000000000000001E+032     1.0000E+32               100000000000000010000000000000000

IN SQLDEVELOPER

just double                          cast to varchar             cast to number(38)                     cast to number(38) to varchar
-----------                          -----------------------     ---------------------------------      -----------------------------------
100000000000000000000000000000000    1.0000000000000001E+032     100000000000000010000000000000000      100000000000000010000000000000000

EDIT: it could be that both implementations are correct as they choose different textual representation of double, but both translate to the same double value.

I think so. I did not read it anywhere but my guess is that the double to decimal convertion Hive/PSQL uses (which is compatible with String.valueOf(double)) works in such way
that it builds decimal with as many as possible rightmost digits turned to 0, as long as built value still represents source double.

losipiuk · 2017-12-21T15:35:49Z

AC

...

sopel39 · 2017-12-22T10:45:47Z

presto-main/src/main/java/com/facebook/presto/type/DecimalCasts.java

add TODO for faster implementation

I don't think this be revisited unless benchmarks show we should do it. We could place such TODOs in quite a few places in the code, would that be useful?

I added them - they are no harm. Maybe it will motivate sb to do the benchmarks in the future.

sopel39 · 2017-12-22T10:46:24Z

presto-main/src/main/java/com/facebook/presto/type/DecimalCasts.java

ditto, add TODO

Casting approximate numerics (DOUBLE and REAL) to DECIMAL used encoding which exposed all digits from decimal representation of values stored by floating point number (for digits before decimal point). This resulted in somewhat unexpected results. E.g. cast (double '100000000000000000000000000000000' as decimal(38)) returned 100000000000000005366162204393472. If you look at BigDecimal(double) constructor (which is discourage to be used), it has the same behaviour. After this change, the encoding of double will be compatible with casting double to varchar, and also how it naturally is printed out in CLI. The double '100000000000000000000000000000000' is printed as '1.0E32' and cast to decimal(38) the value will match that.

facebook-github-bot added the CLA Signed label Dec 20, 2017

losipiuk requested a review from findepi December 20, 2017 23:49

findepi approved these changes Dec 21, 2017

View reviewed changes

sopel39 self-requested a review December 21, 2017 10:00

sopel39 previously requested changes Dec 21, 2017

View reviewed changes

losipiuk force-pushed the lo/double-to-decimal branch from e1b4771 to 3f174a8 Compare December 21, 2017 15:16

findepi approved these changes Dec 21, 2017

View reviewed changes

sopel39 approved these changes Dec 22, 2017

View reviewed changes

losipiuk force-pushed the lo/double-to-decimal branch from 23be055 to a3edf71 Compare December 22, 2017 21:22

losipiuk added 2 commits December 23, 2017 00:53

Remove unused doubleToLongDecimal method

7422703

losipiuk force-pushed the lo/double-to-decimal branch from a3edf71 to 7422703 Compare December 22, 2017 23:53

losipiuk merged commit 26f3b45 into prestodb:master Dec 23, 2017

Conversation

losipiuk commented Dec 20, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

findepi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sopel39 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

losipiuk Dec 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sopel39 commented Dec 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

losipiuk commented Dec 21, 2017

Uh oh!

losipiuk commented Dec 21, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

losipiuk commented Dec 20, 2017 •

edited

Loading

sopel39 left a comment •

edited

Loading

losipiuk Dec 21, 2017 •

edited

Loading

sopel39 commented Dec 21, 2017 •

edited

Loading