Add decimal coercions for table/partition mismatch in Hive connector + refactor (v2) #9422
losipiuk wants to merge 10 commits into prestodb:master from
0e6d77c to 58a8b96
@electrum @martint I have a question on this one. I noticed that code I wrote for In coercion code I used the same logic which we use for Presto's The difference in behaviour between coercion in Presto and in Hive is visible for large double values, when the value exceeds double precision.
Actually what Hive does seems somewhat more reasonable as Note that the current implementation implies different behaviour for different data types in Presto (depending if So we have 3 options here:
It seems that either 2 or 3 is the way to go. I still have to check what the standard says about how a double->decimal cast should behave. Any opinions on this one?
Looking at the standard, it seems that we shouldn't round any significant digits, but otherwise we are free to round or truncate: Both Presto and Hive seem (?) not to conform to the SQL standard. I would rather have Presto conform to the standard and maybe have Hive-compatible casts in the Hive connector only.
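To make the "round or truncate" choice concrete, here is a minimal Java sketch using only `java.math.BigDecimal`. The class name `DoubleToDecimalRoundingSketch` and the helper `toDecimal` are hypothetical and are not taken from Presto or Hive; the sketch only illustrates how the two policies differ for digits beyond the target scale and does not claim to reproduce either engine's behaviour.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class DoubleToDecimalRoundingSketch {
    // Hypothetical helper: convert a double to a decimal with the given scale.
    // Digits beyond the target scale are handled by the supplied rounding mode.
    // BigDecimal.valueOf throws NumberFormatException for NaN and infinities.
    static BigDecimal toDecimal(double value, int scale, RoundingMode mode) {
        return BigDecimal.valueOf(value).setScale(scale, mode);
    }

    public static void main(String[] args) {
        double value = 123.456;
        // HALF_UP rounds the dropped digit; DOWN truncates it.
        System.out.println(toDecimal(value, 2, RoundingMode.HALF_UP));
        System.out.println(toDecimal(value, 2, RoundingMode.DOWN));
    }
}
```

Passing the rounding mode as a parameter keeps the policy explicit, which is the point under discussion: the same input yields different decimals depending on which policy a cast or coercion picks.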
Yeah. But now comes the question of how you define "significant digits" for a double value. Which of the digits in double Edit: The standard is not very clear on what those are: It talks about the number of significant digits in the mantissa. But it does not say those are the significant digits of the double itself. Though it seems a reasonable assumption.
That value is not a valid double, so that's the wrong question to ask :)
These would be the significant values (the mantissa): 12345678901234568. If we encounter such a double value (1.2345678901234568E+24), the right conversion to DECIMAL(30, 5) would be: 1234567890123456800000000.00000. Regarding the original question:
... since that's not a valid double value, I'm not sure when that case would ever occur.
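The distinction between the digits a double "has" and the digits of its exact binary value can be sketched in Java with `java.math.BigDecimal`. The class and method names below are hypothetical. `BigDecimal.valueOf(double)` goes through `Double.toString`, so it keeps only the digits needed to identify the double, while `new BigDecimal(double)` expands the exact binary value. Treating the former as "the mantissa digits" in the sense used above is an assumption of this sketch, not something the standard or this thread settles.

```java
import java.math.BigDecimal;

final class DoubleDigitsSketch {
    // Exact binary value of the double, written out in decimal digits.
    static BigDecimal exactValue(double value) {
        return new BigDecimal(value);
    }

    // Decimal digits from Double.toString, i.e. enough to round-trip the double.
    static BigDecimal shortestRoundTrip(double value) {
        return BigDecimal.valueOf(value);
    }

    public static void main(String[] args) {
        double value = 1.2345678901234568E+24;
        // Scale 5 matches DECIMAL(30, 5). Adding trailing zeros needs no rounding,
        // so setScale without a rounding mode does not throw here.
        System.out.println(shortestRoundTrip(value).setScale(5).toPlainString());
        System.out.println(exactValue(value).setScale(5).toPlainString());
    }
}
```

Comparing the two printed lines shows which trailing digits come from the binary representation rather than from the digits written in the literal.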
What's the purpose of this commit?
The reason we have such an interface is to be able to override the coercion policy in our internal connector that's based on the Hive connector.
Oh. I did not know about the other implementation. I think (not sure exactly, as it was a long time ago ;) ) that I assumed an interface is not needed for an internal class with just a single implementation. I will revert this one.
4360bf1 to c62860d
c62860d to fa3c74d
Good point @martint. I mean it is a valid double literal in Presto. But internally it will represent double cc: @pnowojski
Logged #9575
This addresses #7658.
This supersedes #7825.
Coercions for cases where the Hive table schema does not match the partition schema have been added for:
different decimals
decimal vs double
decimal vs float
This PR includes a refactor of the Hive coercion mechanism.
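As an illustration of the "different decimals" case, the following Java sketch rescales a value read with the partition's decimal type to the table's precision and scale. Everything in it is an assumption made for illustration: the class `DecimalRescaleSketch`, the helper `rescale`, the `HALF_UP` rounding mode, and the choice to signal overflow with an empty `Optional`. It is not the code in this PR and does not use Presto types.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Optional;

final class DecimalRescaleSketch {
    // Hypothetical helper: fit a decimal value into DECIMAL(targetPrecision, targetScale).
    // Returns empty when the rescaled value needs more digits than the target allows.
    static Optional<BigDecimal> rescale(BigDecimal value, int targetPrecision, int targetScale) {
        // Assumed policy: round half up when the target scale is smaller.
        BigDecimal rescaled = value.setScale(targetScale, RoundingMode.HALF_UP);
        if (rescaled.precision() > targetPrecision) {
            return Optional.empty();
        }
        return Optional.of(rescaled);
    }

    public static void main(String[] args) {
        // Partition column DECIMAL(10, 2) read as table column DECIMAL(12, 4).
        System.out.println(rescale(new BigDecimal("12.34"), 12, 4));
        // Value that does not fit into DECIMAL(6, 2).
        System.out.println(rescale(new BigDecimal("12345.678"), 6, 2));
    }
}
```

Returning an `Optional` leaves the overflow policy (NULL versus error) to the caller, since the description above does not say which one the connector applies.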