[SPARK-16735][SQL] map should create a decimal key or value from decimals with different precisions and scales#14374
biglobster wants to merge 6 commits into apache:master
Conversation
…ing different inferred precisions and scales
JIRA_ID: SPARK-16735
Description:
In Spark 2.0, we will parse float literals as decimals. However, it introduces a side-effect, which is described below.
spark-sql> select map(0.1,0.01, 0.2,0.033);
Error in query: cannot resolve 'map(CAST(0.1 AS DECIMAL(1,1)), CAST(0.01 AS DECIMAL(2,2)), CAST(0.2 AS DECIMAL(1,1)), CAST(0.033 AS DECIMAL(3,3)))' due to data type mismatch: The given values of function map should all be the same type, but they are [decimal(2,2), decimal(3,3)]; line 1 pos 7
Test:
spark-sql> select map(0.1,0.01, 0.2,0.033);
{0.1:0.010,0.2:0.033}
Time taken: 2.448 seconds, Fetched 1 row(s)
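As a quick sanity check of the expected result types (a hedged sketch for spark-shell, not part of the patch): the keys 0.1 and 0.2 are both inferred as decimal(1,1), while the values 0.01 and 0.033 should be widened to a common decimal(3,3), which is why 0.01 prints as 0.010 above.

```scala
// Assumes a spark-shell session with this fix applied; exact column naming
// and display formatting may differ slightly.
val df = spark.sql("select map(0.1, 0.01, 0.2, 0.033)")
df.printSchema()  // expect the column type map<decimal(1,1),decimal(3,3)>
df.show(false)    // expect a single row mapping 0.1 -> 0.010 and 0.2 -> 0.033
```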
Can one of the admins verify this patch?
Hi, @biglobster.
map should create a decimal key or value from decimals with different precisions and scales
JIRA_ID: no
Description: fix jira_id in the test("SPARK-16735: CreateMap with Decimals")
Test: no
@dongjoon-hyun thank you, I have just updated the title of this pull request with the JIRA ID.
Do we have a problem with other functions, e.g. array, struct, coalesce?
@rxin, please let me leave a comment because I noticed this problem before. For array, yes. IMHO, we might have to consider other compatible numeric types (and widen precision and scale) if we want to treat decimals with different precision and scale as the same type (but without losing value or precision, e.g. going from decimal to double). EDITED: FYI, for least and greatest, I opened #14294; however, we are discussing the right behaviour in SPARK-16646. Other than those, there seems to be no case similar to this one.
@rxin I found a related JIRA, SPARK-16714, that fixes this problem for the array function, so I have made my report a sub-task of SPARK-16714.
For the array issue of SPARK-16714, the PR is ready for review at #14353.
case _ if elementType.isInstanceOf[DecimalType] =>
  var tighter: DataType = elementType
  colType.foreach { child =>
    if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) {
isTighterThan is not associative - I think this would be a problem?
@rxin I have checked this function, and it will not lose any precision or range, so it is safe.
Also, in checkDecimalType we only check the data type and do not change it
(when the keys or values contain an integer type, it will pass, but the type stays integer),
so checkInputDataTypes will return the same result as it did before.
In case the keys or values contain an integer type, I will use a new function instead of isTighterThan that does not check the integer type.
Can you give me some advice? Thank you :)
What I was referring to was that isTighterThan is not associative, and I don't think you can just take the tightest one this way.
As an example:
a: precision 10, scale 5
b: precision 7, scale 1
In this case a is not tighter than b, but b would be chosen as the target data type, leading to a loss of precision.
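A small, self-contained sketch of the alternative (illustrative names, not the actual patch): instead of picking the "tightest" input with isTighterThan, widen precision and scale so that both inputs fit, which is roughly the kind of widening Spark's decimal-precision rules apply when finding a common type. For the example above, neither decimal(10,5) nor decimal(7,1) is tighter than the other, but the widened type decimal(11,5) holds both exactly.

```scala
import org.apache.spark.sql.types.DecimalType

// Hypothetical helper: keep enough integer digits and enough fractional digits
// to represent either input without loss, rather than choosing one input type.
// (A real rule would also cap the result at the maximum supported precision.)
def wider(a: DecimalType, b: DecimalType): DecimalType = {
  val scale = math.max(a.scale, b.scale)
  val intDigits = math.max(a.precision - a.scale, b.precision - b.scale)
  DecimalType(intDigits + scale, scale)
}

wider(DecimalType(10, 5), DecimalType(7, 1))  // DecimalType(11,5): no precision lost
```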
JIRA_ID: SPARK-16735
Description: I have checked this function, and it will not lose any precision or range, so it is safe. In checkDecimalType we only check the data type and do not change it (when the keys or values contain an integer type, it will pass, but the type stays integer), so checkInputDataTypes returns the same result as before. In case the keys or values contain an integer type, I will use a new function instead of isTighterThan that does not check the integer type.
Test: done
@biglobster @dongjoon-hyun I created a patch here: #14389
…ide.md
JIRA_ID: SPARK-16870
Description: the default value for spark.sql.broadcastTimeout is 300s, and this property does not appear anywhere in the Spark docs, so add "spark.sql.broadcastTimeout" to docs/sql-programming-guide.md to help people fix this timeout error when it happens.
Test: done
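For reference, one common way to raise that timeout at runtime (illustrative only; the value is in seconds and `spark` refers to an active SparkSession):

```scala
// Raise the broadcast-join timeout from the 300s default to 10 minutes.
spark.conf.set("spark.sql.broadcastTimeout", "600")
```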
What changes were proposed in this pull request?
In Spark 2.0, we will parse float literals as decimals. However, it introduces a side-effect, which is described below.
Before
spark-sql> select map(0.1,0.01, 0.2,0.033);
Error in query: cannot resolve 'map(CAST(0.1 AS DECIMAL(1,1)), CAST(0.01 AS DECIMAL(2,2)), CAST(0.2 AS DECIMAL(1,1)), CAST(0.033 AS DECIMAL(3,3)))' due to data type mismatch: The given values of function map should all be the same type, but they are [decimal(2,2), decimal(3,3)]; line 1 pos 7
After
spark-sql> select map(0.1,0.01, 0.2,0.033);
{0.1:0.010,0.2:0.033}
Time taken: 2.448 seconds, Fetched 1 row(s)
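To make the value-type coercion concrete, here is a minimal sketch (illustrative names, not the code in this patch) of folding a widening step over the inferred decimal types of the map values; the real fix would presumably reuse Spark's existing type-coercion machinery rather than a hand-rolled fold.

```scala
import org.apache.spark.sql.types.DecimalType

// 0.01 is inferred as decimal(2,2) and 0.033 as decimal(3,3); widening keeps
// the max integer digits and the max scale, giving decimal(3,3), which is why
// 0.01 is rendered as 0.010 in the output above.
val valueTypes = Seq(DecimalType(2, 2), DecimalType(3, 3))
val common = valueTypes.reduce { (a, b) =>
  val s = math.max(a.scale, b.scale)
  DecimalType(math.max(a.precision - a.scale, b.precision - b.scale) + s, s)
}
// common == DecimalType(3,3)
```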
How was this patch tested?
Pass run-tests with a new test case.