[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)#17606
[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)#17606dbtsai wants to merge 2 commits intoapache:masterfrom
Conversation
| NaNvl(l, Cast(r, DoubleType)) | ||
| case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType => | ||
| NaNvl(Cast(l, DoubleType), r) | ||
| case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, l.dataType)) |
There was a problem hiding this comment.
One question I have is, why NaNvl(FloatType, DoubleType) should be cast to NaNvl(DoubleType, DoubleType), but NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)?
They all change the input type from FloatType to DoubleType. Won't the first cast cause mismatching?
There was a problem hiding this comment.
Yeah, this PR prevents casting from NaNvl(FloatType, NullType) to NaNvl(DoubleType, DoubleType) since we want to minimize the casting as much as possible. Also, if we want to replace NaN by null, we want to keep the output type the same as input type.
Whether NaNvl(FloatType, DoubleType) should be cast into NaNvl(DoubleType, DoubleType) is another story, and we should discuss it and fix it in another PR. I agree with you, we should downcast the replacement DoubleType into FloatType. And in my opinion, doing this implicit casting is error-prone, and we should do explicit casting by users instead.
@gatorsmile maybe you can chime in, and give the feedback whether we should cast NaNvl(FloatType, DoubleType) to NaNvl(DoubleType, DoubleType).
There was a problem hiding this comment.
Because FunctionArgumentConversion is executed before ImplicitTypeCasts. When there is no danger of loss of information, the cast can be implicit for better usability. We can add the extra configuration flag for users to stop implicit casting.
If we do not upcast NaNvl(FloatType, DoubleType) to NaNvl(DoubleType, DoubleType), what is the output data type?
There was a problem hiding this comment.
Since NaNvl evaluates to right when left is NaN, I think right should always cast to left. I wonder what is the behavior of other engines?
|
LGTM, if the above question doesn't matter. |
|
Test build #75702 has finished for PR 17606 at commit
|
| NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType))) | ||
| ruleTest(TypeCoercion.FunctionArgumentConversion, | ||
| NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, NullType)), | ||
| NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType))) |
There was a problem hiding this comment.
oh. Literal.create(null, NullType) should be Cast(Literal.create(null, NullType), FloatType).
There was a problem hiding this comment.
Thanks. The test is fixed. :)
| NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType))) | ||
| ruleTest(TypeCoercion.FunctionArgumentConversion, | ||
| NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, NullType)), | ||
| NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, DoubleType))) |
There was a problem hiding this comment.
then this should be Cast(Literal.create(null, NullType), DoubleType), I think.
|
Test build #75711 has finished for PR 17606 at commit
|
|
LGTM, merging to master! |
…aNvl(DoubleType, DoubleType) ## What changes were proposed in this pull request? `NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`. This will cause mismatching in the output type when the input type is float. By adding extra rule in TypeCoercion can resolve this issue. ## How was this patch tested? unite tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt@netflix.com> Closes #17606 from dbtsai/fixNaNvl. (cherry picked from commit 8ad63ee) Signed-off-by: DB Tsai <dbtsai@dbtsai.com>
… cast to N… …aNvl(DoubleType, DoubleType) ## What changes were proposed in this pull request? This is a backport of #17606 `NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`. This will cause mismatching in the output type when the input type is float. By adding extra rule in TypeCoercion can resolve this issue. ## How was this patch tested? unite tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt@netflix.com> Author: DB Tsai <dbtsai@dbtsai.com> Closes #17618 from dbtsai/branch-2.0.
…aNvl(DoubleType, DoubleType) ## What changes were proposed in this pull request? `NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`. This will cause mismatching in the output type when the input type is float. By adding extra rule in TypeCoercion can resolve this issue. ## How was this patch tested? unite tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt@netflix.com> Closes apache#17606 from dbtsai/fixNaNvl.
What changes were proposed in this pull request?
NaNvl(float value, null)will be converted intoNaNvl(float value, Cast(null, DoubleType))and finallyNaNvl(Cast(float value, DoubleType), Cast(null, DoubleType)).This will cause mismatching in the output type when the input type is float.
By adding extra rule in TypeCoercion can resolve this issue.
How was this patch tested?
unite tests.
Please review http://spark.apache.org/contributing.html before opening a pull request.