[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)#17606

Closed
dbtsai wants to merge 2 commits into apache:master from dbtsai:fixNaNvl

Member

@dbtsai dbtsai commented Apr 11, 2017

What changes were proposed in this pull request?

NaNvl(float value, null) will be converted into NaNvl(float value, Cast(null, DoubleType)) and finally NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType)).

This causes the output type to differ from the input type when the input is a float.

Adding an extra rule to TypeCoercion resolves this issue.
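
The coercion chain described above can be illustrated with a small, self-contained model. This is only a sketch: `DataType`, `Expr`, `Lit`, `coerceBefore`, and `coerceAfter` are hypothetical stand-ins rather than Spark's Catalyst classes, and the result type of `NaNvl` is assumed to follow its left operand.

```scala
// Toy model of the coercion described above; none of these types are Spark's.
object NaNvlCoercionSketch {
  sealed trait DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  case object NullType extends DataType

  sealed trait Expr { def dataType: DataType }
  case class Lit(dataType: DataType) extends Expr
  case class Cast(child: Expr, dataType: DataType) extends Expr
  case class NaNvl(left: Expr, right: Expr) extends Expr {
    // Assumption for this sketch: the result type follows the left operand.
    def dataType: DataType = left.dataType
  }

  // Widening of a Float left operand when the right operand is a Double.
  private def widen(e: Expr): Expr = e match {
    case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType =>
      NaNvl(Cast(l, DoubleType), r)
    case other => other
  }

  // Modelled behaviour before the fix: the null operand is typed as Double,
  // which then forces the Float left operand to be widened as well.
  def coerceBefore(e: Expr): Expr = e match {
    case NaNvl(l, r) if r.dataType == NullType =>
      widen(NaNvl(l, Cast(r, DoubleType)))
    case other => widen(other)
  }

  // Modelled behaviour after the fix: the null operand takes the left type.
  def coerceAfter(e: Expr): Expr = e match {
    case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, l.dataType))
    case other => widen(other)
  }
}
```

In this model, `coerceBefore(NaNvl(Lit(FloatType), Lit(NullType)))` has type `DoubleType`, while `coerceAfter` on the same input keeps `FloatType`.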

How was this patch tested?

Unit tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

  NaNvl(l, Cast(r, DoubleType))
case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType =>
  NaNvl(Cast(l, DoubleType), r)
case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, l.dataType))
Member

One question I have is, why NaNvl(FloatType, DoubleType) should be cast to NaNvl(DoubleType, DoubleType), but NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)?

They all change the input type from FloatType to DoubleType. Won't the first cast cause mismatching?

Member Author

@dbtsai dbtsai Apr 11, 2017

Yeah, this PR prevents casting from NaNvl(FloatType, NullType) to NaNvl(DoubleType, DoubleType), since we want to minimize casting as much as possible. Also, when replacing NaN with null, we want the output type to stay the same as the input type.

Whether NaNvl(FloatType, DoubleType) should be cast into NaNvl(DoubleType, DoubleType) is another story, and we should discuss and fix it in another PR. I agree with you that we should downcast the DoubleType replacement to FloatType. In my opinion, this implicit casting is error-prone, and users should cast explicitly instead.

@gatorsmile maybe you can chime in and give feedback on whether we should cast NaNvl(FloatType, DoubleType) to NaNvl(DoubleType, DoubleType).

Member

That is because FunctionArgumentConversion is executed before ImplicitTypeCasts. When there is no danger of losing information, the cast can be implicit for better usability. We can add an extra configuration flag that lets users disable implicit casting.

If we do not upcast NaNvl(FloatType, DoubleType) to NaNvl(DoubleType, DoubleType), what is the output data type?

Member Author

Since NaNvl evaluates to right when left is NaN, I think right should always be cast to the type of left. I wonder what the behavior of other engines is?
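
The trade-off discussed in this thread can be sketched with a toy model. Everything below is hypothetical and not Spark code: `castRightToLeft` models the suggested "right always casts to left" rule, and `survivesFloatRoundTrip` shows why narrowing a Double replacement to Float is not always lossless, which is the concern behind preferring an implicit upcast.

```scala
// Toy model of the "always cast right to left" alternative; not Spark code.
object RightToLeftSketch {
  sealed trait DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType

  sealed trait Expr { def dataType: DataType }
  case class Lit(dataType: DataType) extends Expr
  case class Cast(child: Expr, dataType: DataType) extends Expr
  case class NaNvl(left: Expr, right: Expr) extends Expr {
    // Assumption for this sketch: the result type follows the left operand.
    def dataType: DataType = left.dataType
  }

  // Hypothetical rule: the replacement always adopts the left operand's type,
  // so the output type never differs from the input column's type.
  def castRightToLeft(e: Expr): Expr = e match {
    case NaNvl(l, r) if r.dataType != l.dataType => NaNvl(l, Cast(r, l.dataType))
    case other => other
  }

  // Narrowing a Double replacement to Float can change its value.
  def survivesFloatRoundTrip(d: Double): Boolean = d.toFloat.toDouble == d
}
```

Under this rule, NaNvl(FloatType, DoubleType) keeps a FloatType output, at the cost that a replacement such as 0.1 no longer round-trips exactly once it has been narrowed to Float.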

Member

viirya commented Apr 11, 2017

LGTM, if the above question doesn't matter.

SparkQA commented Apr 11, 2017

Test build #75702 has finished for PR 17606 at commit fa5e1af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)))
ruleTest(TypeCoercion.FunctionArgumentConversion,
NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, NullType)),
NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType)))
Member

oh. Literal.create(null, NullType) should be Cast(Literal.create(null, NullType), FloatType).

Member Author

Thanks. The test is fixed. :)

NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType)))
ruleTest(TypeCoercion.FunctionArgumentConversion,
NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, NullType)),
NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, DoubleType)))
Member

then this should be Cast(Literal.create(null, NullType), DoubleType), I think.
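
The point of the two review comments on the expected values can be shown with a toy model, assuming the test helper compares expression trees structurally. All names below are hypothetical stand-ins, not Spark's classes: a null literal retyped as FloatType and a NullType literal wrapped in a Cast to FloatType are different trees, and the modelled rule only ever produces the latter.

```scala
// Toy model: why the expected tree must contain a Cast; not Spark code.
object ExpectedTreeSketch {
  sealed trait DataType
  case object FloatType extends DataType
  case object NullType extends DataType

  sealed trait Expr
  case class Lit(value: Any, dataType: DataType) extends Expr
  case class Cast(child: Expr, dataType: DataType) extends Expr
  case class NaNvl(left: Expr, right: Expr) extends Expr

  // The modelled rule wraps the null literal in a Cast; it does not retype it.
  def coerce(e: Expr): Expr = e match {
    case NaNvl(l @ Lit(_, lt), r @ Lit(_, NullType)) => NaNvl(l, Cast(r, lt))
    case other => other
  }

  val input: Expr = NaNvl(Lit(1.0f, FloatType), Lit(null, NullType))
  // Expectation as first written in the test: a retyped null literal.
  val retypedLiteral: Expr = NaNvl(Lit(1.0f, FloatType), Lit(null, FloatType))
  // Expectation suggested in review: the original literal wrapped in a Cast.
  val wrappedInCast: Expr =
    NaNvl(Lit(1.0f, FloatType), Cast(Lit(null, NullType), FloatType))
}
```

In this model `coerce(input)` equals `wrappedInCast` but not `retypedLiteral`, which is consistent with the first test build failing and the second passing.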

Member Author

dbtsai commented Apr 11, 2017

+cc @cloud-fan @gatorsmile @rxin

SparkQA commented Apr 11, 2017

Test build #75711 has finished for PR 17606 at commit e0625f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM, merging to master!

@asfgit asfgit closed this in 8ad63ee Apr 12, 2017
asfgit pushed a commit that referenced this pull request Apr 12, 2017
…aNvl(DoubleType, DoubleType)

## What changes were proposed in this pull request?

`NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`.

This causes the output type to differ from the input type when the input is a float.

Adding an extra rule to TypeCoercion resolves this issue.

## How was this patch tested?

Unit tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: DB Tsai <dbt@netflix.com>

Closes #17606 from dbtsai/fixNaNvl.

(cherry picked from commit 8ad63ee)
Signed-off-by: DB Tsai <dbtsai@dbtsai.com>
asfgit pushed a commit that referenced this pull request Apr 12, 2017
… cast to N…

…aNvl(DoubleType, DoubleType)

## What changes were proposed in this pull request?

This is a backport of #17606

`NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`.

This causes the output type to differ from the input type when the input is a float.

Adding an extra rule to TypeCoercion resolves this issue.

## How was this patch tested?

Unit tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: DB Tsai <dbt@netflix.com>
Author: DB Tsai <dbtsai@dbtsai.com>

Closes #17618 from dbtsai/branch-2.0.
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
@dbtsai dbtsai deleted the fixNaNvl branch November 11, 2019 23:05