Skip to content

Conversation

@sunchao
Copy link
Member

@sunchao sunchao commented Sep 16, 2020

What changes were proposed in this pull request?

This is a follow-up on #29565, and addresses a few issues in the last PR:

  • style issue pointed by this comment
  • skip optimization when fromExp is foldable (by this comment) as there could be more efficient rule to apply for this case.
  • pass timezone info to the generated cast on the literal value
  • a bunch of cleanups and test improvements

Originally I plan to handle this when implementing SPARK-32858 but now think it's better to isolate these changes from that.

Why are the changes needed?

To fix a few left over issues in the above PR.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a test for the foldable case. Otherwise relying on existing tests.

// This means `value` is within range `(min, max)`. Optimize this by moving the cast to the
// literal side.
val lit = Cast(Literal(value), fromType)
val lit = Cast(Literal(value), fromType, tz)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also evaluate this and pass the value. It will be optimized away by ConstantFolding anyway.

@sunchao
Copy link
Member Author

sunchao commented Sep 16, 2020

cc @cloud-fan and @dongjoon-hyun

@dongjoon-hyun
Copy link
Member

Thank you for this follow-up and pinging me, @sunchao .

Cast(fromExp, toType: IntegralType, tz), Literal(value, literalType))
if canImplicitlyCast(fromExp, toType, literalType) =>
simplifyIntegralComparison(be, fromExp, toType, value)
simplifyIntegralComparison(be, fromExp, toType, value, tz)
Copy link
Member

@dongjoon-hyun dongjoon-hyun Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks irrelevant as a follow-up.
Is there a comment on this? pass timezone info to the generated cast on the literal value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard for me to think a valid test case for this in the scope of SPARK-24994 (Integral Types). Although Spark handles some time-related types with the integral types, I believe we need to revisit this on new PR which aims to support time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dongjoon-hyun . I discovered this in tests - without this some of the existing tests will fail because of timezone mismatch (although I think this shouldn't affect correctness given we only handle integral types?).

In the follow-up, the current plan is to only extend to numeric types (e.g., float/double/decimal). I don't know whether support for time-related types is important or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give me some pointer for the failures? Or, can we have a reproducer? Usually, every code change had better have a test coverage. If there is a test case for this, it sounds reasonable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(oops just found out my comment was not sent out successfully)

This is because ResolveTimeZone will try to add timezone info to all expressions that don't have it during query analysis. However, since the Cast expr was generated at optimization phase, it will not have the timezone info. As result, PlanTest.comparePlans will fail because of mismatch. I can try to come up with a test if necessary.

On the other hand, I think instead of using Cast, we may just directly use the value, since the Cast will be optimized away by ConstantFolding anyways later. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea this also works: just evaluate the cast and make a literal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll do that then. I'm not totally sure aboutCast.canonicalize - somehow it is not used in query analysis and optimization and the lazy val is not initialized before the plan comparison.

@SparkQA
Copy link

SparkQA commented Sep 17, 2020

Test build #128778 has finished for PR 29775 at commit 4291ec2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

pass timezone info to the generated cast on the literal value

I'd also like to understand this more. Cast.canonicalize will drop the timezone if it's not needed, so UnwrapCastInBinaryComparison shouldn't care about timezone.

@SparkQA
Copy link

SparkQA commented Sep 17, 2020

Test build #128799 has finished for PR 29775 at commit 3a0f4b0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Sep 17, 2020

Test build #128802 has finished for PR 29775 at commit 3a0f4b0.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @sunchao and all.
Merged to master.

@SparkQA
Copy link

SparkQA commented Sep 17, 2020

Test build #128823 has finished for PR 29775 at commit 3a0f4b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
### What changes were proposed in this pull request?

This is a follow-up on apache#29565, and addresses a few issues in the last PR:
- style issue pointed by [this comment](apache#29565 (comment))
- skip optimization when `fromExp` is foldable (by [this comment](apache#29565 (comment))) as there could be more efficient rule to apply for this case.
- pass timezone info to the generated cast on the literal value
- a bunch of cleanups and test improvements

Originally I plan to handle this when implementing [SPARK-32858](https://issues.apache.org/jira/browse/SPARK-32858) but now think it's better to isolate these changes from that.

### Why are the changes needed?

To fix a few left over issues in the above PR.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a test for the foldable case. Otherwise relying on existing tests.

Closes apache#29775 from sunchao/SPARK-24994-followup.

Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants