-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and cleanup #29775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // This means `value` is within range `(min, max)`. Optimize this by moving the cast to the | ||
| // literal side. | ||
| val lit = Cast(Literal(value), fromType) | ||
| val lit = Cast(Literal(value), fromType, tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also evaluate this and pass the value. It will be optimized away by ConstantFolding anyway.
|
cc @cloud-fan and @dongjoon-hyun |
|
Thank you for this follow-up and pinging me, @sunchao . |
| Cast(fromExp, toType: IntegralType, tz), Literal(value, literalType)) | ||
| if canImplicitlyCast(fromExp, toType, literalType) => | ||
| simplifyIntegralComparison(be, fromExp, toType, value) | ||
| simplifyIntegralComparison(be, fromExp, toType, value, tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks irrelevant as a follow-up.
Is there a comment on this? pass timezone info to the generated cast on the literal value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard for me to think a valid test case for this in the scope of SPARK-24994 (Integral Types). Although Spark handles some time-related types with the integral types, I believe we need to revisit this on new PR which aims to support time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dongjoon-hyun . I discovered this in tests - without this some of the existing tests will fail because of timezone mismatch (although I think this shouldn't affect correctness given we only handle integral types?).
In the follow-up, the current plan is to only extend to numeric types (e.g., float/double/decimal). I don't know whether support for time-related types is important or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give me some pointer for the failures? Or, can we have a reproducer? Usually, every code change had better have a test coverage. If there is a test case for this, it sounds reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(oops just found out my comment was not sent out successfully)
This is because ResolveTimeZone will try to add timezone info to all expressions that don't have it during query analysis. However, since the Cast expr was generated at optimization phase, it will not have the timezone info. As result, PlanTest.comparePlans will fail because of mismatch. I can try to come up with a test if necessary.
On the other hand, I think instead of using Cast, we may just directly use the value, since the Cast will be optimized away by ConstantFolding anyways later. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea this also works: just evaluate the cast and make a literal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll do that then. I'm not totally sure aboutCast.canonicalize - somehow it is not used in query analysis and optimization and the lazy val is not initialized before the plan comparison.
|
Test build #128778 has finished for PR 29775 at commit
|
I'd also like to understand this more. |
|
Test build #128799 has finished for PR 29775 at commit
|
|
retest this please |
|
Test build #128802 has finished for PR 29775 at commit
|
|
retest this please |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @sunchao and all.
Merged to master.
|
Test build #128823 has finished for PR 29775 at commit
|
### What changes were proposed in this pull request? This is a follow-up on apache#29565, and addresses a few issues in the last PR: - style issue pointed by [this comment](apache#29565 (comment)) - skip optimization when `fromExp` is foldable (by [this comment](apache#29565 (comment))) as there could be more efficient rule to apply for this case. - pass timezone info to the generated cast on the literal value - a bunch of cleanups and test improvements Originally I plan to handle this when implementing [SPARK-32858](https://issues.apache.org/jira/browse/SPARK-32858) but now think it's better to isolate these changes from that. ### Why are the changes needed? To fix a few left over issues in the above PR. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a test for the foldable case. Otherwise relying on existing tests. Closes apache#29775 from sunchao/SPARK-24994-followup. Authored-by: Chao Sun <sunchao@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This is a follow-up on #29565, and addresses a few issues in the last PR:
fromExpis foldable (by this comment) as there could be more efficient rule to apply for this case.Originally I plan to handle this when implementing SPARK-32858 but now think it's better to isolate these changes from that.
Why are the changes needed?
To fix a few left over issues in the above PR.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a test for the foldable case. Otherwise relying on existing tests.