-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-2509][SQL] Add optimization for Substring. #1428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
QA tests have started for PR 1428. This patch merges cleanly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not doable with constant folding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I think about it more, this is probably ok to leave it here since in constant folding we don't really special case for null literals. However, I looked into other rules in NullPropagation and a lot of them don't really belong there... (has nothing to do with your PR though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxin, thank you for your comment.
The rule ConstantFolding can be applied only if the expression is foldable, i.e. ALL of the child expressions are foldable, and the NullPropagation covers the case there exists at least one null literal regardless of foldabilities of the others.
So I think the cases are needed.
BTW, Substring can be foldable, right?
|
QA results for PR 1428: |
|
Thanks. Merging this in master & branch-1.0. |
`Substring` including `null` literal cases could be added to `NullPropagation`. Author: Takuya UESHIN <[email protected]> Closes #1428 from ueshin/issues/SPARK-2509 and squashes the following commits: d9eb85f [Takuya UESHIN] Add Substring cases to NullPropagation. (cherry picked from commit 9b38b7c) Signed-off-by: Reynold Xin <[email protected]>
|
@rxin Ah, wait for a moment. |
|
I'll open another issue about foldability of |
|
I've already merged this. I think we should re-evaluate in the future whether we would want to push some stuff into expression foldability. I think that might also lead to higher planner performance (although that's probably less important). More importantly, it does feel more natural to me that this stuff goes into each expression, rather than having a rule for each NullPropagation. cc @marmbrus |
|
@rxin I agree that we need the better way for |
|
You are right that this rule does more than null propagation now. I'm not sure what a better name would be. Regarding moving null propagation into the expressions, you could do it... but what would it look like? You specify which of the children make the entire expression null if they are null? Seems like a lot of refactoring for little benefit... |
This is a follow-up of #1428. Author: Takuya UESHIN <[email protected]> Closes #1432 from ueshin/issues/SPARK-2518 and squashes the following commits: 37d1ace [Takuya UESHIN] Fix foldability of Substring expression.
This is a follow-up of #1428. Author: Takuya UESHIN <[email protected]> Closes #1432 from ueshin/issues/SPARK-2518 and squashes the following commits: 37d1ace [Takuya UESHIN] Fix foldability of Substring expression. (cherry picked from commit cc965ee) Signed-off-by: Reynold Xin <[email protected]>
|
It doesn't bring much benefit right now, but what we are doing here is creating patterns in NullPropagation to specify the semantics of each individual expression ... not very scalable in maintaining this code base. |
|
I'm not sure I agree with that. This is a pretty niche optimization not something fundamental about the expressions that is required for correct evaluation (and the first version of this code had a bunch of mistakes). Having all of these rules in one place made it easier for me to realize that when doing the review. |
|
Well with that argument we shouldn't have any of the constant folding logic in expressions themselves either since they don't affect correctness. This is a situation that it is not obvious where to put the stuff. Anyway I think it's ok for them to be here for now, since it is not too long yet. As the list gets longer, we can revisit. |
|
That is exactly the argument I made when the folding logic was added. :) I suggested that we add |
|
I would be supportive of changing it to match my original proposal... |
`Substring` including `null` literal cases could be added to `NullPropagation`. Author: Takuya UESHIN <[email protected]> Closes apache#1428 from ueshin/issues/SPARK-2509 and squashes the following commits: d9eb85f [Takuya UESHIN] Add Substring cases to NullPropagation.
This is a follow-up of apache#1428. Author: Takuya UESHIN <[email protected]> Closes apache#1432 from ueshin/issues/SPARK-2518 and squashes the following commits: 37d1ace [Takuya UESHIN] Fix foldability of Substring expression.
…nsitivity (apache#1428) ### What changes were proposed in this pull request? Previously alignment was checked by comparing the exact attribute references between Spark and the underlying table, which failed with case insensitive SQL configurations. To fix this we use the configuration's resolver to compare references. ### Why are the changes needed? This was breaking some migrations from Spark 3.1 where the alignment check was not present. A query which attempted to do a MergeInto with column names which matched in a case-insensitive way would fail to trigger our plan rewrite rules leading to an opaque MERGE INTO is temporarily not supported exception. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? This patch is a cherry pick of the code fixed and released in Apache Iceberg. apache/iceberg#4848 The test for this specific case is in the Iceberg codebase and we will back-port this to ADT following the merge of the fix in Spark.
Substringincludingnullliteral cases could be added toNullPropagation.