-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23986][SQL] freshName can generate non-unique names #21080
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #89398 has finished for PR 21080 at commit
|
|
Hmmm... why is this failing? The change LGTM. |
| s"${fullName}_$id" | ||
| } else { | ||
| freshNameIds += fullName -> 1 | ||
| fullName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the given name is something like name1_1, I think you can still produce non-unique variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest something like s"${fullName}_0" at L581. It also solves the failed tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am correct, does this still have a conflict between a_01 wth a_0$id where $id = 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't changed to s"${fullName}_$id"? So you will get a_0_1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it still has a problem. for sequence a_1, a, a, we have duplicated name a_1.
We can solve this problem by always adding the postfix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, doesn't it be
a_1 -> a_1_0
a -> a_0
a -> a_1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry for my misunderstanding. To change line 581 would work well with unlikely-used char.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya my comment was for the current code without your change.
|
LGTM |
| } else { | ||
| s"${freshNamePrefix}_$name" | ||
| } | ||
| if (freshNameIds.contains(fullName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now we don't need this if
val id = freshNameIds.getOrElse(fullName, 0)
val res = s"${fullName}_$id"
freshNameIds(fullName) = id + 1
res
|
Test build #89441 has finished for PR 21080 at commit
|
|
Test build #89448 has finished for PR 21080 at commit
|
|
retest this please |
|
LGTM |
|
Test build #89455 has finished for PR 21080 at commit
|
|
thanks, merging to master/2.3! |
## What changes were proposed in this pull request? We are using `CodegenContext.freshName` to get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting name `name1` and an instance with starting name `name11`. The PR changes the way a new name is generated by `CodegenContext.freshName` so that we generate unique names in this scenario too. ## How was this patch tested? added UT Author: Marco Gaido <[email protected]> Closes #21080 from mgaido91/SPARK-23986. (cherry picked from commit f39e82c) Signed-off-by: Wenchen Fan <[email protected]>
|
The code still generates method signatures like this in Spark v2.3.1 (and obviously fails because of private void agg_doConsume1(byte agg_expr_01, boolean agg_exprIsNull_01,
short agg_expr_11, boolean agg_exprIsNull_11,
short agg_expr_21, boolean agg_exprIsNull_21,
int agg_expr_31, boolean agg_exprIsNull_31,
int agg_expr_41, boolean agg_exprIsNull_41,
int agg_expr_51, boolean agg_exprIsNull_51,
UTF8String agg_expr_61, boolean agg_exprIsNull_61,
byte agg_expr_71, boolean agg_exprIsNull_71,
long agg_expr_81, boolean agg_exprIsNull_81,
double agg_expr_91, boolean agg_exprIsNull_91,
long agg_expr_101, boolean agg_exprIsNull_101,
double agg_expr_111, boolean agg_exprIsNull_111,
long agg_expr_121, boolean agg_exprIsNull_121,
int agg_expr_131, boolean agg_exprIsNull_131,
long agg_expr_141, boolean agg_exprIsNull_141,
int agg_expr_151, boolean agg_exprIsNull_151,
boolean agg_expr_161, boolean agg_exprIsNull_161,
long agg_expr_171,
byte agg_expr_18, boolean agg_exprIsNull_18,
boolean agg_expr_19, boolean agg_exprIsNull_19,
byte agg_expr_20, boolean agg_exprIsNull_20,
boolean agg_expr_21, boolean agg_exprIsNull_21,
short agg_expr_22, boolean agg_exprIsNull_22,
int agg_expr_23, boolean agg_exprIsNull_23) throws java.io.IOException { |
|
@dzanozin the code you reported clearly misses this patch. Since I do see the commit on v2.3.1, I am a bit puzzled about the issue you reported. Please provide a reproducer so we can check whether this really affects 2.3.1 or higher versions. |
|
Please ignore the error report, it was a mess in the machine configuration with some v2.3.0 debris. Works as expected after cleanup generating the following method signature: private void agg_doConsume_1(byte agg_expr_0_1, boolean agg_exprIsNull_0_1,
short agg_expr_1_1, boolean agg_exprIsNull_1_1,
...My apologies! |
…"Redefinition of parameter") Ref: SPARK-23986 cherry picked from commit f39e82c We are using `CodegenContext.freshName` to get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting name `name1` and an instance with starting name `name11`. The PR changes the way a new name is generated by `CodegenContext.freshName` so that we generate unique names in this scenario too. added UT Author: Marco Gaido <[email protected]> Closes apache#21080 from mgaido91/SPARK-23986. (cherry picked from commit f39e82c)
We are using `CodegenContext.freshName` to get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting name `name1` and an instance with starting name `name11`. The PR changes the way a new name is generated by `CodegenContext.freshName` so that we generate unique names in this scenario too. added UT Author: Marco Gaido <[email protected]> Closes apache#21080 from mgaido91/SPARK-23986. (cherry picked from commit f39e82c) Signed-off-by: Wenchen Fan <[email protected]> Ref: LIHADOOP-56518 (cherry picked from 564019b) RB=2368692 BUG=LIHADOOP-56518 G=spark-reviewers R=rhu,mshen A=mmuralid,yezhou,mshen
We are using `CodegenContext.freshName` to get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting name `name1` and an instance with starting name `name11`. The PR changes the way a new name is generated by `CodegenContext.freshName` so that we generate unique names in this scenario too. added UT Author: Marco Gaido <[email protected]> Closes apache#21080 from mgaido91/SPARK-23986. (cherry picked from commit f39e82c) Signed-off-by: Wenchen Fan <[email protected]> Ref: LIHADOOP-56518 (cherry picked from 564019b) RB=2368692 BUG=LIHADOOP-56518 G=spark-reviewers R=rhu,mshen A=mmuralid,yezhou,mshen
What changes were proposed in this pull request?
We are using
CodegenContext.freshNameto get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting namename1and an instance with starting namename11.The PR changes the way a new name is generated by
CodegenContext.freshNameso that we generate unique names in this scenario too.How was this patch tested?
added UT