Skip to content

Conversation

@mgaido91
Copy link
Contributor

What changes were proposed in this pull request?

We are using CodegenContext.freshName to get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting name name1 and an instance with starting name name11.

The PR changes the way a new name is generated by CodegenContext.freshName so that we generate unique names in this scenario too.

How was this patch tested?

added UT

@mgaido91
Copy link
Contributor Author

cc @cloud-fan @kiszk @viirya

@SparkQA
Copy link

SparkQA commented Apr 16, 2018

Test build #89398 has finished for PR 21080 at commit 8623918.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

Hmmm... why is this failing? The change LGTM.

s"${fullName}_$id"
} else {
freshNameIds += fullName -> 1
fullName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the given name is something like name1_1, I think you can still produce non-unique variable name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest something like s"${fullName}_0" at L581. It also solves the failed tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am correct, does this still have a conflict between a_01 wth a_0$id where $id = 1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't changed to s"${fullName}_$id"? So you will get a_0_1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still has a problem. for sequence a_1, a, a, we have duplicated name a_1.

We can solve this problem by always adding the postfix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, doesn't it be

a_1 -> a_1_0
a -> a_0
a -> a_1

?

Copy link
Member

@kiszk kiszk Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry for my misunderstanding. To change line 581 would work well with unlikely-used char.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya my comment was for the current code without your change.

@viirya
Copy link
Member

viirya commented Apr 17, 2018

LGTM

} else {
s"${freshNamePrefix}_$name"
}
if (freshNameIds.contains(fullName)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we don't need this if

val id = freshNameIds.getOrElse(fullName, 0)
val res = s"${fullName}_$id"
freshNameIds(fullName) = id + 1
res

@SparkQA
Copy link

SparkQA commented Apr 17, 2018

Test build #89441 has finished for PR 21080 at commit 58f72ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2018

Test build #89448 has finished for PR 21080 at commit 0993827.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

retest this please

@cloud-fan
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Apr 17, 2018

Test build #89455 has finished for PR 21080 at commit 0993827.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/2.3!

asfgit pushed a commit that referenced this pull request Apr 17, 2018
## What changes were proposed in this pull request?

We are using `CodegenContext.freshName` to get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting name `name1` and an instance with starting name `name11`.

The PR changes the way a new name is generated by `CodegenContext.freshName` so that we generate unique names in this scenario too.

## How was this patch tested?

added UT

Author: Marco Gaido <[email protected]>

Closes #21080 from mgaido91/SPARK-23986.

(cherry picked from commit f39e82c)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in f39e82c Apr 17, 2018
@dzanozin
Copy link

dzanozin commented Sep 10, 2018

The code still generates method signatures like this in Spark v2.3.1 (and obviously fails because of agg_expr_21 duplication):

private void agg_doConsume1(byte agg_expr_01, boolean agg_exprIsNull_01,
                            short agg_expr_11, boolean agg_exprIsNull_11,
                            short agg_expr_21, boolean agg_exprIsNull_21,
                            int agg_expr_31, boolean agg_exprIsNull_31,
                            int agg_expr_41, boolean agg_exprIsNull_41,
                            int agg_expr_51, boolean agg_exprIsNull_51,
                            UTF8String agg_expr_61, boolean agg_exprIsNull_61,
                            byte agg_expr_71, boolean agg_exprIsNull_71,
                            long agg_expr_81, boolean agg_exprIsNull_81,
                            double agg_expr_91, boolean agg_exprIsNull_91,
                            long agg_expr_101, boolean agg_exprIsNull_101,
                            double agg_expr_111, boolean agg_exprIsNull_111,
                            long agg_expr_121, boolean agg_exprIsNull_121,
                            int agg_expr_131, boolean agg_exprIsNull_131,
                            long agg_expr_141, boolean agg_exprIsNull_141,
                            int agg_expr_151, boolean agg_exprIsNull_151,
                            boolean agg_expr_161, boolean agg_exprIsNull_161,
                            long agg_expr_171,
                            byte agg_expr_18, boolean agg_exprIsNull_18,
                            boolean agg_expr_19, boolean agg_exprIsNull_19,
                            byte agg_expr_20, boolean agg_exprIsNull_20,
                            boolean agg_expr_21, boolean agg_exprIsNull_21,
                            short agg_expr_22, boolean agg_exprIsNull_22,
                            int agg_expr_23, boolean agg_exprIsNull_23) throws java.io.IOException {

@mgaido91
Copy link
Contributor Author

@dzanozin the code you reported clearly misses this patch. Since I do see the commit on v2.3.1, I am a bit puzzled about the issue you reported. Please provide a reproducer so we can check whether this really affects 2.3.1 or higher versions.

@dzanozin
Copy link

dzanozin commented Sep 11, 2018

Please ignore the error report, it was a mess in the machine configuration with some v2.3.0 debris. Works as expected after cleanup generating the following method signature:

private void agg_doConsume_1(byte agg_expr_0_1, boolean agg_exprIsNull_0_1,
                             short agg_expr_1_1, boolean agg_exprIsNull_1_1,
...

My apologies!

otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…"Redefinition of parameter")

Ref: SPARK-23986

cherry picked from commit f39e82c

We are using `CodegenContext.freshName` to get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting name `name1` and an instance with starting name `name11`.

The PR changes the way a new name is generated by `CodegenContext.freshName` so that we generate unique names in this scenario too.

added UT

Author: Marco Gaido <[email protected]>

Closes apache#21080 from mgaido91/SPARK-23986.

(cherry picked from commit f39e82c)
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
We are using `CodegenContext.freshName` to get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting name `name1` and an instance with starting name `name11`.

The PR changes the way a new name is generated by `CodegenContext.freshName` so that we generate unique names in this scenario too.

added UT

Author: Marco Gaido <[email protected]>

Closes apache#21080 from mgaido91/SPARK-23986.

(cherry picked from commit f39e82c)
Signed-off-by: Wenchen Fan <[email protected]>

Ref: LIHADOOP-56518
(cherry picked from 564019b)

RB=2368692
BUG=LIHADOOP-56518
G=spark-reviewers
R=rhu,mshen
A=mmuralid,yezhou,mshen
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
We are using `CodegenContext.freshName` to get a unique name for any new variable we are adding. Unfortunately, this method currently fails to create a unique name when we request more than one instance of variables with starting name `name1` and an instance with starting name `name11`.

The PR changes the way a new name is generated by `CodegenContext.freshName` so that we generate unique names in this scenario too.

added UT

Author: Marco Gaido <[email protected]>

Closes apache#21080 from mgaido91/SPARK-23986.

(cherry picked from commit f39e82c)
Signed-off-by: Wenchen Fan <[email protected]>

Ref: LIHADOOP-56518
(cherry picked from 564019b)

RB=2368692
BUG=LIHADOOP-56518
G=spark-reviewers
R=rhu,mshen
A=mmuralid,yezhou,mshen
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants