Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Oct 24, 2017

What changes were proposed in this pull request?

This adds a joined row to sort-merge join codegen. That joined row is used to generate code for filter expressions, which may fall back to using the result row. Previously, the right side of the join was used, which is incorrect (the non-codegen implementations use a joined row).

How was this patch tested?

Current tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @rdblue .
Since this is a correctness issue, can we have a test case including the incorrect result you met?

@SparkQA
Copy link

SparkQA commented Oct 24, 2017

Test build #83023 has finished for PR 19568 at commit 4afb088.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Copy link
Contributor Author

rdblue commented Oct 25, 2017

@dongjoon-hyun, yes, I'm currently working on it. I just wanted to get the rest up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we generate Java program, is it necessary to declare JoinedRow type for $joinedRow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think we need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's causing the test failures. This is a typo from some restructuring I did to get this upstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@kiszk
Copy link
Member

kiszk commented Oct 25, 2017

Could you please change title from SPARK-22345 to [SPARK-22345]?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need to do this when there is CodegenFallback in the condition expressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The joined row should always be used for correctness. We don't know what code the expression will generate, so we should plan on always passing the correct input row. Setting left and right on a joined row is a cheap operation, so I'd rather do it correctly than rely on something brittle like isInstanceOf[CodegenFallback].

Copy link
Contributor Author

@rdblue rdblue Oct 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It ended up being a bit more complicated. There are two problems (in 2.0.0 and 2.1.1 at least). The first is what this fixes, which is that the INPUT_ROW in the codegen context points to the wrong row. This is fixed and now has a test that fails if you uncomment the line that sets INPUT_ROW.

The second problem is in the check for CodegenFallback fails to check whether the condition supports codegen in some plans. To get the test to fail, I had to add a projection to exercise the path where this happens. I'll add a second commit for this problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second problem was fixed in this commit: 6b6dd68

I still think that the codegen problem should be fixed. Detecting CodgenFallback is imperfect, but will still generate code and run it. I think we should either remove codegen from CodegenFallback or add this fix to ensure that code works, even if we don't expect to run it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also leave a comment explaining why add a JoinedRow as INPUT_ROW.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Code for the condition was generated to depend on the right row instead
of the joined row.
@rdblue rdblue force-pushed the SPARK-22345-fix-sort-merge-codegen branch from 4afb088 to 3431778 Compare October 25, 2017 19:59
@SparkQA
Copy link

SparkQA commented Oct 25, 2017

Test build #83056 has finished for PR 19568 at commit 3431778.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 26, 2017

Test build #83090 has finished for PR 19568 at commit 146d791.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

)
)

testInnerJoin(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still can pass without the changes in this PR. What is the purpose of this test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails in 2.1.1 and versions before 6b6dd68. I'm not sure how to exercise the code generated by CodegenFallback with that fix, but this test is valid for the 2.1.1 branch.

leftPlan, rightPlan)
EnsureRequirements(spark.sessionState.conf).apply(sortMergeJoin)
EnsureRequirements(spark.sessionState.conf)
.apply(ProjectExec(sortMergeJoin.output, sortMergeJoin))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 2.1.1, an extra project causes WholeStageCodegenExec to not detect that the expression contains CodegenFallback. This is no longer the case. Like I said, there is no longer a good way to test what happens when CodegenFallback generates code. If there were, I'd use that here to test the case.

I guess I could add a testing case to WholeStageCodegenExec to make sure the code is generated correctly.

@DonnyZone
Copy link
Contributor

DonnyZone commented Oct 30, 2017

This PR is similar to the initial commit when I try to fix SPARK-21441 in #18656 (92dc106).
(1) The INPUT_ROW in SortMergeJoinExec's codegen context points to the wrong row. In general, it works well. However, this behavior potentially causes wrong result or even NPE in bindReference. I think it is necessary to fix it.
(2) The CollapseCodegenStages rule before 2.1.2 has an issue which may lead to code generation even the SortMergeJoinExec contains CodegenFallback expressions, when it has an umbrella SparkPlan (e.g., ProjectExec). Consequently, it triggers the above potential issue.

@DonnyZone
Copy link
Contributor

@rdblue Could you find any cases that can trigger the problem of wrong INPUT_ROW in SortMergeJoinExec after the fix (#18656) for CollapseCodegenStages rule? I tried to do it before.

@rdblue
Copy link
Contributor Author

rdblue commented Oct 30, 2017

@DonnyZone, I don't know of any cases that use codgen after the fix for CodegenFallback, but I think this is still a good idea.

If Spark is going to generate code, it should generate correct code. That means either we remove the codegen implementation from CodegenFallback, or we fix the row passed in by sort-merge join. I'd rather fix sort-merge join because we may want to change the behavior to codegen as much of the condition as possible later on.

@gatorsmile
Copy link
Member

gatorsmile commented Oct 30, 2017

@rdblue Yes, the current implementation implicitly assumes the rule CollapseCodegenStages excludes all the illegal cases. How about adding an assert in SortMergeJoinExec to do the check that the condition of SortMergeJoinExec does not have CodegenFallback expressions? Also write a code comment to explain CollapseCodegenStages guarantees the assumption?

@rdblue
Copy link
Contributor Author

rdblue commented Nov 28, 2017

@gatorsmile, I think it would be better to fix codegen than to prevent it from happening with an assertion. If CodegenFallback can produce fallback code, why not allow it to when necessary?

@rdblue rdblue closed this Nov 28, 2017
@rdblue rdblue reopened this Nov 28, 2017
@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 15, 2020
@github-actions github-actions bot closed this Jan 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants