[SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API #38406
Conversation
R: @cloud-fan
...tor/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala
```scala
private def analyzePlan(plan: LogicalPlan): LogicalPlan = {
  val connectAnalyzed = analysis.SimpleAnalyzer.execute(plan)
  analysis.SimpleAnalyzer.checkAnalysis(connectAnalyzed)
  EliminateSubqueryAliases(connectAnalyzed)
}
```
Why do we need to do this? Would be great to add a comment here to explain.
+1
I added some comments to clarify this function's usage.
but why do we need to eliminate subquery alias?
Hmmm, this is what I borrowed from `spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala` (line 513 at c50d865): `def analyze: LogicalPlan = { ... }`. We were already using this Catalyst DSL `analyze` call before this refactoring.
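For context, that DSL method looks roughly like the following. This is a paraphrase of the Catalyst source rather than a verbatim quote, and it mirrors the `analyzePlan` helper shown above:

```scala
// Rough paraphrase of the Catalyst DSL's analyze (dsl/package.scala):
// run the SimpleAnalyzer, validate the result, then strip SubqueryAlias
// nodes so that plan comparisons are insensitive to aliases.
def analyze: LogicalPlan = {
  val analyzed = analysis.SimpleAnalyzer.execute(logicalPlan)
  analysis.SimpleAnalyzer.checkAnalysis(analyzed)
  EliminateSubqueryAliases(analyzed)
}
```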
Did we hit any issues in this test suite without doing it?
There was no issue after removing it, so I pushed a commit to remove it.
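(For readers following the thread: `EliminateSubqueryAliases` replaces `SubqueryAlias` nodes with their children. A minimal sketch of the behavior, not code from this PR, assuming Catalyst internals are on the classpath:)

```scala
import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, SubqueryAlias}
import org.apache.spark.sql.types.IntegerType

val rel = LocalRelation(AttributeReference("id", IntegerType)())
val aliased = SubqueryAlias("t", rel)             // roughly what df.as("t") produces
assert(EliminateSubqueryAliases(aliased) == rel)  // the alias node is stripped
```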
...tor/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala
```diff
  case proto.Relation.RelTypeCase.SQL => transformSql(rel.getSql)
  case proto.Relation.RelTypeCase.LOCAL_RELATION =>
-   transformLocalRelation(rel.getLocalRelation)
+   transformLocalRelation(rel.getLocalRelation, common)
```
What is `common`? Does every logical plan have an optional alias?
This is a legacy design that, I believe, assumes only relations can have the optional alias. Every logical plan could have an optional alias; in that case I would prefer to move the alias out of `common` into its own message, because that lets us differentiate:

```scala
.xx()
.xx().as("")        // probably invalid, but a user can write such a call
.xx().as("alias_1")
```

I can also change this in this PR if you think this is the right time.
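To make the distinction concrete, here is an illustrative sketch (mine, not from this PR) of how `.as(...)` shows up in an analyzed DataFrame plan, assuming a `SparkSession` named `spark` is in scope:

```scala
// Dataset.as(...) wraps the logical plan in a SubqueryAlias node, so the
// presence or absence of an alias is visible when comparing plans.
val df = spark.range(3)
println(df.queryExecution.analyzed)                // Range (0, 3, ...)
println(df.as("alias_1").queryExecution.analyzed)  // SubqueryAlias alias_1 +- Range (0, 3, ...)
```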
I sent a PR for this topic (to avoid complicating the current refactoring PR too much): #38415
Can one of the admins verify this patch?
thanks, merging to master!
Closes apache#38406 from amaliujia/refactor_server_tests.
Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR migrates all existing proto tests to be DataFrame API based.
Why are the changes needed?
1. The goal of the proto tests is to verify that the Connect proto can represent DataFrames, so comparing against the DataFrame API is more accurate.
2. Some Connect plan executions require a SparkSession anyway. By only using the DataFrame API we can unify all tests into one suite (e.g. merge `SparkConnectDeduplicateSuite.scala` into `SparkConnectProtoSuite.scala`).
3. This also enables the possibility of testing results (not only plans) in the future.

Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing UT.
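As a rough sketch of the test shape this refactoring enables (helper names such as `transform`, `connectTestRelation`, and `sparkTestRelation` are placeholders for illustration, not necessarily the suite's actual API):

```scala
// Hypothetical DataFrame-API-based proto test: build the same query once
// through the Connect proto planner and once through the DataFrame API,
// then compare the resulting logical plans.
test("simple project") {
  val connectPlan = transform(connectTestRelation.select("id"))  // proto -> LogicalPlan
  val sparkPlan = sparkTestRelation.select("id").queryExecution.analyzed
  comparePlans(connectPlan, sparkPlan)
}
```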