Skip to content

Conversation

@amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Oct 27, 2022

What changes were proposed in this pull request?

This PR migrates all existing proto tests to be DataFrame API based.

Why are the changes needed?

  1. The goal for proto tests is to test the capability of representing DataFrames by the Connect proto. So comparing with DataFrame API is more accurate.
  2. There are some Connect plan execution requiring SparkSession anyway. We can unify all tests into one suite by only using DataFrame API (e.g. We can merge SparkConnectDeduplicateSuite.scala into SparkConnectProtoSuite.scala.
  3. This also enables the possibility that we can also test result (not only plan) in the future.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UT.

@amaliujia
Copy link
Contributor Author

R: @cloud-fan

private def analyzePlan(plan: LogicalPlan): LogicalPlan = {
val connectAnalyzed = analysis.SimpleAnalyzer.execute(plan)
analysis.SimpleAnalyzer.checkAnalysis(connectAnalyzed)
EliminateSubqueryAliases(connectAnalyzed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do this? Would be great to add a comment here to explain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments to clarify this function's usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why do we need to eliminate subquery alias?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm this is what I borrowed from

We are using this Catalyst DSL analyze call already before this refactoring.

Copy link
Contributor

@cloud-fan cloud-fan Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we hit any issues in this test suite without doing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no issue after removing it. I pushed a commit to remove it anyway.

case proto.Relation.RelTypeCase.SQL => transformSql(rel.getSql)
case proto.Relation.RelTypeCase.LOCAL_RELATION =>
transformLocalRelation(rel.getLocalRelation)
transformLocalRelation(rel.getLocalRelation, common)
Copy link
Contributor

@cloud-fan cloud-fan Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is common? Every logical plan has an optional alias?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is legacy design that I believe it thinks only relations have the optional alias.

Every logical plan could have an optional alias, in that case I prefer to move that alias out of the common to have its own message. This is because by that we can differentiate

.xx()
.xx().as("") // probably invalid but user can write down such API
.xx().as("alias_1")

I can also change this in this PR if you think this is a right time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sent a PR for this topic (to avoid complicate current refactoring PR too much): #38415

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in d26e484 Oct 28, 2022
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…me API

### What changes were proposed in this pull request?

This PR migrates all existing proto tests to be DataFrame API based.

### Why are the changes needed?

1. The goal for proto tests is to test the capability of representing DataFrames by the Connect proto. So comparing with DataFrame API is more accurate.
2. There are some Connect plan execution requiring SparkSession anyway. We can unify all tests into one suite by only using DataFrame API (e.g. We can merge `SparkConnectDeduplicateSuite.scala` into `SparkConnectProtoSuite.scala`.
3. This also enables the possibility that we can also test result (not only plan) in the future.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UT.

Closes apache#38406 from amaliujia/refactor_server_tests.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants