[SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API #38406
Conversation
R: @cloud-fan
...tor/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala
```scala
private def analyzePlan(plan: LogicalPlan): LogicalPlan = {
  val connectAnalyzed = analysis.SimpleAnalyzer.execute(plan)
  analysis.SimpleAnalyzer.checkAnalysis(connectAnalyzed)
  EliminateSubqueryAliases(connectAnalyzed)
}
```
Why do we need to do this? Would be great to add a comment here to explain.
+1
I added some comments to clarify this function's usage.
but why do we need to eliminate subquery alias?
Hmmm, this is what I borrowed from `spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala` (line 513 at c50d865): `def analyze: LogicalPlan = { ... }`. We were already using this Catalyst DSL `analyze` call before this refactoring.
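For context, that DSL method looks roughly like the following. This is a paraphrase of the Catalyst source rather than a verbatim quote, and it mirrors the `analyzePlan` helper shown above:

```scala
// Rough paraphrase of the Catalyst DSL's analyze (dsl/package.scala):
// run the SimpleAnalyzer, validate the result, then strip SubqueryAlias
// nodes so that plan comparisons are insensitive to aliases.
def analyze: LogicalPlan = {
  val analyzed = analysis.SimpleAnalyzer.execute(logicalPlan)
  analysis.SimpleAnalyzer.checkAnalysis(analyzed)
  EliminateSubqueryAliases(analyzed)
}
```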
Did we hit any issues in this test suite without doing it?
There was no issue after removing it, so I pushed a commit to remove it.
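(For readers following the thread: `EliminateSubqueryAliases` replaces `SubqueryAlias` nodes with their children. A minimal sketch of the behavior, not code from this PR, assuming Catalyst internals are on the classpath:)

```scala
import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, SubqueryAlias}
import org.apache.spark.sql.types.IntegerType

val rel = LocalRelation(AttributeReference("id", IntegerType)())
val aliased = SubqueryAlias("t", rel)             // roughly what df.as("t") produces
assert(EliminateSubqueryAliases(aliased) == rel)  // the alias node is stripped
```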
...tor/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala
```diff
  case proto.Relation.RelTypeCase.SQL => transformSql(rel.getSql)
  case proto.Relation.RelTypeCase.LOCAL_RELATION =>
-   transformLocalRelation(rel.getLocalRelation)
+   transformLocalRelation(rel.getLocalRelation, common)
```
What is `common`? Does every logical plan have an optional alias?
This is a legacy design that, I believe, assumes only relations can have the optional alias. Every logical plan could have an optional alias; in that case I would prefer to move the alias out of `common` into its own message, because that lets us differentiate:

```scala
.xx()
.xx().as("")        // probably invalid, but a user can write such a call
.xx().as("alias_1")
```

I can also change this in this PR if you think this is the right time.
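To make the distinction concrete, here is an illustrative sketch (mine, not from this PR) of how `.as(...)` shows up in an analyzed DataFrame plan, assuming a `SparkSession` named `spark` is in scope:

```scala
// Dataset.as(...) wraps the logical plan in a SubqueryAlias node, so the
// presence or absence of an alias is visible when comparing plans.
val df = spark.range(3)
println(df.queryExecution.analyzed)                // Range (0, 3, ...)
println(df.as("alias_1").queryExecution.analyzed)  // SubqueryAlias alias_1 +- Range (0, 3, ...)
```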
I sent a PR for this topic (to avoid complicating the current refactoring PR too much): #38415
Can one of the admins verify this patch?
thanks, merging to master!
Closes apache#38406 from amaliujia/refactor_server_tests.
Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR migrates all existing proto tests to be DataFrame API based.
Why are the changes needed?
1. The goal of the proto tests is to verify that the Connect proto can represent DataFrames, so comparing against the DataFrame API is more accurate.
2. Some Connect plan executions require a SparkSession anyway. By only using the DataFrame API we can unify all tests into one suite (e.g. merge `SparkConnectDeduplicateSuite.scala` into `SparkConnectProtoSuite.scala`).
3. This also enables the possibility of testing results (not only plans) in the future.

Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing UT.
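As a rough sketch of the test shape this refactoring enables (helper names such as `transform`, `connectTestRelation`, and `sparkTestRelation` are placeholders for illustration, not necessarily the suite's actual API):

```scala
// Hypothetical DataFrame-API-based proto test: build the same query once
// through the Connect proto planner and once through the DataFrame API,
// then compare the resulting logical plans.
test("simple project") {
  val connectPlan = transform(connectTestRelation.select("id"))  // proto -> LogicalPlan
  val sparkPlan = sparkTestRelation.select("id").queryExecution.analyzed
  comparePlans(connectPlan, sparkPlan)
}
```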