-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-33087][SQL] DataFrameWriterV2 should delegate table resolution to the analyzer #29970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
| writeOptions: Map[String, String], | ||
| isByName: Boolean) extends V2WriteCommand | ||
| isByName: Boolean) extends V2WriteCommand { | ||
| override def withNewTable(t: NamedRelation): AppendData = copy(table = t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just make the table be a child too and not add a special case for it in the Analyzer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC there was a discussion about it before and we decided to not making table as a child for writing commands. The same applies to InsertIntoStatement.
I can't clearly recall what the reason was, probably because we need to strip SubqueryAlias for table in the writing commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one related discussion: https://github.com/apache/spark/pull/21305/files#r206746383
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I recalled. Scan nodes under writing command are special, as they are not for scan, and shouldn't apply things like filter pushdown.
|
Kubernetes integration test status success |
|
Test build #129524 has finished for PR 29970 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The content in tests in DataFrameWriterV2Suite are getting similar - it'd be nice if we can deduplicate across tests, but that is optional.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Show resolved
Hide resolved
|
retest this, please |
|
Test build #130226 has finished for PR 29970 at commit
|
|
Looks like this needs rebasing with recent master. @cloud-fan |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #130286 has finished for PR 29970 at commit
|
|
retest this, please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #130300 has finished for PR 29970 at commit
|
HeartSaVioR
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
cc @dongjoon-hyun as well. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #130730 has finished for PR 29970 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Looks good, thanks for fixing this. Please get tests passing before merging, of course.
|
retest this, please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
brkyvz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Test build #130767 has finished for PR 29970 at commit
|
|
GA passed, merging to master, thanks for the reviews! |
What changes were proposed in this pull request?
This PR makes
DataFrameWriterV2to create query plans withUnresolvedRelationand leave the table resolution work to the analyzer.Why are the changes needed?
Table resolution work should be done by the analyzer. After this PR, the behavior is more consistent between different APIs (DataFrameWriter, DataFrameWriterV2 and SQL). See the next section for behavior changes.
Does this PR introduce any user-facing change?
Yes.
DataFrameWriterand SQL INSERT.How was this patch tested?
new tests