@cloud-fan
Contributor

What changes were proposed in this pull request?

This PR makes DataFrameWriterV2 create query plans with UnresolvedRelation and leaves table resolution to the analyzer.

Why are the changes needed?

Table resolution work should be done by the analyzer. After this PR, the behavior is more consistent between different APIs (DataFrameWriter, DataFrameWriterV2 and SQL). See the next section for behavior changes.

Does this PR introduce any user-facing change?

Yes.

  1. Writes to a temp view of a v2 relation: previously this failed with a table-not-found exception; now it works if the v2 relation is writable. This is consistent with DataFrameWriter and SQL INSERT.
  2. Writes to other temp views: previously this failed with a table-not-found exception; now it fails with a more explicit error message saying that writing to a temp view of a non-v2 relation is not allowed.
  3. Writes to a view: previously this failed with a table-not-writable error; now it fails with a more explicit error message saying that writing to a view is not allowed.
  4. Writes to a v1 table: previously this failed with a table-not-writable error; now it fails with a more explicit error message saying that writing to a v1 table is not allowed. (We can allow it later by falling back to the v1 command.)
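
The four cases above can be sketched as a toy model in plain Scala. This is not Spark's code: `Entry`, `Target`, `Append`, `resolve` and all error strings are hypothetical stand-ins, chosen only to show the writer recording a name and a later analyzer-style step deciding the outcome.

```scala
// Toy model only: these types imitate the idea and are NOT Spark's classes.
object WriteResolutionSketch {
  // What a name in the (hypothetical) catalog can point to.
  sealed trait Entry
  final case class V2Table(name: String, writable: Boolean) extends Entry
  final case class V1Table(name: String) extends Entry
  final case class View(name: String) extends Entry
  // A temp view wraps the relation it was defined over.
  final case class TempView(name: String, over: Entry) extends Entry

  // The writer only records the target name; it performs no lookup.
  sealed trait Target
  final case class UnresolvedTarget(name: String) extends Target
  final case class ResolvedTarget(table: V2Table) extends Target

  final case class Append(target: Target)

  // Analyzer-style step: resolve the target or return an explicit error.
  // The error strings are illustrative, not Spark's actual messages.
  def resolve(plan: Append, catalog: Map[String, Entry]): Either[String, Append] =
    plan.target match {
      case ResolvedTarget(_) => Right(plan)
      case UnresolvedTarget(name) =>
        catalog.get(name) match {
          case Some(t: V2Table) if t.writable =>
            Right(Append(ResolvedTarget(t)))
          // Case 1: a temp view over a writable v2 relation is accepted.
          case Some(TempView(_, t: V2Table)) if t.writable =>
            Right(Append(ResolvedTarget(t)))
          case Some(_: V2Table) =>
            Left(s"Table $name is not writable")
          // Case 2: any other temp view is rejected explicitly.
          case Some(_: TempView) =>
            Left(s"Cannot write into temp view $name: not a writable v2 relation")
          // Case 3: permanent views are rejected explicitly.
          case Some(_: View) =>
            Left(s"Cannot write into view $name")
          // Case 4: v1 tables are rejected explicitly (no v1 fallback here).
          case Some(_: V1Table) =>
            Left(s"Cannot write into v1 table $name")
          case None =>
            Left(s"Table not found: $name")
        }
    }
}
```

Because the lookup lives in a single function, every caller that builds an `Append` gets the same outcome for the same name, which is the consistency argument made in the description.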

How was this patch tested?

new tests

@cloud-fan
Contributor Author

cc @HeartSaVioR @rdblue @brkyvz

@SparkQA
SparkQA commented Oct 7, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34129/

  writeOptions: Map[String, String],
- isByName: Boolean) extends V2WriteCommand
+ isByName: Boolean) extends V2WriteCommand {
+ override def withNewTable(t: NamedRelation): AppendData = copy(table = t)
Contributor

Can we just make the table be a child too and not add a special case for it in the Analyzer?

Contributor Author

IIRC there was a discussion about this before, and we decided not to make the table a child of writing commands. The same applies to InsertIntoStatement.

I can't clearly recall the reason; probably it was because we need to strip SubqueryAlias from the table in writing commands.

Contributor Author

@cloud-fan cloud-fan Oct 26, 2020

Now I recall. Scan nodes under a writing command are special: they are not used for scanning, so optimizations such as filter pushdown shouldn't be applied to them.
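
One way to read this thread is that keeping the target table as a plain field, rather than a child, keeps it out of reach of rules that walk `children`, while a `withNewTable` hook still lets the analyzer swap it in. The plain-Scala sketch below illustrates that reading only. `Plan`, `Relation`, `Filter`, `Append` and `pushDown` are hypothetical toy types, not Spark's Catalyst classes, and the pushdown rule is deliberately simplistic.

```scala
// Toy model only: NOT Spark's Catalyst classes.
object WriteCommandSketch {
  sealed trait Plan {
    def children: Seq[Plan]
    def withChildren(newChildren: Seq[Plan]): Plan
  }

  // A leaf relation that remembers which filters were pushed into it.
  final case class Relation(name: String, pushedFilters: Seq[String] = Nil) extends Plan {
    def children: Seq[Plan] = Nil
    def withChildren(newChildren: Seq[Plan]): Plan = this
  }

  final case class Filter(condition: String, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
  }

  // The target table is a field, not a child: only `query` is traversed.
  final case class Append(table: Relation, query: Plan) extends Plan {
    def children: Seq[Plan] = Seq(query)
    def withChildren(newChildren: Seq[Plan]): Plan = copy(query = newChildren.head)
    // Hook for the analyzer to replace the target once it is resolved.
    def withNewTable(t: Relation): Append = copy(table = t)
  }

  // Bottom-up rule: fold a Filter into the Relation directly beneath it.
  // It recurses through `children` only, so it never sees Append.table.
  def pushDown(plan: Plan): Plan =
    plan.withChildren(plan.children.map(pushDown)) match {
      case Filter(cond, r: Relation) => r.copy(pushedFilters = r.pushedFilters :+ cond)
      case other => other
    }
}
```

In this model, `pushDown` rewrites the query side of an `Append` but leaves its `table` untouched, and `withNewTable` is the only way the target changes.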

@SparkQA
SparkQA commented Oct 7, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34129/

@SparkQA
SparkQA commented Oct 7, 2020

Test build #129524 has finished for PR 29970 at commit b64f4bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@HeartSaVioR HeartSaVioR left a comment

The tests in DataFrameWriterV2Suite are getting similar to one another; it'd be nice to deduplicate across them, but that is optional.

@HeartSaVioR
Contributor

retest this, please

@SparkQA
SparkQA commented Oct 24, 2020

Test build #130226 has finished for PR 29970 at commit b64f4bd.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

Looks like this needs rebasing onto recent master. @cloud-fan

@SparkQA
SparkQA commented Oct 26, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34887/

@SparkQA
SparkQA commented Oct 26, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34887/

@SparkQA
SparkQA commented Oct 26, 2020

Test build #130286 has finished for PR 29970 at commit ee3587a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

retest this, please

@SparkQA
SparkQA commented Oct 26, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34901/

@SparkQA
SparkQA commented Oct 26, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34901/

@SparkQA
SparkQA commented Oct 27, 2020

Test build #130300 has finished for PR 29970 at commit ee3587a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@HeartSaVioR HeartSaVioR left a comment

Looks OK to me, but I'm probably not qualified to make the final decision.

cc @brkyvz @rdblue I'd appreciate it if you have time to revisit the PR. Thanks in advance.

@cloud-fan
Contributor Author

cc @dongjoon-hyun as well.

@SparkQA
SparkQA commented Nov 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35339/

@SparkQA
SparkQA commented Nov 6, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35339/

@SparkQA
SparkQA commented Nov 6, 2020

Test build #130730 has finished for PR 29970 at commit be50b21.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@rdblue rdblue left a comment

+1

Looks good, thanks for fixing this. Please get tests passing before merging, of course.

@HeartSaVioR
Contributor

retest this, please

@SparkQA
SparkQA commented Nov 9, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35376/

@SparkQA
SparkQA commented Nov 9, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35376/

Contributor

@brkyvz brkyvz left a comment

LGTM

@SparkQA
SparkQA commented Nov 9, 2020

Test build #130767 has finished for PR 29970 at commit be50b21.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

GA passed, merging to master, thanks for the reviews!

@cloud-fan cloud-fan closed this in 98730b7 Nov 9, 2020