[SPARK-33659][SS] Document the current behavior for DataStreamWriter.toTable API #30885

xuanyuanking · 2020-12-22T09:34:15Z

What changes were proposed in this pull request?

Follow-up work for #30521. Document the following behaviors in the API doc:

- Figure out the effects when configurations (provider/partitionBy) conflict with the existing table.
- Document the lack of functionality when creating a v2 table, and advise users to create the table beforehand so that an unintended or insufficiently specified table is not created.

Why are the changes needed?

The API does not yet fully support creating a V2 table. (TODO SPARK-33638)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Document only.

xuanyuanking · 2020-12-22T09:35:31Z

cc @HeartSaVioR @HyukjinKwon @viirya @zsxwing

python/pyspark/sql/streaming.py

HyukjinKwon · 2020-12-22T10:28:15Z

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala

  def start(): StreamingQuery = startInternal(None)

  /**
+   * :: Experimental ::


Experimental -> Evolving?

Per the comment

spark/common/tags/src/main/java/org/apache/spark/annotation/Experimental.java

Lines 28 to 31 in cc23581

 * NOTE: If there exists a Scaladoc comment that immediately precedes this annotation, the first
 * line of the comment must be ":: Experimental ::" with no trailing blank line. This is because
 * of the known issue that Scaladoc displays only either the annotation or the comment, whichever
 * comes first.

:: Experimental :: is the tag for scaladoc.

The two annotations (Experimental and Evolving) have different semantics, right? Adding a different tag would cause more confusion, since it would be unclear whether this API is experimental or evolving. You're getting it from Experimental, not Evolving.

I'll reopen this.

Makes sense. Let me delete the Experimental tag and keep only Evolving.

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala

SparkQA · 2020-12-22T13:43:01Z

Test build #133205 has finished for PR 30885 at commit fc9dd3a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2020-12-22T17:48:36Z

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala

+   * :: Experimental ::
+   *
   * Starts the execution of the streaming query, which will continually output results to the given
   * table as new data arrives. A new table will be created if the table not exists. The returned


Here it documents that a new table will be created if it does not exist, but later it also says "Please create a table manually before the execution". That looks confusing to me. Could we rephrase them together and give a more concrete description of table creation?

Maybe we could have two small paragraphs, one for v1 tables and one for v2 tables? E.g.

For a v1 table, partitioning columns provided by `partitionBy` will be respected whether or not the table exists. A new table will be created if the table does not exist.

For a v2 table, `partitionBy` will be ignored if the table already exists. `partitionBy` will be respected only if the v2 table does not exist. Besides, the v2 table created by this API lacks some functionalities (e.g., customized properties, options, and serde info). If you need them, please create the v2 table manually before the execution to avoid creating a table with incomplete information.

+1 to @viirya's suggestion. My request was to describe the impact of options (mostly partitionBy) across the matrix of v1 vs. v2 and existing vs. non-existing tables.

Separating the case of v1 vs v2 is more important, so the suggestion looks better.

Thanks for the rephrase, done in c158775
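The v1/v2 and existing/non-existing matrix discussed in this thread can be written down as a small lookup. This is only a sketch of the wording proposed above, not Spark's implementation: `ToTableOutcome` and `to_table_outcome` are hypothetical names invented for illustration, and the code models the documented behavior without calling Spark at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToTableOutcome:
    """Hypothetical summary of what the proposed doc text promises."""
    creates_table: bool
    partition_by_respected: bool
    note: str = ""


def to_table_outcome(table_version: str, table_exists: bool) -> ToTableOutcome:
    """Model the documented toTable behavior for one cell of the matrix.

    Assumption: this mirrors only the wording suggested in the review,
    nothing more.
    """
    if table_version not in ("v1", "v2"):
        raise ValueError(f"unknown table version: {table_version!r}")

    if table_version == "v1":
        # v1: partitionBy is respected whether or not the table exists;
        # a new table is created only when it is missing.
        return ToTableOutcome(
            creates_table=not table_exists,
            partition_by_respected=True,
        )

    if table_exists:
        # v2, existing table: partitionBy is ignored.
        return ToTableOutcome(creates_table=False, partition_by_respected=False)

    # v2, missing table: partitionBy is respected, but the created table
    # lacks some functionality, so creating it manually beforehand is advised.
    return ToTableOutcome(
        creates_table=True,
        partition_by_respected=True,
        note="created table lacks customized properties, options, and serde info",
    )


if __name__ == "__main__":
    for version in ("v1", "v2"):
        for exists in (True, False):
            print(version, "exists" if exists else "missing",
                  to_table_outcome(version, exists))
```

Keeping the four cases in one function makes it easy to check that the doc text covers every combination, which was the original request in this thread.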

SparkQA · 2020-12-23T14:26:44Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37905/

SparkQA · 2020-12-23T14:56:27Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37905/

SparkQA · 2020-12-23T17:56:25Z

Test build #133311 has finished for PR 30885 at commit c158775.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HeartSaVioR

Looks OK except the scaladoc tag.

SparkQA · 2020-12-24T03:30:52Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37925/

HyukjinKwon · 2020-12-24T03:44:09Z

The last commit just removes a two-line comment. Let me just merge this in.

HyukjinKwon · 2020-12-24T03:44:28Z

Merged to master and branch-3.1.

…toTable API

### What changes were proposed in this pull request?

Follow up work for #30521, document the following behaviors in the API doc:

- Figure out the effects when configurations are (provider/partitionBy) conflicting with the existing table.
- Document the lack of functionality on creating a v2 table, and guide that the users should ensure a table is created in prior to avoid the behavior unintended/insufficient table is being created.

### Why are the changes needed?

We didn't have full support for the V2 table created in the API now. (TODO SPARK-33638)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Document only.

Closes #30885 from xuanyuanking/SPARK-33659.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 86c1cfc)
Signed-off-by: HyukjinKwon <[email protected]>

HeartSaVioR · 2020-12-24T04:01:24Z

Late LGTM. Thanks for addressing this!

SparkQA · 2020-12-24T04:07:18Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37925/

SparkQA · 2020-12-24T06:50:05Z

Test build #133334 has finished for PR 30885 at commit eff4b9d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

add doc

fc9dd3a

github-actions bot added CORE PYTHON SQL STRUCTURED STREAMING labels Dec 22, 2020

HyukjinKwon reviewed Dec 22, 2020

python/pyspark/sql/streaming.py (outdated)

HyukjinKwon reviewed Dec 22, 2020

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala (outdated)

viirya reviewed Dec 22, 2020

address comments

c158775

viirya approved these changes Dec 23, 2020

HeartSaVioR reviewed Dec 23, 2020

delete experimental tag

eff4b9d

HyukjinKwon approved these changes Dec 24, 2020

HyukjinKwon closed this in 86c1cfc Dec 24, 2020
