@xuanyuanking
Member

What changes were proposed in this pull request?

Follow-up work for #30521. Document the following behaviors in the API doc:

  • Describe the effects when the configurations (provider/partitionBy) conflict with the existing table.
  • Document the missing functionality when creating a v2 table, and advise users to create the table beforehand so that an unintended or incomplete table is not created.

Why are the changes needed?

The API does not yet fully support creating a V2 table. (TODO SPARK-33638)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Documentation only.

@xuanyuanking
Member Author

def start(): StreamingQuery = startInternal(None)

/**
* :: Experimental ::
Member

Experimental -> Evolving?

Member Author

Per the comment:

* NOTE: If there exists a Scaladoc comment that immediately precedes this annotation, the first
* line of the comment must be ":: Experimental ::" with no trailing blank line. This is because
* of the known issue that Scaladoc displays only either the annotation or the comment, whichever
* comes first.

`:: Experimental ::` is the tag for Scaladoc.

Contributor

The two annotations (Experimental and Evolving) have different semantics, right? Using a tag that differs from the annotation would add confusion, since it would not be clear whether this is experimental or evolving. You're getting it from Experimental, not Evolving.

I'll reopen this.

Member Author

Makes sense. Let me delete the Experimental tag and keep only Evolving.

@SparkQA

SparkQA commented Dec 22, 2020

Test build #133205 has finished for PR 30885 at commit fc9dd3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* :: Experimental ::
*
* Starts the execution of the streaming query, which will continually output results to the given
* table as new data arrives. A new table will be created if the table not exists. The returned
Member

Here it documents that a new table will be created if it does not exist, but later it also says "Please create a table manually before the execution". That looks confusing, I think. Could we rephrase them together and give a more concrete description of table creation?

Member

@viirya viirya Dec 22, 2020

Maybe we could have two short paragraphs, one for v1 tables and one for v2 tables? E.g.

For a v1 table, partitioning columns provided by `partitionBy` will be respected
whether or not the table exists. A new table will be created if the table does not exist.

For a v2 table, `partitionBy` will be ignored if the table already exists. `partitionBy`
will be respected only if the v2 table does not exist. Besides, the v2 table created
by this API lacks some functionality (e.g., customized properties, options, and serde info).
If you need them, please create the v2 table manually before the execution to avoid
creating a table with incomplete information.
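
The v1/v2 rules above can be read as a small decision matrix over (table version, table exists). The following is only an illustrative sketch in plain Scala: `TableVersion`, `Outcome`, and `ToTableMatrix` are hypothetical names made up for this example and are not part of Spark, and the model encodes only what the two paragraphs above state.

```scala
// Illustrative model only: none of these names exist in Spark.
sealed trait TableVersion
case object V1 extends TableVersion
case object V2 extends TableVersion

// What the suggested doc text says happens for one cell of the matrix.
final case class Outcome(
    createsTable: Boolean,         // a new table is created by the API
    partitionByRespected: Boolean, // partitioning columns from partitionBy are used
    incompleteV2Table: Boolean     // a v2 table is created without properties/options/serde info
)

object ToTableMatrix {
  def outcome(version: TableVersion, tableExists: Boolean): Outcome = version match {
    // v1: partitionBy is respected whether or not the table exists.
    case V1 =>
      Outcome(createsTable = !tableExists, partitionByRespected = true, incompleteV2Table = false)
    // v2: partitionBy is respected only when the table does not exist yet,
    // and the table created in that case may lack some information.
    case V2 =>
      Outcome(
        createsTable = !tableExists,
        partitionByRespected = !tableExists,
        incompleteV2Table = !tableExists)
  }
}
```

Reading the matrix this way, the only cell where `partitionBy` is silently ignored is an existing v2 table, and the only cell that risks an incomplete table is a missing v2 table, which is why creating the v2 table manually beforehand is the advice.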

Contributor

+1 to @viirya's suggestion. My request was to describe the impact of the options (mostly partitionBy) across the matrix of v1 vs v2 and existing vs non-existing tables.

Separating the case of v1 vs v2 is more important, so the suggestion looks better.

Member Author

Thanks for the rephrasing; done in c158775.

@SparkQA

SparkQA commented Dec 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37905/

@SparkQA

SparkQA commented Dec 23, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37905/

@SparkQA

SparkQA commented Dec 23, 2020

Test build #133311 has finished for PR 30885 at commit c158775.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@HeartSaVioR HeartSaVioR left a comment

Looks OK except for the Scaladoc tag.

@SparkQA

SparkQA commented Dec 24, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37925/

@HyukjinKwon
Member

The last commit just removes a two-line comment. Let me merge this in.

@HyukjinKwon
Member

Merged to master and branch-3.1.

HyukjinKwon pushed a commit that referenced this pull request Dec 24, 2020
…toTable API

Closes #30885 from xuanyuanking/SPARK-33659.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 86c1cfc)
Signed-off-by: HyukjinKwon <[email protected]>
@HeartSaVioR
Contributor

HeartSaVioR commented Dec 24, 2020

Late LGTM. Thanks for addressing this!

@SparkQA

SparkQA commented Dec 24, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37925/

@SparkQA

SparkQA commented Dec 24, 2020

Test build #133334 has finished for PR 30885 at commit eff4b9d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
