[SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately, and note/since in SQL built-in function documentation#18749

Closed
HyukjinKwon wants to merge 9 commits into apache:master from HyukjinKwon:followup-sql-doc-gen

Conversation


@HyukjinKwon HyukjinKwon commented Jul 27, 2017

What changes were proposed in this pull request?

This PR proposes to separate extended into examples and arguments internally so that both can be documented separately, and to add since and note for additional information.

For since, it looks like users sometimes get confused by missing version information, as far as I know. For example, see https://www.mail-archive.com/user@spark.apache.org/msg64798.html

For a few good examples of the built documentation, please see both:
from_json - https://spark-test.github.io/sparksqldoc/#from_json
like - https://spark-test.github.io/sparksqldoc/#like

For DESCRIBE FUNCTION, note and since are added as below:

> DESCRIBE FUNCTION EXTENDED rlike;
...
Extended Usage:
    Arguments:
      ...

    Examples:
      ...

    Note:
      Use LIKE to match with simple string pattern
> DESCRIBE FUNCTION EXTENDED to_json;
...
    Examples:
      ...

    Since: 2.2.0

For the complete documentation, see https://spark-test.github.io/sparksqldoc/
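The annotation shape this PR describes can be sketched in a small, self-contained example. This is a hedged sketch: the `AnnotationSketch` and `FromJson` names are hypothetical stand-ins, and Spark's real ExpressionDescription lives in its own package; only the element names (usage, extended, arguments, examples, note, since) come from this PR.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical sketch of the expanded annotation; the new elements all
// default to "" so existing annotations keep resolving unchanged.
public class AnnotationSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface ExpressionDescription {
        String usage() default "";
        String extended() default "";
        String arguments() default "";
        String examples() default "";
        String note() default "";
        String since() default "";
    }

    // Demo class standing in for an expression; only usage and since are set.
    @ExpressionDescription(
        usage = "_FUNC_(jsonStr, schema) - Returns a struct value with the given jsonStr and schema.",
        since = "2.2.0")
    static class FromJson {}

    public static void main(String[] args) {
        ExpressionDescription d =
            FromJson.class.getAnnotation(ExpressionDescription.class);
        System.out.println("Since: " + d.since());
        // Elements left at their defaults return "" rather than null.
        System.out.println("extended empty: " + d.extended().isEmpty());
    }
}
```

Reading the annotation back via reflection, as above, is essentially how the documentation generator can pick up the new fields without breaking callers that never set them.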

How was this patch tested?

Manual tests and existing tests. Please see https://spark-test.github.io/sparksqldoc

Jenkins tests are needed to double-check.

Member Author

I added since for both from_json and to_json as examples here. These were added into SQL since 2.2.0 - (https://issues.apache.org/jira/browse/SPARK-19637 and https://issues.apache.org/jira/browse/SPARK-19967)

Member Author

I believe we are okay to change this: the code comment says it is

* considered an internal API to Spark SQL and are subject to change between minor releases.

and I could not find this in the documentation.

@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately with note/since in SQL built-in function documentation [WIP][SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately, and note/since in SQL built-in function documentation Jul 27, 2017
@SparkQA

SparkQA commented Jul 27, 2017

Test build #79999 has finished for PR 18749 at commit 0958c0b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon force-pushed the followup-sql-doc-gen branch from d9277bf to cb2dfe7 Compare July 27, 2017 13:34
@SparkQA

SparkQA commented Jul 27, 2017

Test build #80002 has finished for PR 18749 at commit d9277bf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 27, 2017

Test build #80003 has finished for PR 18749 at commit cb2dfe7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 27, 2017

Test build #80004 has finished for PR 18749 at commit 54e6d82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately, and note/since in SQL built-in function documentation [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately, and note/since in SQL built-in function documentation Jul 27, 2017
@HyukjinKwon
Member Author

cc @rxin, @srowen and @cloud-fan, I believe this one is ready for a review. Could you take a look when you have some time?

Member

There aren't many ExpressionInfo objects in memory, right? Adding more info to this bean doesn't have any meaningful performance implications, I presume. I suppose it's just breaking down the existing info further.

I also presume this is considered an internal API so it's OK to change the constructor. You could even retain the one constructor that is removed, just in case.

Member Author

Yes, I think so for the former and sure will keep the original constructor.

Member

Agree, that's my only question: whether this change matters, because it's a developer API. You provide default implementations, though extended() gets removed. Hm. I am wondering if it's possible to keep extended() but, well, ignore it? It would at least be compatible, even if it meant someone's implementation out there would have to update to provide information to ExpressionInfo correctly. That's not really a functional problem though.

Member Author

Will give it a try.

@HyukjinKwon
Member Author

Will address the comments within a few days. (I am reading docs just to get used to things around my updated status.)

Member Author

I manually tested source compatibility and I guess it also won't break binary compatibility - https://stackoverflow.com/questions/21197255/is-adding-a-method-to-a-java-annotation-safe-for-backward-compatibility

Member Author

In the last commit, bf48875, I tried to keep the original behaviour and compatibility (although I guess we are fine to break this, as said in a few comments above):

For ExpressionDescription and ExpressionInfo:

if extended is an empty string (the default value), the extended description is built from arguments, examples, note and since, and that is used as the extended description.

if extended is a non-empty string, the new arguments, examples, note and since are ignored and extended itself is used as the extended description, assuming it was set manually.
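This fallback rule can be sketched roughly as follows. The helper name `resolveExtended` and the exact formatting are illustrative assumptions, not Spark's actual internals; only the precedence rule (a manually-set extended wins, otherwise the new elements are assembled) comes from the description above.

```java
// Hypothetical sketch of the fallback rule: a non-empty extended string
// takes precedence; otherwise the description is built from the new elements.
public class ExtendedFallbackSketch {
    static String resolveExtended(String extended, String arguments,
                                  String examples, String note, String since) {
        if (!extended.isEmpty()) {
            // A manually-set extended description wins; new elements are ignored.
            return extended;
        }
        // Otherwise assemble the extended description from the new elements.
        StringBuilder sb = new StringBuilder();
        if (!arguments.isEmpty()) sb.append("    Arguments:\n").append(arguments).append("\n");
        if (!examples.isEmpty())  sb.append("    Examples:\n").append(examples).append("\n");
        if (!note.isEmpty())      sb.append("    Note:\n").append(note).append("\n");
        if (!since.isEmpty())     sb.append("    Since: ").append(since).append("\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A non-empty extended takes precedence over the new elements.
        System.out.println(resolveExtended("manual text", "ignored", "", "", ""));
        // Otherwise the new elements are combined into one description.
        System.out.print(resolveExtended("", "", "      > SELECT to_json(...);", "", "2.2.0"));
    }
}
```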

@SparkQA

SparkQA commented Jul 29, 2017

Test build #80044 has finished for PR 18749 at commit f51ee74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 29, 2017

Test build #80043 has finished for PR 18749 at commit bf48875.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

HyukjinKwon commented Aug 1, 2017

For source compatibility, in the commit 389fc6e, I temporarily moved back to extended to show that it passes the tests and keeps source compatibility.

For binary compatibility, I tested ExpressionDescription (which I guess we are worried about) in the way I described before - databricks/scala-style-guide#46.

With ExpressionDescription.java assuming this is Spark 2.2.0:

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
@Retention(RetentionPolicy.RUNTIME)
public @interface ExpressionDescription {
    String usage() default "";
    String extended() default "";
}

With Main.scala assuming this is the user application compiled with Spark 2.2.0:

@ExpressionDescription(
  usage = "I am usage",
  extended = """
    I am extended
  """)
case class A()

object Main {
  def main(args: Array[String]): Unit = {
    val df = scala.reflect.classTag[A].runtimeClass.getAnnotation(classOf[ExpressionDescription])
    println(df.usage())
    println(df.extended())
  }
}
javac ExpressionDescription.java
scalac Main.scala
scala Main

prints

I am usage

    I am extended

And then, I changed ExpressionDescription.java assuming we merged this change into Spark's next release:

 import java.lang.annotation.Retention;
 import java.lang.annotation.RetentionPolicy;
 @Retention(RetentionPolicy.RUNTIME)
 public @interface ExpressionDescription {
     String usage() default "";
     String extended() default "";
+    String arguments() default "";
+    String examples() default "";
+    String note() default "";
+    String since() default "";
 }

and then I ran

javac ExpressionDescription.java
scala Main

and it still works fine, printing:

I am usage

    I am extended

@HyukjinKwon
Member Author

I will revert 389fc6e if the test passes.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80111 has finished for PR 18749 at commit 389fc6e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon force-pushed the followup-sql-doc-gen branch from 980ede7 to 1fa02c8 Compare August 1, 2017 09:44
@SparkQA

SparkQA commented Aug 1, 2017

Test build #80113 has finished for PR 18749 at commit f76af78.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80115 has finished for PR 18749 at commit 980ede7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80116 has finished for PR 18749 at commit 1fa02c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Now I am pretty confident about the compatibility concern.

I am quite sure the current state should be ready for another look if the last test passes, where I reverted the former commit, 389fc6e, above. I guess it should pass because the only diff is 1fa02c8, but I will wait for the results before pinging again.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80118 has finished for PR 18749 at commit b037592.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80123 has finished for PR 18749 at commit b037592.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

@srowen, @rxin and @cloud-fan, I am confident about the compatibility concern per #18749 (comment) and believe it is ready for another look.

@rxin
Contributor

rxin commented Aug 1, 2017

What is the compatibility concern?

@HyukjinKwon
Member Author

HyukjinKwon commented Aug 1, 2017

Here, both comments by Sean: #18749 (comment) and #18749 (comment).

It looks like we are okay to change this given #18749 (comment), but the comment says "DeveloperApi", so I tried my best to keep compatibility.

So, I considered both possible compatibility cases:

  • Checking if existing usage by developers still works with this change:

     @ExpressionDescription(
        usage = "...",
        extended = """
          ...
        """)

    The current PR changes extended to examples and arguments in its usage. So, I manually reverted to this usage and checked that it passes the tests.

    In other words, with this change, the tests pass with both usages:

     @ExpressionDescription(
        usage = "...",
        arguments = """
          ...
        """,
        examples = """
          ...
        """)

    and

     @ExpressionDescription(
        usage = "...",
        extended = """
          ...
        """)
  • Checking binary compatibility of the annotation. I added a few new elements in this PR: examples, arguments, note and since. I manually checked this in #18749 (comment).

Did I maybe misunderstand something?

@rxin
Contributor

rxin commented Aug 1, 2017

OK great. I think we should avoid breaking developer APIs, unless it has a huge upside. It wouldn't be fun to break it just for some cosmetic things ...

This reverts commit 389fc6ef788bf971f846ca36f49cea6a1c98b0d0.
@HyukjinKwon HyukjinKwon force-pushed the followup-sql-doc-gen branch from b037592 to 974eab2 Compare August 4, 2017 10:39
@SparkQA

SparkQA commented Aug 4, 2017

Test build #80248 has finished for PR 18749 at commit 974eab2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80250 has finished for PR 18749 at commit 974eab2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@gatorsmile
Member

We have to define a clear rule. We do not want to revert PRs merged by others; it just makes others feel awkward.

In the past, based on my observation in Spark SQL, when committers left "SGTM" or "looks good to me", it did not mean the quality was good enough for merging. It only meant the quality or solution looked OK but needed more reviews/changes. I am not sure how the other Spark components work; however, I believe this is the rule we follow in Spark SQL. That is why I am surprised some committers merge PRs without an explicit LGTM message.

@srowen
Member

srowen commented Aug 4, 2017

We have never had such a rule here or in any other project. Let's use common sense. If someone says 'looks good' that's fine. Or anything equivalent. If in doubt ask.

@gatorsmile
Member

In Spark SQL, we require such an explicit message. That is my understanding. If we want a change, I think we need to send a note to all the committers and let them know. Then, they will leave such messages more carefully.

Spark SQL is becoming more and more complex. It is very easy to break backward compatibility or even return an incorrect result. We are building infrastructure software; we have to be more careful, because our current test case coverage is still not good enough.

@HyukjinKwon
Member Author

@gatorsmile, I really don't want to go through your PRs, and I have never done such things in the past. Let's document this and then let me follow it. Please open a discussion on the mailing list if you are in doubt. We are not supposed to make a long-term decision here.

@srowen
Member

srowen commented Aug 4, 2017

@gatorsmile you are the one asking for a change, and I do not see support for it. I am not aware of any Spark SQL-specific rule; what are you referring to? Reynold agrees above, for example, that in some cases it's justified to merge a change with no other review. This alone suggests there is no rule about typing "LGTM".

You rightly say that things need review. The more important the change, the more review it needs. That is not an argument for a voting rule; it's an argument for good judgment among committers, and I see no evidence that's lacking. An "LGTM" rule doesn't solve that problem: a committer with bad judgment can always approve something that shouldn't be merged, right?

I further can't understand what's special about "LGTM". You are saying that committers say "sounds good to me" when they mean "this is not ready for merging" but whenever they say "LGTM" they always mean "this is ready for merging" -- sorry, what?

You are suggesting a change to a form of RTC, which would require a PMC vote. Consider the implications: nobody can merge a change until a vote completes. Are you sure?

In the meantime, come now, let's move ahead with this PR, which was obviously reviewed.

@rxin
Contributor

rxin commented Aug 4, 2017

@srowen that's not what I said. Almost always an explicit LGTM from somebody familiar with the codebase is preferred. There are tiny changes that might not require them, and it is up to the judgement of the committer. But those are more exceptions than the norm.

@srowen
Member

srowen commented Aug 4, 2017

Yep, I agree with 'almost always', and that tiny changes might not require them. I think that's what I said you said: "some cases". See also #18749 (comment)

But this alone disagrees with the idea that all changes need the string "LGTM" typed.

I am not sure why this popped up, but I propose we continue with the same process and conventions we always have. If anyone's unsure about whether an important review is affirmative: please ask.

Can we put this to bed now?

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80253 has finished for PR 18749 at commit 974eab2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@gatorsmile
Member

I am also afraid somebody might misunderstand when I or a few other committers say "SGTM" or "looks good" in Spark SQL. When a committer tries to merge Spark SQL-related PRs based on these messages, please double-check. Does this sound reasonable?

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80261 has finished for PR 18749 at commit 974eab2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Aug 5, 2017

Test build #80274 has finished for PR 18749 at commit 974eab2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Aug 5, 2017

Test build #80282 has finished for PR 18749 at commit 974eab2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

The other test failures look spurious and unrelated. I guess we are okay to go ahead with this PR anyway?

@gatorsmile
Member

LGTM.

Thanks! Merging to master.

@asfgit asfgit closed this in ba327ee Aug 5, 2017
@HyukjinKwon HyukjinKwon deleted the followup-sql-doc-gen branch January 2, 2018 03:37
asfgit pushed a commit that referenced this pull request Oct 21, 2018
…tation by DESCRIBE FUNCTION at SQLQueryTestSuite

Currently, there are some tests testing function descriptions:

```bash
$ grep -ir "describe function" sql/core/src/test/resources/sql-tests/inputs
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function to_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function extended to_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function from_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function extended from_json;
```

There does not seem to be much point in testing these, since we are not going to test the documentation itself.
For `DESCRIBE FUNCTION` functionality itself, it is already being tested here and there.
See the test failures in #18749 (where I added examples to function descriptions).

We had better remove those tests so that people don't add such tests in the SQL tests.

## How was this patch tested?

Manual.

Closes #22776 from HyukjinKwon/SPARK-25779.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…tation by DESCRIBE FUNCTION at SQLQueryTestSuite