[SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately, and note/since in SQL built-in function documentation#18749

Closed
HyukjinKwon wants to merge 9 commits into apache:master from HyukjinKwon:followup-sql-doc-gen

Conversation


@HyukjinKwon HyukjinKwon commented Jul 27, 2017

What changes were proposed in this pull request?

This PR proposes to separate extended into examples and arguments internally so that both can be documented separately, and to add since and note for additional information.

For since, it looks like users sometimes get confused by missing version information, as far as I know. For example, see https://www.mail-archive.com/user@spark.apache.org/msg64798.html

For a few good examples of the built documentation, please see both:
from_json - https://spark-test.github.io/sparksqldoc/#from_json
like - https://spark-test.github.io/sparksqldoc/#like

For DESCRIBE FUNCTION, note and since are added as below:

> DESCRIBE FUNCTION EXTENDED rlike;
...
Extended Usage:
    Arguments:
      ...

    Examples:
      ...

    Note:
      Use LIKE to match with simple string pattern
> DESCRIBE FUNCTION EXTENDED to_json;
...
    Examples:
      ...

    Since: 2.2.0

For the complete documentation, see https://spark-test.github.io/sparksqldoc/
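The annotation shape this PR describes can be sketched in a small, self-contained example. This is a hedged sketch: the `AnnotationSketch` and `FromJson` names are hypothetical stand-ins, and Spark's real ExpressionDescription lives in its own package; only the element names (usage, extended, arguments, examples, note, since) come from this PR.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical sketch of the expanded annotation; the new elements all
// default to "" so existing annotations keep resolving unchanged.
public class AnnotationSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface ExpressionDescription {
        String usage() default "";
        String extended() default "";
        String arguments() default "";
        String examples() default "";
        String note() default "";
        String since() default "";
    }

    // Demo class standing in for an expression; only usage and since are set.
    @ExpressionDescription(
        usage = "_FUNC_(jsonStr, schema) - Returns a struct value with the given jsonStr and schema.",
        since = "2.2.0")
    static class FromJson {}

    public static void main(String[] args) {
        ExpressionDescription d =
            FromJson.class.getAnnotation(ExpressionDescription.class);
        System.out.println("Since: " + d.since());
        // Elements left at their defaults return "" rather than null.
        System.out.println("extended empty: " + d.extended().isEmpty());
    }
}
```

Reading the annotation back via reflection, as above, is essentially how the documentation generator can pick up the new fields without breaking callers that never set them.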

How was this patch tested?

Manual tests and existing tests. Please see https://spark-test.github.io/sparksqldoc

Jenkins tests are needed to double-check.

Member Author

I added since for both from_json and to_json as examples here. These were added into SQL since 2.2.0 - (https://issues.apache.org/jira/browse/SPARK-19637 and https://issues.apache.org/jira/browse/SPARK-19967)

Member Author

I believe we are okay to change this: the code comment says it is

* considered an internal API to Spark SQL and are subject to change between minor releases.

and I could not find this in the documentation.

@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately with note/since in SQL built-in function documentation [WIP][SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately, and note/since in SQL built-in function documentation Jul 27, 2017
@SparkQA

SparkQA commented Jul 27, 2017

Test build #79999 has finished for PR 18749 at commit 0958c0b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon force-pushed the followup-sql-doc-gen branch from d9277bf to cb2dfe7 Compare July 27, 2017 13:34
@SparkQA

SparkQA commented Jul 27, 2017

Test build #80002 has finished for PR 18749 at commit d9277bf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 27, 2017

Test build #80003 has finished for PR 18749 at commit cb2dfe7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 27, 2017

Test build #80004 has finished for PR 18749 at commit 54e6d82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately, and note/since in SQL built-in function documentation [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately, and note/since in SQL built-in function documentation Jul 27, 2017
@HyukjinKwon
Member Author

cc @rxin, @srowen and @cloud-fan, I believe this one is ready for a review. Could you take a look when you have some time?

Member

There aren't many ExpressionInfo objects in memory, right? Adding more info to this bean doesn't have any meaningful performance implications, I presume. I suppose it's just breaking down the existing info further.

I also presume this is considered an internal API so it's OK to change the constructor. You could even retain the one constructor that is removed, just in case.

Member Author

Yes, I think so for the former and sure will keep the original constructor.

Member

Agree, that's my only question: whether this change matters, because it's a developer API. You provide default implementations, though extended() gets removed. Hm. I am wondering if it's possible to keep extended() but, well, ignore it? It would at least be compatible, even if it meant someone's implementation out there would have to update to provide information to ExpressionInfo correctly. That's not really a functional problem though.

Member Author

Will give it a try.

@HyukjinKwon
Member Author

Will address the comments within a few days. (I am reading docs just to get used to things around my updated status.)

Member Author

I manually tested source compatibility and I guess it also won't break binary compatibility - https://stackoverflow.com/questions/21197255/is-adding-a-method-to-a-java-annotation-safe-for-backward-compatibility

Member Author

In the last commit, bf48875, I tried to keep the original behaviour and compatibility (although I guess we are fine to break this, as said in a few comments above):

For ExpressionDescription and ExpressionInfo:

if extended is an empty string (the default value), the extended description is built from arguments, examples, note and since, and that is used as the extended description.

if extended is a non-empty string, the new arguments, examples, note and since are ignored and extended itself is used as the extended description, assuming it was set manually.
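This fallback rule can be sketched roughly as follows. The helper name `resolveExtended` and the exact formatting are illustrative assumptions, not Spark's actual internals; only the precedence rule (a manually-set extended wins, otherwise the new elements are assembled) comes from the description above.

```java
// Hypothetical sketch of the fallback rule: a non-empty extended string
// takes precedence; otherwise the description is built from the new elements.
public class ExtendedFallbackSketch {
    static String resolveExtended(String extended, String arguments,
                                  String examples, String note, String since) {
        if (!extended.isEmpty()) {
            // A manually-set extended description wins; new elements are ignored.
            return extended;
        }
        // Otherwise assemble the extended description from the new elements.
        StringBuilder sb = new StringBuilder();
        if (!arguments.isEmpty()) sb.append("    Arguments:\n").append(arguments).append("\n");
        if (!examples.isEmpty())  sb.append("    Examples:\n").append(examples).append("\n");
        if (!note.isEmpty())      sb.append("    Note:\n").append(note).append("\n");
        if (!since.isEmpty())     sb.append("    Since: ").append(since).append("\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A non-empty extended takes precedence over the new elements.
        System.out.println(resolveExtended("manual text", "ignored", "", "", ""));
        // Otherwise the new elements are combined into one description.
        System.out.print(resolveExtended("", "", "      > SELECT to_json(...);", "", "2.2.0"));
    }
}
```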

@SparkQA

SparkQA commented Jul 29, 2017

Test build #80044 has finished for PR 18749 at commit f51ee74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 29, 2017

Test build #80043 has finished for PR 18749 at commit bf48875.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

HyukjinKwon commented Aug 1, 2017

For source compatibility, in the commit 389fc6e, I temporarily moved back to extended to show that it passes the tests and keeps source compatibility.

For binary compatibility, I tested ExpressionDescription (which I guess we are worried about) in the way I described before - databricks/scala-style-guide#46.

With ExpressionDescription.java assuming this is Spark 2.2.0:

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
@Retention(RetentionPolicy.RUNTIME)
public @interface ExpressionDescription {
    String usage() default "";
    String extended() default "";
}

With Main.scala assuming this is the user application compiled with Spark 2.2.0:

@ExpressionDescription(
  usage = "I am usage",
  extended = """
    I am extended
  """)
case class A()

object Main {
  def main(args: Array[String]): Unit = {
    val df = scala.reflect.classTag[A].runtimeClass.getAnnotation(classOf[ExpressionDescription])
    println(df.usage())
    println(df.extended())
  }
}
javac ExpressionDescription.java
scalac Main.scala
scala Main

prints

I am usage

    I am extended

And then, I changed ExpressionDescription.java assuming we merged this change into Spark's next release:

 import java.lang.annotation.Retention;
 import java.lang.annotation.RetentionPolicy;
 @Retention(RetentionPolicy.RUNTIME)
 public @interface ExpressionDescription {
     String usage() default "";
     String extended() default "";
+    String arguments() default "";
+    String examples() default "";
+    String note() default "";
+    String since() default "";
 }

and then I ran

javac ExpressionDescription.java
scala Main

and it still works fine, printing:

I am usage

    I am extended

@HyukjinKwon
Member Author

I will revert 389fc6e if the test passes.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80111 has finished for PR 18749 at commit 389fc6e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon force-pushed the followup-sql-doc-gen branch from 980ede7 to 1fa02c8 Compare August 1, 2017 09:44
@SparkQA

SparkQA commented Aug 1, 2017

Test build #80113 has finished for PR 18749 at commit f76af78.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80115 has finished for PR 18749 at commit 980ede7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80116 has finished for PR 18749 at commit 1fa02c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Now I am pretty confident about the compatibility concern.

I am quite sure the current state should be ready for another look if the last test passes, where I reverted the former commit, 389fc6e, above. I guess it should pass because the only diff is 1fa02c8, but I will wait for the results before pinging again.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80118 has finished for PR 18749 at commit b037592.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80123 has finished for PR 18749 at commit b037592.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

@srowen, @rxin and @cloud-fan, I am confident about the compatibility concern per #18749 (comment) and believe it is ready for another look.

@rxin
Contributor

rxin commented Aug 1, 2017

What is the compatibility concern?

@HyukjinKwon
Member Author

HyukjinKwon commented Aug 1, 2017

Here, both comments by Sean: #18749 (comment) and #18749 (comment).

It looks like we are okay to change this given #18749 (comment), but the comment says "DeveloperApi", so I tried my best to keep compatibility.

So, I considered both possible compatibility cases:

  • Checking if existing usage by developers still works with this change:

     @ExpressionDescription(
        usage = "...",
        extended = """
          ...
        """)

    The current PR changes extended to examples and arguments in its usage. So, I manually reverted to this usage and checked that it passes the tests.

    In other words, with this change, the tests pass with both usages:

     @ExpressionDescription(
        usage = "...",
        arguments = """
          ...
        """,
        examples = """
          ...
        """)

    and

     @ExpressionDescription(
        usage = "...",
        extended = """
          ...
        """)
  • Checking binary compatibility of the annotation. I added a few new elements in this PR: examples, arguments, note and since. I manually checked this in #18749 (comment).

Did I maybe misunderstand something?

@rxin
Contributor

rxin commented Aug 1, 2017

OK great. I think we should avoid breaking developer APIs, unless it has a huge upside. It wouldn't be fun to break it just for some cosmetic things ...

This reverts commit 389fc6ef788bf971f846ca36f49cea6a1c98b0d0.
@HyukjinKwon HyukjinKwon force-pushed the followup-sql-doc-gen branch from b037592 to 974eab2 Compare August 4, 2017 10:39
@SparkQA

SparkQA commented Aug 4, 2017

Test build #80248 has finished for PR 18749 at commit 974eab2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80250 has finished for PR 18749 at commit 974eab2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@gatorsmile
Member

We have to define a clear rule. We do not want to revert PRs merged by others; it just makes others feel awkward.

In the past, based on my observation in Spark SQL, when committers left "SGTM" or "looks good to me", it did not mean the quality was good enough for merging. It only meant the quality or solution looked OK but needed more reviews/changes. I am not sure how the other Spark components work; however, I believe this is the rule we follow in Spark SQL. That is why I am surprised some committers merge PRs without an explicit LGTM message.

@srowen
Member

srowen commented Aug 4, 2017

We have never had such a rule here or in any other project. Let's use common sense. If someone says 'looks good' that's fine. Or anything equivalent. If in doubt ask.

@gatorsmile
Member

In Spark SQL, we require such an explicit message. That is my understanding. If we want a change, I think we need to send a note to all the committers and let them know. Then, they will leave such messages more carefully.

Spark SQL is becoming more and more complex. It is very easy to break backward compatibility or even return an incorrect result. We are building infrastructure software; we have to be more careful, because our current test case coverage is still not good enough.

@HyukjinKwon
Member Author

@gatorsmile, I really don't want to go through your PRs, and I have never done such things in the past. Let's document this and then let me follow it. Please open a discussion on the mailing list if you are in doubt. We are not supposed to make a long-term decision here.

@srowen
Member

srowen commented Aug 4, 2017

@gatorsmile you are the one asking for a change, and I do not see support for it. I am not aware of any Spark SQL-specific rule; what are you referring to? Reynold agrees above, for example, that in some cases it's justified to merge a change with no other review. This alone suggests there is no rule about typing "LGTM".

You rightly say that things need review. The more important the change, the more review it needs. That is not an argument for a voting rule; it's an argument for good judgment among committers, and I see no evidence that's lacking. An "LGTM" rule doesn't solve that problem: a committer with bad judgment can always approve something that shouldn't be merged, right?

I further can't understand what's special about "LGTM". You are saying that committers say "sounds good to me" when they mean "this is not ready for merging" but whenever they say "LGTM" they always mean "this is ready for merging" -- sorry, what?

You are suggesting a change to a form of RTC, which would require a PMC vote. Consider the implications: nobody can merge a change until a vote completes. Are you sure?

In the meantime, come now, let's move ahead with this PR, which was obviously reviewed.

@rxin
Contributor

rxin commented Aug 4, 2017

@srowen that's not what I said. Almost always an explicit LGTM from somebody familiar with the codebase is preferred. There are tiny changes that might not require them, and it is up to the judgement of the committer. But those are more exceptions than the norm.

@srowen
Member

srowen commented Aug 4, 2017

Yep, I agree with 'almost always', and that tiny changes might not require them. I think that's what I said you said: "some cases". See also #18749 (comment)

But this alone disagrees with the idea that all changes need the string "LGTM" typed.

I am not sure why this popped up, but I propose we continue with the same process and conventions we always have. If anyone's unsure about whether an important review is affirmative: please ask.

Can we put this to bed now?

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80253 has finished for PR 18749 at commit 974eab2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@gatorsmile
Member

I am also afraid somebody might misunderstand when I or a few other committers say "SGTM" or "looks good" in Spark SQL. When a committer tries to merge Spark SQL-related PRs based on these messages, please double-check. Does this sound reasonable?

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80261 has finished for PR 18749 at commit 974eab2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Aug 5, 2017

Test build #80274 has finished for PR 18749 at commit 974eab2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Aug 5, 2017

Test build #80282 has finished for PR 18749 at commit 974eab2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

The other test failures look spurious and unrelated. I guess we are okay to go ahead with this PR anyway?

@gatorsmile
Member

LGTM.

Thanks! Merging to master.

@asfgit asfgit closed this in ba327ee Aug 5, 2017
@HyukjinKwon HyukjinKwon deleted the followup-sql-doc-gen branch January 2, 2018 03:37
asfgit pushed a commit that referenced this pull request Oct 21, 2018
…tation by DESCRIBE FUNCTION at SQLQueryTestSuite

Currently, there are some tests testing function descriptions:

```bash
$ grep -ir "describe function" sql/core/src/test/resources/sql-tests/inputs
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function to_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function extended to_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function from_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function extended from_json;
```

There does not seem to be much point in testing these, since we are not going to test the documentation itself.
For `DESCRIBE FUNCTION` functionality itself, it is already being tested here and there.
See the test failures in #18749 (where I added examples to function descriptions).

We had better remove those tests so that people don't add such tests in the SQL tests.

## How was this patch tested?

Manual.

Closes #22776 from HyukjinKwon/SPARK-25779.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…tation by DESCRIBE FUNCTION at SQLQueryTestSuite