[SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments separately, and note/since in SQL built-in function documentation #18749
Conversation
I added `since` for both `from_json` and `to_json` as examples here. These were added into SQL since 2.2.0 (see https://issues.apache.org/jira/browse/SPARK-19637 and https://issues.apache.org/jira/browse/SPARK-19967).
I believe we are okay to change this, and I could not find this in the documentation.
Test build #79999 has finished for PR 18749 at commit
Force-pushed d9277bf to cb2dfe7
Test build #80002 has finished for PR 18749 at commit
Test build #80003 has finished for PR 18749 at commit
Test build #80004 has finished for PR 18749 at commit
cc @rxin, @srowen and @cloud-fan, I believe this one is ready for a review. Could you take a look when you have some time?
There aren't many `ExpressionInfo` objects in memory, right? Adding more info to this bean doesn't have any meaningful performance implications, I presume. I suppose it's just breaking down the existing info further.
I also presume this is considered an internal API so it's OK to change the constructor. You could even retain the one constructor that is removed, just in case.
Yes, I think so for the former, and sure, I will keep the original constructor.
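To sketch what "keep the original constructor" could look like, here is a minimal, hypothetical version of `ExpressionInfo` (field and method names are assumptions, not Spark's actual code): the new constructor carries the structured fields, while the old two-argument constructor is retained so existing callers keep compiling.

```java
// Hypothetical sketch of ExpressionInfo keeping the original constructor
// for compatibility; names are assumptions, not Spark's actual source.
public class ExpressionInfo {
    private final String usage;
    private final String extended;
    private final String arguments;
    private final String examples;
    private final String note;
    private final String since;

    // New constructor carrying the structured fields.
    public ExpressionInfo(String usage, String arguments, String examples,
                          String note, String since) {
        this.usage = usage;
        this.arguments = arguments;
        this.examples = examples;
        this.note = note;
        this.since = since;
        // Pre-build the combined extended description from the parts.
        this.extended = arguments + examples + note + since;
    }

    // Original constructor retained so existing callers keep working;
    // the structured fields simply stay empty in that case.
    public ExpressionInfo(String usage, String extended) {
        this.usage = usage;
        this.extended = extended;
        this.arguments = "";
        this.examples = "";
        this.note = "";
        this.since = "";
    }

    public String getUsage() { return usage; }
    public String getExtended() { return extended; }

    public static void main(String[] args) {
        // Old-style callers are unaffected by the new fields.
        System.out.println(new ExpressionInfo("usage", "old extended").getExtended());
    }
}
```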
Agree, that's my only question, whether this change matters, because it's a developer API. You provide default implementations, though `extended()` gets removed. Hm. I am wondering if it's possible to keep `extended()` but, well, ignore it? It would at least be compatible, even if it meant someone's implementation out there would have to update to provide information to `ExpressionInfo` correctly. That's not really a functional problem though.
Will address the comments within a few days. (I am reading docs just to get used to things around my updated status.)
I manually tested source compatibility and I guess it also won't break binary compatibility - https://stackoverflow.com/questions/21197255/is-adding-a-method-to-a-java-annotation-safe-for-backward-compatibility
In the last commit, bf48875, I tried to keep the original behaviour and compatibility (although I guess we are fine to break this, as said in a few comments above). For `ExpressionDescription` and `ExpressionInfo`:

- if `extended` is an empty string (the default value), it builds the extended description from `arguments`, `examples`, `note` and `since`, and uses this as the extended description;
- if `extended` is a non-empty string, it ignores the new `arguments`, `examples`, `note` and `since` and uses it as the extended description, assuming it was set manually.
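The rule above can be sketched as a small resolver function. This is illustrative only (the method name, formatting, and section headers are assumptions, not Spark's actual code): an empty `extended` means "compose from the structured fields"; a non-empty one was set manually and wins.

```java
// Sketch of the extended-description rule described above; illustrative
// names and formatting, not Spark's actual implementation.
public class ExtendedDescription {
    public static String resolve(String extended, String arguments,
                                 String examples, String note, String since) {
        if (!extended.isEmpty()) {
            return extended; // manually set; the structured fields are ignored
        }
        // Compose the extended description from the structured fields.
        StringBuilder sb = new StringBuilder();
        if (!arguments.isEmpty()) sb.append("\n    Arguments:\n").append(arguments);
        if (!examples.isEmpty())  sb.append("\n    Examples:\n").append(examples);
        if (!note.isEmpty())      sb.append("\n    Note:\n      ").append(note).append("\n");
        if (!since.isEmpty())     sb.append("\n    Since: ").append(since).append("\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A manually set `extended` is returned as-is.
        System.out.println(resolve("manual text", "a", "e", "n", "2.2.0"));
        // Otherwise the structured fields are assembled.
        System.out.println(resolve("", "", "      > SELECT f(1);\n       1\n", "", "2.2.0"));
    }
}
```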
Test build #80044 has finished for PR 18749 at commit
Test build #80043 has finished for PR 18749 at commit
For source compatibility, in the commit 389fc6e, I tried to move back to the original constructor. For binary compatibility, I tested as below.

With

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
public @interface ExpressionDescription {
    String usage() default "";
    String extended() default "";
}
```

and with

```scala
@ExpressionDescription(
  usage = "I am usage",
  extended = """
    I am extended
  """)
case class A()

object Main {
  def main(args: Array[String]): Unit = {
    val df = scala.reflect.classTag[A].runtimeClass.getAnnotation(classOf[ExpressionDescription])
    println(df.usage())
    println(df.extended())
  }
}
```

it prints both the usage and extended strings. And then, I changed the annotation to

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
public @interface ExpressionDescription {
    String usage() default "";
    String extended() default "";
    String arguments() default "";
    String examples() default "";
    String note() default "";
    String since() default "";
}
```

and then I ran the same program again without recompiling it; this looks to work fine as before.
I will revert 389fc6e if the test passes.
Test build #80111 has finished for PR 18749 at commit
Force-pushed 980ede7 to 1fa02c8
Test build #80113 has finished for PR 18749 at commit
Test build #80115 has finished for PR 18749 at commit
Test build #80116 has finished for PR 18749 at commit
Now I am pretty confident about the compatibility concern, given the checks above. I am quite sure the current state should be ready for another look if the last test passes, where I reverted the former commit, 389fc6e, above. I guess it should pass because the only diff is 1fa02c8, but I will wait for the results before pinging again.
Test build #80118 has finished for PR 18749 at commit
retest this please
Test build #80123 has finished for PR 18749 at commit
@srowen, @rxin and @cloud-fan, I am confident about the compatibility concern per #18749 (comment) and believe it is ready for another look.
What is the compatibility concern?
Here, both comments by Sean: #18749 (comment) and #18749 (comment). It looks like we are okay to change this given #18749 (comment), but the comment mentions "DeveloperApi", so I tried my best to keep compatibility and thought through both possible compatibility cases (source and binary).
Did I maybe misunderstand something?
OK great. I think we should avoid breaking developer APIs, unless it has a huge upside. It wouldn't be fun to break it just for some cosmetic things ... |
This reverts commit 389fc6ef788bf971f846ca36f49cea6a1c98b0d0.
Force-pushed b037592 to 974eab2
Test build #80248 has finished for PR 18749 at commit
retest this please
Test build #80250 has finished for PR 18749 at commit
retest this please
We have to define a clear rule. We do not want to revert the PRs merged by others. It just makes others feel awkward. In the past, based on my observation in Spark SQL, when the committers left an explicit LGTM, the PR was considered ready to merge.
We have never had such a rule here or in any other project. Let's use common sense. If someone says 'looks good' that's fine. Or anything equivalent. If in doubt, ask.
In Spark SQL, we require such an explicit message. That is my understanding. If we want a change, I think we need to send a note to all the committers and let them know it. Then, they will leave such messages more carefully. Spark SQL becomes more and more complex. It is very easy to break backward compatibility or even return an incorrect result. We are building infrastructure software. We have to be more careful, because our current test coverage is still not good enough.
@gatorsmile, I really don't want to go through your PRs, and I have never done such things in the past. Let's document this and then let me follow. Please open a discussion on the mailing list if you are in doubt. We are not supposed to make a long-term decision here.
@gatorsmile you are the one asking for a change, and I do not see support for it. I am not aware of any Spark SQL-specific rule; what are you referring to? Reynold agrees above, for example, that in some cases it's justified to merge a change with no other review. This alone suggests there is no rule about typing "LGTM".

You rightly say that things need review. The more important the change, the more it needs review. That is not an argument for a voting rule; it's an argument for good judgment among committers, and I see no evidence that's lacking. An "LGTM" rule doesn't solve that problem: a committer with bad judgment can always approve something for merging that shouldn't be merged, right?

I further can't understand what's special about "LGTM". You are saying that committers say "sounds good to me" when they mean "this is not ready for merging", but whenever they say "LGTM" they always mean "this is ready for merging" -- sorry, what?

You are suggesting a change to a form of RTC, which would require a PMC vote. Consider the implications: nobody can merge a change until a vote completes. Are you sure? In the meantime, come now, let's move ahead with this PR, which was obviously reviewed.
@srowen that's not what I said. Almost always, an explicit LGTM from somebody familiar with the codebase is preferred. There are tiny changes that might not require them, and it is up to the judgement of the committer. But those are more exceptions than the norm.
Yep, I agree with 'almost always', and tiny changes might not require them. I think that's what I said you said? "some cases". See also #18749 (comment). But this alone disagrees with the idea that all changes need the string "LGTM" typed. I am not sure why this popped up, but I propose we continue with the same process and conventions we always have. If anyone's unsure about whether an important review is affirmative, please ask. Can we put this to bed now?
Test build #80253 has finished for PR 18749 at commit
retest this please
I am also afraid somebody might misunderstand what I or a few other committers say.
Test build #80261 has finished for PR 18749 at commit
retest this please
Test build #80274 has finished for PR 18749 at commit
retest this please
Test build #80282 has finished for PR 18749 at commit
Other test failures look spurious and unrelated. I guess we are okay to go ahead with this PR anyway?
LGTM. Thanks! Merging to master. |
…tation by DESCRIBE FUNCTION at SQLQueryTestSuite

Currently, there are some tests testing function descriptions:

```bash
$ grep -ir "describe function" sql/core/src/test/resources/sql-tests/inputs
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function to_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function extended to_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function from_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function extended from_json;
```

There is not much point in testing them, since we're not going to test the documentation itself. The `DESCRIBE FUNCTION` functionality itself is already being tested here and there. See the test failures in #18749 (where I added examples to function descriptions). We had better remove those tests so that people don't add such tests in the SQL tests.

## How was this patch tested?

Manual.

Closes #22776 from HyukjinKwon/SPARK-25779.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR proposes to separate `extended` into `examples` and `arguments` internally so that both can be separately documented, and adds `since` and `note` for additional information.

For `since`, it looks like users sometimes get confused by, to my knowledge, missing version information. For example, see https://www.mail-archive.com/user@spark.apache.org/msg64798.html

For a few good examples to check the built documentation, please see both:

- `from_json` - https://spark-test.github.io/sparksqldoc/#from_json
- `like` - https://spark-test.github.io/sparksqldoc/#like

For `DESCRIBE FUNCTION`, `note` and `since` are added as below. For the complete documentation, see https://spark-test.github.io/sparksqldoc/
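As a self-contained illustration of how the structured fields can be declared and read back, here is a minimal stand-in for the `ExpressionDescription` annotation with the fields this PR proposes, retrieved via reflection the way `DESCRIBE FUNCTION` would read them. The annotation, the `FromJsonLike` class, and the usage/examples strings are all hypothetical, not Spark's actual source.

```java
// Illustrative stand-in for Spark's ExpressionDescription annotation with
// the new structured fields (arguments/examples/note/since); not Spark code.
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
@interface ExpressionDescription {
    String usage() default "";
    String arguments() default "";
    String examples() default "";
    String note() default "";
    String since() default "";
}

// A hypothetical expression class annotated with the structured fields.
@ExpressionDescription(
    usage = "_FUNC_(jsonStr, schema) - Returns a struct value with the given jsonStr and schema.",
    examples = "      > SELECT _FUNC_('{\"a\":1}', 'a INT');\n       {\"a\":1}\n",
    since = "2.2.0")
class FromJsonLike {}

public class DescribeDemo {
    public static void main(String[] args) {
        // Read the annotation back via reflection, as the function registry would.
        ExpressionDescription d =
            FromJsonLike.class.getAnnotation(ExpressionDescription.class);
        System.out.println("Usage: " + d.usage());
        System.out.println("Since: " + d.since());
    }
}
```

Unset fields fall back to their empty defaults, which is what lets the documentation generator skip missing sections such as `note` here.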
How was this patch tested?
Manual tests and existing tests. Please see https://spark-test.github.io/sparksqldoc
Jenkins tests are needed to double-check.