[SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations #25558
Conversation
     * @since 1.3.0
     */
    @Experimental
    @Unstable
Does it make sense to have it Unstable? It hasn't changed in over 2 years...
Agree. I wasn't sure about removing Unstable, but it seems like some of these can clearly go. Maybe anything not changed in a long time, as you say.
     * Concrete implementation of a [[BaseSessionStateBuilder]].
     */
    @Experimental
    @Unstable
Again, not sure it makes sense for this to be unstable...
     * about the resource. Please refer to [[org.apache.spark.resource.ResourceInformation]] for
     * specifics.
     */
    @Evolving
this was just added in 3.0, it should be left.
Oops, right. I see it's part of the GPU resources support.
     * batch so that Spark can access the data row by row. Instance of it is meant to be reused during
     * the entire data loading process.
     */
    @Evolving
With the new changes to Columnar introduced in 3.0, I wonder if we shouldn't leave this evolving.
I'll revert the changes to Columnar*
    /**
     * :: Experimental ::
     * :: ::
remove empty line?
Those changes LGTM. I mostly skimmed and looked for the ones I'm familiar with.
Test build #109576 has finished for PR 25558 at commit
Test build #109587 has finished for PR 25558 at commit
     * @since 2.4.0
     */
    @Experimental
    @deprecated("Please use 'org.apache.spark.sql.avro.functions.from_avro' instead.", "3.0.0")
Although this was added in 2.4.0, the removal looks okay because this API is already deprecated.
+1 with removing the annotation here.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.annotation._
    import org.apache.spark.annotation.{DeveloperApi, Experimental, Stable, Unstable}
So, is the only remaining instance def experimental, which is naturally Experimental?
/**
* :: Experimental ::
* A collection of methods that are considered experimental, but can be used to hook into
* the query planner for advanced functionality.
*
* @group basic
* @since 1.3.0
*/
@Experimental
@transient
@Unstable
def experimental: ExperimentalMethods = sparkSession.experimental
Pretty much. This one looks inherently experimental, so I left it.
     * Concrete implementation of a [[BaseSessionStateBuilder]].
     */
    @Experimental
    @Unstable
For SessionStateBuilder, we may want to keep @Unstable, just as HiveSessionStateBuilder.scala remains @Unstable in this PR.
cc @gatorsmile, @cloud-fan, @hvanhovell
Yeah, if in doubt let's just leave it in, though I also don't know whether there's real doubt here, as this hasn't changed in a long while.
Thank you so much, @srowen. This looks good to me.
Test build #109589 has finished for PR 25558 at commit
Uh oh, this might have uncovered some latent issue with the Python Kinesis tests: I'll try to investigate separately.
     * @since 1.3.0
     */
    @Experimental
    @Unstable
This should remain unstable, as it uses the private classes Attribute and Expression.
I've not checked all of the changes; just one note: some interfaces are marked as unstable because they contain private classes. We should keep them unstable even if the interface itself hasn't changed in years.
I'll put back the
Test build #109744 has finished for PR 25558 at commit
OK, this is always going to fail right now, as it triggers the Kinesis tests, and there is some existing issue with the Kinesis tests in Pyspark, which I am trying to figure out at #25559 (see also #24651). This should be an entirely non-functional change, so I'd like to merge it anyway, as all of the build parts pass, and even the JVM Kinesis tests pass.
I think most of the Structured Streaming APIs in the master branch are labeled correctly. Could you revert the changes to these APIs? We are waiting for DS v2 to finalize the Structured Streaming APIs, and don't want to promise stable DS v1 APIs.
@zsxwing would you say revert the changes to all of
Yep.
I don't have a strong opinion here. We're unlikely to change them in the future.
@zsxwing have another look. I reverted a lot of the changes to streaming. Let me know if I missed an API that should remain experimental.
Test build #109820 has finished for PR 25558 at commit
… that breaks Pyspark Kinesis tests
The Pyspark Kinesis tests are failing, at least in master:
```
======================================================================
ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
answer, self._gateway_client, None, self._fqn)
File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
: java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
at scala.collection.Iterator.find(Iterator.scala:993)
at scala.collection.Iterator.find$(Iterator.scala:990)
at scala.collection.AbstractIterator.find(Iterator.scala:1429)
at scala.collection.IterableLike.find(IterableLike.scala:81)
at scala.collection.IterableLike.find$(IterableLike.scala:80)
at scala.collection.AbstractIterable.find(Iterable.scala:56)
at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
...
```
The non-Python Kinesis tests are fine though. It turns out that this is because Pyspark tests use the output of the Spark assembly, and it pulls in `hadoop-cloud`, which in turn pulls in an old AWS Java SDK.
Per Steve Loughran (below), it seems like we can just resolve this by excluding the aws-java-sdk dependency. See the attached PR for some more detail about the debugging and other options.
See #25558 (comment)
Closes #25559 from srowen/KinesisTest.
Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
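The fix described above amounts to a dependency exclusion in the build. As a hedged sketch (the module and artifact coordinates below are assumptions based on the description, not a copy of the merged change), the Maven-side shape of such an exclusion looks like:

```xml
<!-- Hypothetical sketch: keep the hadoop-cloud module out of the assembly's
     AWS SDK resolution, so the newer SDK required by the Kinesis tests wins.
     The artifactIds here are illustrative assumptions. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hadoop-cloud_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-bundle</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With the old SDK excluded, the `NoSuchMethodError` on `Region.getAvailableEndpoints()` goes away because only one SDK version remains on the test classpath.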
(cherry picked from commit d5b7eed)
Signed-off-by: Sean Owen <[email protected]>
Test build #4850 has finished for PR 25558 at commit
Test build #4853 has finished for PR 25558 at commit
Test build #4854 has finished for PR 25558 at commit
Merged to master
Thank you so much, @srowen!
I went over the PR, and below is a list summarizing the changes. Could you confirm whether it is complete?
It looks about right at a glance. I spot-checked many of them and they look correct; nothing was missing. If you did the hard work of listing them all out, I'll take your word for it. (There weren't other pull requests besides this one.)
What changes were proposed in this pull request?
The Experimental and Evolving annotations are both (like Unstable) used to express that an API may change. However, there are many things in the code that have been marked that way since even Spark 1.x. Per the dev@ thread, anything introduced at or before Spark 2.3.0 is pretty much 'stable', in that it would not change without a deprecation cycle. Therefore I'd like to remove most of these annotations, and remove the `:: Experimental ::` scaladoc tag too. And likewise for Python and R. The changes below can be summarized as:
It's a big change to review, so I'd suggest scanning the list of files changed to see if any area seems like it should remain partly experimental, and examining those.
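For context on what is being removed: Spark's stability markers are plain Java annotations (in org.apache.spark.annotation) with no runtime behavior of their own. The sketch below uses simplified stand-in annotations (an illustrative assumption; the names are modeled on Spark's, but the retention policy and definitions are not copied from Spark) to show that deleting such a marker changes only the API's advertised contract, never its behavior:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnnotationDemo {
    // Simplified stand-ins for org.apache.spark.annotation.Experimental / Unstable.
    // RUNTIME retention is used here only so the demo can inspect them reflectively.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Experimental {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Unstable {}

    // A hypothetical API that has carried the markers since 1.x. The PR's change
    // is purely to delete such annotations; the method body is untouched.
    @Experimental
    @Unstable
    static String legacyApi() {
        return "stable in practice";
    }

    public static void main(String[] args) throws NoSuchMethodException {
        java.lang.reflect.Method m = AnnotationDemo.class.getDeclaredMethod("legacyApi");
        // Before the PR-style cleanup both markers are present; after deleting
        // them, isAnnotationPresent would return false with identical behavior.
        System.out.println(m.isAnnotationPresent(Experimental.class));
        System.out.println(legacyApi());
    }
}
```

This is why the PR can be "an entirely non-functional change": the annotations are documentation for API consumers, and removing them does not alter compiled behavior.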
Why are the changes needed?
Many of these annotations are incorrect; the APIs are de facto stable. Leaving them also makes legitimate usages of the annotations less meaningful.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.