[SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations #25558

srowen · 2019-08-22T14:11:20Z

What changes were proposed in this pull request?

The Experimental and Evolving annotations are both (like Unstable) used to express that a an API may change. However there are many things in the code that have been marked that way since even Spark 1.x. Per the dev@ thread, anything introduced at or before Spark 2.3.0 is pretty much 'stable' in that it would not change without a deprecation cycle. Therefore I'd like to remove most of these annotations. And, remove the :: Experimental :: scaladoc tag too. And likewise for Python, R.

The changes below can be summarized as:

Generally, anything introduced at or before Spark 2.3.0 has been unmarked as neither Evolving nor Experimental
Obviously experimental items like DSv2, Barrier mode, ExperimentalMethods are untouched
I did unmark a few MLlib classes introduced in 2.4, as I am quite confident they're not going to change (e.g. KolmogorovSmirnovTest, PowerIterationClustering)

It's a big change to review, so I'd suggest scanning the list of files changed to see if any area seems like it should remain partly experimental and examine those.

Why are the changes needed?

Many of these annotations are incorrect; the APIs are de facto stable. Leaving them also makes legitimate usages of the annotations less meaningful.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

mgaido91 · 2019-08-22T14:25:29Z

sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala

 * @since 1.3.0
 */
-@Experimental
 @Unstable


does it make sense to have it Unstable? It has not changed for over 2 years...

Agree, I wasn't sure about removing Unstable, but seems like some that clearly can go. Maybe anything not changed in a long time as you say.

mgaido91 · 2019-08-22T14:26:58Z

sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala

 * Concrete implementation of a [[BaseSessionStateBuilder]].
 */
-@Experimental
 @Unstable


again, not sure it make sense for this to be unstable..

tgravescs · 2019-08-22T15:14:43Z

core/src/main/scala/org/apache/spark/TaskContext.scala

   * about the resource. Please refer to [[org.apache.spark.resource.ResourceInformation]] for
   * specifics.
   */
-  @Evolving


this was just added in 3.0, it should be left.

Oops right, I see it's part of GPU resources

tgravescs · 2019-08-22T15:18:52Z

sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatch.java

 * batch so that Spark can access the data row by row. Instance of it is meant to be reused during
 * the entire data loading process.
 */
-@Evolving


with the new changes to Columnar introduced in 3.0 I wonder if we shouldn't leave this evolving.

I'll revert the changes to Columnar*

tgravescs · 2019-08-22T15:20:14Z

sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala


  /**
-   * :: Experimental ::
+   * ::  ::


remove empty line?

tgravescs · 2019-08-22T15:56:02Z

those changes LGTM, I only skimmed mostly and looked for ones I'm familiar with.

SparkQA · 2019-08-22T16:12:18Z

Test build #109576 has finished for PR 25558 at commit 50bfcd6.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-08-22T16:15:31Z

Test build #109587 has finished for PR 25558 at commit c4732c9.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-08-22T18:27:29Z

external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala

   * @since 2.4.0
   */
-  @Experimental
  @deprecated("Please use 'org.apache.spark.sql.avro.functions.from_avro' instead.", "3.0.0")


Although this is added at 2.4.0, this removal looks okay because this API is already deprecated.

cc @gengliangwang

+1 with removing the annotation here.

dongjoon-hyun · 2019-08-22T18:37:23Z

sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala


 import org.apache.spark.{SparkConf, SparkContext}
-import org.apache.spark.annotation._
+import org.apache.spark.annotation.{DeveloperApi, Experimental, Stable, Unstable}


So, is the only remaining instance def experimental which is natural to be Experimental?

/** * :: Experimental :: * A collection of methods that are considered experimental, but can be used to hook into * the query planner for advanced functionality. * * @group basic * @since 1.3.0 */ @Experimental @transient @Unstable def experimental: ExperimentalMethods = sparkSession.experimental

Pretty much. This one looks inherently experimental, so, I left it.

dongjoon-hyun · 2019-08-22T18:43:55Z

sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala

 * Concrete implementation of a [[BaseSessionStateBuilder]].
 */
-@Experimental
-@Unstable


For SessionStateBuilder, we may want to keep @Unstable like HiveSessionStateBuilder.scala is @Unstable in this PR.

cc @gatorsmile , @cloud-fan , @hvanhovell

Yeah if in doubt let's just leave it in, but, yeah I also don't know if there's doubt or what as this hasn't changed in a long while.

dongjoon-hyun · 2019-08-22T18:47:47Z

Thank you so much, @srowen . This looks good to me.
The only concern is SessionStateBuilder class. For SessionStateBuilder, let's wait for more comments from @gatorsmile , @cloud-fan , @hvanhovell .

SparkQA · 2019-08-22T19:10:37Z

Test build #109589 has finished for PR 25558 at commit 93305d2.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-08-22T19:26:05Z

Uh oh, this might have uncovered some latent issue with the Python Kinesis tests:

======================================================================
ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
    kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
: java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
	at scala.collection.Iterator.find(Iterator.scala:993)
	at scala.collection.Iterator.find$(Iterator.scala:990)
	at scala.collection.AbstractIterator.find(Iterator.scala:1429)
	at scala.collection.IterableLike.find(IterableLike.scala:81)
	at scala.collection.IterableLike.find$(IterableLike.scala:80)
	at scala.collection.AbstractIterable.find(Iterable.scala:56)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
...

I'll try to investigate separately.

cloud-fan · 2019-08-23T01:46:41Z

sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala

 * @since 1.3.0
 */
-@Experimental
-@Unstable


This should remain unstable, as it uses private classes Attribute Expression.

cloud-fan · 2019-08-23T01:49:51Z

I've not checked all of the changes, just one note: some interfaces are marked as unstable because they contain private classes. We should keep them unstable even if the interface itself hasn't been changed for years.

srowen · 2019-08-23T07:38:43Z

I'll put back the Unstable annotations. I'm first figuring out if the kinesis issues can be resolved.

SparkQA · 2019-08-26T18:18:37Z

Test build #109744 has finished for PR 25558 at commit 8615ce3.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-08-26T19:17:30Z

OK, this is always going to fail right now as it triggers Kinesis tests, and there is some existing issue with Kinesis tests in Pyspark, which I am trying to figure out at #25559 (see also #24651). This should be an entirely non-functional change, so I'd like to merge it anyway as all of the build parts pass, and even the JVM Kinesis tests pass.

zsxwing · 2019-08-26T22:42:02Z

I think most of Structured Streaming APIs in the master branch are labeled correctly. Could you revert changes of these APIs? We are waiting for DS v2 to finalize the Structured Streaming APIs, and don't want to promise a stable DS v1 APIs.

srowen · 2019-08-26T22:57:19Z

@zsxwing would you say revert the changes to all of org.apache.sql.streaming.*? how about the Kinesis and DStream APIs? yes we should revert these if there is any doubt, just want to make sure I don't revert things you don't have in mind.

zsxwing · 2019-08-26T23:01:33Z

would you say revert the changes to all of org.apache.sql.streaming.*?

Yep.

how about the Kinesis and DStream APIs?

I don't have a strong option here. We unlikely change them in future.

srowen · 2019-08-27T14:57:58Z

@zsxwing have another look. I reverted a lot of the changes to streaming. Let me know if I missed an API that should remain experimental.

SparkQA · 2019-08-27T15:03:56Z

Test build #109820 has finished for PR 25558 at commit 6e9bb61.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

… that breaks Pyspark Kinesis tests The Pyspark Kinesis tests are failing, at least in master: ``` ====================================================================== ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2) File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__ answer, self._gateway_client, None, self._fqn) File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value format(target_id, ".", name), value) Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils. : java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection; at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211) at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211) at scala.collection.Iterator.find(Iterator.scala:993) at scala.collection.Iterator.find$(Iterator.scala:990) at scala.collection.AbstractIterator.find(Iterator.scala:1429) at scala.collection.IterableLike.find(IterableLike.scala:81) at scala.collection.IterableLike.find$(IterableLike.scala:80) at scala.collection.AbstractIterable.find(Iterable.scala:56) at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211) at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46) ... ``` The non-Python Kinesis tests are fine though. It turns out that this is because Pyspark tests use the output of the Spark assembly, and it pulls in `hadoop-cloud`, which in turn pulls in an old AWS Java SDK. Per Steve Loughran (below), it seems like we can just resolve this by excluding the aws-java-sdk dependency. See the attached PR for some more detail about the debugging and other options. See #25558 (comment) Closes #25559 from srowen/KinesisTest. Authored-by: Sean Owen <[email protected]> Signed-off-by: Sean Owen <[email protected]>

… that breaks Pyspark Kinesis tests The Pyspark Kinesis tests are failing, at least in master: ``` ====================================================================== ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2) File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__ answer, self._gateway_client, None, self._fqn) File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value format(target_id, ".", name), value) Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils. : java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection; at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211) at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211) at scala.collection.Iterator.find(Iterator.scala:993) at scala.collection.Iterator.find$(Iterator.scala:990) at scala.collection.AbstractIterator.find(Iterator.scala:1429) at scala.collection.IterableLike.find(IterableLike.scala:81) at scala.collection.IterableLike.find$(IterableLike.scala:80) at scala.collection.AbstractIterable.find(Iterable.scala:56) at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211) at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46) ... ``` The non-Python Kinesis tests are fine though. It turns out that this is because Pyspark tests use the output of the Spark assembly, and it pulls in `hadoop-cloud`, which in turn pulls in an old AWS Java SDK. Per Steve Loughran (below), it seems like we can just resolve this by excluding the aws-java-sdk dependency. See the attached PR for some more detail about the debugging and other options. See #25558 (comment) Closes #25559 from srowen/KinesisTest. Authored-by: Sean Owen <[email protected]> Signed-off-by: Sean Owen <[email protected]> (cherry picked from commit d5b7eed) Signed-off-by: Sean Owen <[email protected]>

SparkQA · 2019-08-31T17:40:51Z

Test build #4850 has finished for PR 25558 at commit 6e9bb61.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2019-08-31T20:20:40Z

Test build #4853 has finished for PR 25558 at commit 6e9bb61.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-09-01T00:18:12Z

Test build #4854 has finished for PR 25558 at commit 6e9bb61.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-09-01T15:15:21Z

Merged to master

dongjoon-hyun · 2019-09-01T22:35:05Z

Thank you so much, @srowen !

… that breaks Pyspark Kinesis tests The Pyspark Kinesis tests are failing, at least in master: ``` ====================================================================== ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2) File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__ answer, self._gateway_client, None, self._fqn) File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value format(target_id, ".", name), value) Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils. : java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection; at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211) at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211) at scala.collection.Iterator.find(Iterator.scala:993) at scala.collection.Iterator.find$(Iterator.scala:990) at scala.collection.AbstractIterator.find(Iterator.scala:1429) at scala.collection.IterableLike.find(IterableLike.scala:81) at scala.collection.IterableLike.find$(IterableLike.scala:80) at scala.collection.AbstractIterable.find(Iterable.scala:56) at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211) at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46) ... ``` The non-Python Kinesis tests are fine though. It turns out that this is because Pyspark tests use the output of the Spark assembly, and it pulls in `hadoop-cloud`, which in turn pulls in an old AWS Java SDK. Per Steve Loughran (below), it seems like we can just resolve this by excluding the aws-java-sdk dependency. See the attached PR for some more detail about the debugging and other options. See apache#25558 (comment) Closes apache#25559 from srowen/KinesisTest. Authored-by: Sean Owen <[email protected]> Signed-off-by: Sean Owen <[email protected]> (cherry picked from commit d5b7eed) Signed-off-by: Sean Owen <[email protected]>

gatorsmile · 2019-09-23T17:14:18Z

I went over the PR and below is the list to summarize the changes. Could you confirm whether it is complete?

Removal of Experimental:
- org.apache.spark.metrics.source.CodegenMetrics
- org.apache.spark.metrics.source.HiveCatalogMetrics
- org.apache.spark.partial
- org.apache.spark.rdd.combineByKeyWithClassTag
- org.apache.spark.rdd.combineByKeyWithClassTag
- org.apache.spark.rdd.combineByKeyWithClassTag
- org.apache.spark.sql.from_avro [deprecated]
- org.apache.spark.sql.to_avro [deprecated]
- org.apache.spark.ml.classification.LinearSVC
- org.apache.spark.ml.classification.LinearSVCModel
- org.apache.spark.ml.classification.LogisticRegressionSummary
- org.apache.spark.ml.classification.LogisticRegressionTrainingSummary
- org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
- org.apache.spark.ml.classification.BinaryLogisticRegressionTrainingSummary
- org.apache.spark.ml.clustering.BisectingKMeansSummary
- org.apache.spark.ml.clustering.ClusteringSummary
- org.apache.spark.ml.clustering.GaussianMixtureSummary
- org.apache.hadoop.fs.Path.KMeansSummary
- org.apache.spark.ml.clustering.PowerIterationClustering
- org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
- org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
- org.apache.spark.ml.evaluation.RegressionEvaluator
- org.apache.spark.ml.feature.BucketedRandomProjectionLSHParams
- org.apache.spark.ml.feature.BucketedRandomProjectionLSH
- org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
- org.apache.spark.ml.feature.FeatureHasher
- org.apache.spark.ml.feature.Imputer
- org.apache.spark.ml.feature.ImputerModel
- org.apache.spark.ml.feature.MinHashLSHModel
- org.apache.spark.ml.feature.MinHashLSH
- org.apache.spark.ml.feature.RFormula
- org.apache.spark.ml.feature.RFormulaModel
- org.apache.spark.ml.feature.VectorSizeHint
- org.apache.spark.ml.fpm.FPGrowth
- org.apache.spark.ml.fpm.FPGrowthModel
- org.apache.spark.ml.fpm.PrefixSpan
- org.apache.spark.ml.fpm.PrefixSpan.findFrequentSequentialPatterns
- org.apache.spark.ml.image.ImageSchema
- org.apache.spark.ml.regression.AFTSurvivalRegression
- org.apache.spark.ml.regression.AFTSurvivalRegressionModel
- org.apache.spark.ml.regression.GeneralizedLinearRegression
- org.apache.spark.ml.regression.GeneralizedLinearRegressionModel
- org.apache.spark.ml.regression.GeneralizedLinearRegressionSummary
- org.apache.spark.ml.regression.GeneralizedLinearRegressionTrainingSummary
- org.apache.spark.ml.regression.LinearRegressionTrainingSummary
- org.apache.spark.ml.regression.LinearRegressionSummary
- org.apache.spark.ml.stat.ChiSquareTest
- org.apache.spark.ml.stat.Correlation
- org.apache.spark.ml.stat.Correlation.corr
- org.apache.spark.ml.stat.KolmogorovSmirnovTest
- org.apache.spark.ml.stat.SummaryBuilder
- org.apache.spark.ml.stat.Summarizer
- org.apache.spark.ml.util.PMMLUtils.loadFromString
- pyspark.context.binaryFiles
- pyspark.context.binaryRecords
- pyspark.ml.fpm.FPGrowth
- pyspark.ml.fpm.FPGrowthModel
- pyspark.ml.fpm.PrefixSpan
- pyspark.ml.feature.BucketedRandomProjectionLSH
- pyspark.ml.feature.BucketedRandomProjectionLSHModel
- pyspark.ml.feature.ChiSqSelector
- pyspark.ml.feature.ChiSqSelectorModel
- pyspark.ml.feature.FeatureHasher
- pyspark.ml.feature.Imputer
- pyspark.ml.feature.ImputerModel
- pyspark.ml.feature.MinHashLSH
- pyspark.ml.feature.MinHashLSHModel
- pyspark.ml.feature.QuantileDiscretizer
- pyspark.ml.feature.RFormula
- pyspark.ml.feature.RFormulaModel
- pyspark.ml.feature.VectorSizeHint
- pyspark.ml.classification.LinearSVC
- pyspark.ml.classification.LinearSVCModel
- pyspark.ml.classification.LogisticRegressionSummary
- pyspark.ml.classification.LogisticRegressionTrainingSummary
- pyspark.ml.classification.BinaryLogisticRegressionSummary
- pyspark.ml.classification.OneVsRest
- pyspark.ml.classification.OneVsRestModel
- pyspark.ml.clustering.GaussianMixtureSummary
- pyspark.ml.clustering.KMeansSummary
- pyspark.ml.clustering.BisectingKMeansSummary
- pyspark.ml.clustering.PowerIterationClustering
- pyspark.ml.stat.ChiSquareTest
- pyspark.ml.stat.KolmogorovSmirnovTest
- pyspark.ml.stat.Summarizer
- pyspark.ml.stat.SummaryBuilder
- pyspark.ml.tuning.TrainValidationSplit
- pyspark.ml.tuning.TrainValidationSplitModel
- pyspark.ml.evaluation.BinaryClassificationEvaluator
- pyspark.ml.evaluation.RegressionEvaluator
- pyspark.ml.evaluation.MulticlassClassificationEvaluator
- pyspark.ml.evaluation.ClusteringEvaluator
- The fitMultiple method in all the classes
- pyspark.rdd.countApprox
- pyspark.rdd.sumApprox
- pyspark.rdd.meanApprox
- pyspark.rdd.countApproxDistinct
- pyspark.sql.functions.pandas_udf
- pyspark.sql.GroupedData
- pyspark.sql.Window
- pyspark.sql.WindowSpec
- pyspark.taskcontext
- org.apache.spark.sql.Encoder
- org.apache.spark.sql.expressions.javalang.typed
- org.apache.spark.sql.Dataset.as
- org.apache.spark.sql.Dataset.checkpoint
- org.apache.spark.sql.Dataset.localCheckpoint
- org.apache.spark.sql.Dataset.withWatermark
- org.apache.spark.sql.Dataset.joinWith
- org.apache.spark.sql.Dataset.select
- org.apache.spark.sql.Dataset.reduce
- org.apache.spark.sql.Dataset.groupByKey
- org.apache.spark.sql.Dataset.filter
- org.apache.spark.sql.Dataset.map
- org.apache.spark.sql.Dataset.mapPartitions
- org.apache.spark.sql.Dataset.flatMap
- org.apache.spark.sql.KeyValueGroupedDataset.mapGroupsWithState
- org.apache.spark.sql.KeyValueGroupedDataset.flatMapGroupsWithState
- org.apache.spark.sql.SQLContext.listenerManager
- org.apache.spark.sql.SQLContext.implicits
- org.apache.spark.sql.SQLContext.createDataFrame
- org.apache.spark.sql.SQLContext.createDataset
- org.apache.spark.sql.SQLContext.range
- org.apache.spark.sql.SparkSession.listenerManager
- org.apache.spark.sql.SparkSession.streams
- org.apache.spark.sql.SparkSession.emptyDataset
- org.apache.spark.sql.SparkSession.createDataFrame
- org.apache.spark.sql.SparkSession.createDataset
- org.apache.spark.sql.SparkSession.range
- org.apache.spark.sql.SparkSession.implicits
- org.apache.spark.sql.catalog.createTable
- org.apache.spark.sql.expressions.Aggregator
- org.apache.spark.sql.internal.BaseSessionStateBuilder
- org.apache.spark.sql.internal.createTable
- org.apache.spark.sql.internal.SessionStateBuilder
- org.apache.spark.sql.jdbc.JdbcDialects
- org.apache.spark.sql.jdbc.JdbcType
- org.apache.spark.sql.sources.StreamSourceProvider
- org.apache.spark.sql.sources.StreamSinkProvider
- org.apache.spark.sql.sources.CatalystScan
- org.apache.spark.sql.util.QueryExecutionListener
- org.apache.spark.sql.util.ExecutionListenerManager
- org.apache.spark.sql.hive.HiveSessionStateBuilder
- org.apache.spark.streaming.StreamingContext.getActive
- org.apache.spark.streaming.StreamingContext.getActiveOrCreate
- org.apache.spark.streaming.api.java.JavaMapWithStateDStream
- org.apache.spark.streaming.api.java.mapWithState
- org.apache.spark.streaming.dstream.MapWithStateDStream
- org.apache.spark.streaming.dstream.PairDStreamFunctions.mapWithState
Removal of Evolving:
- org.apache.spark.streaming.kinesis.KinesisInputDStream
- org.apache.spark.streaming.kinesis.KinesisInputDStream.Builder
- org.apache.spark.streaming.kinesis.SparkAWSCredentials
- org.apache.spark.streaming.kinesis.SparkAWSCredentials.Builder
- org.apache.spark.sql.types.SQLUserDefinedType
- org.apache.spark.sql.vectorized.ArrowColumnVector
- org.apache.spark.sql.Encoder
- org.apache.spark.sql.types.ObjectType
- org.apache.spark.sql.Dataset.as
- org.apache.spark.sql.Dataset.isStreaming
- org.apache.spark.sql.Dataset.checkpoint
- org.apache.spark.sql.Dataset.localCheckpoint
- org.apache.spark.sql.Dataset.withWatermark
- org.apache.spark.sql.Dataset.joinWith
- org.apache.spark.sql.Dataset.select
- org.apache.spark.sql.Dataset.reduce
- org.apache.spark.sql.Dataset.groupByKey
- org.apache.spark.sql.Dataset.filter
- org.apache.spark.sql.Dataset.map
- org.apache.spark.sql.Dataset.mapPartitions
- org.apache.spark.sql.Dataset.flatMap
- org.apache.spark.sql.Dataset.writeStream
- org.apache.spark.sql.ForeachWriter
- org.apache.spark.sql.KeyValueGroupedDataset
- org.apache.spark.sql.KeyValueGroupedDataset.mapGroupsWithState
- org.apache.spark.sql.KeyValueGroupedDataset.flatMapGroupsWithState
- org.apache.spark.sql.SQLContext.listenerManager
- org.apache.spark.sql.SQLContext.implicits
- org.apache.spark.sql.SQLContext.createDataFrame
- org.apache.spark.sql.SQLContext.createDataset
- org.apache.spark.sql.SQLContext.readStream
- org.apache.spark.sql.SQLContext.range
- org.apache.spark.sql.SQLImplicits
- org.apache.spark.sql.SparkSession.listenerManager
- org.apache.spark.sql.SparkSession.emptyDataset
- org.apache.spark.sql.SparkSession.createDataFrame
- org.apache.spark.sql.SparkSession.createDataset
- org.apache.spark.sql.SparkSession.range
- org.apache.spark.sql.SparkSession.readStream
- org.apache.spark.sql.SparkSession.implicits
- org.apache.spark.sql.util.QueryExecutionListener
- org.apache.spark.sql.util.ExecutionListenerManager
- org.apache.spark.sql.expressions.Aggregator
- org.apache.spark.sql.catalog.createTable

srowen · 2019-09-23T17:24:23Z

It looks about right at a glance. I spot-checked many of them and they look correct, nothing was missing. If you did the hard work to list all of them out I take your word for it. (There weren't other pull requests besides this one)

srowen requested a review from dongjoon-hyun August 22, 2019 14:11

srowen self-assigned this Aug 22, 2019

mgaido91 reviewed Aug 22, 2019

View reviewed changes

tgravescs reviewed Aug 22, 2019

View reviewed changes

sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala Outdated

/**

* :: Experimental ::

* :: ::

Copy link

Contributor

tgravescs Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove empty line?

dongjoon-hyun added ML SPARK CORE SQL STRUCTURED STREAMING labels Aug 22, 2019

dongjoon-hyun reviewed Aug 22, 2019

View reviewed changes

srowen mentioned this pull request Aug 22, 2019

[SPARK-28903][STREAMING][PYSPARK][TESTS] Fix AWS JDK version conflict that breaks Pyspark Kinesis tests #25559

Closed

cloud-fan reviewed Aug 23, 2019

View reviewed changes

srowen added 2 commits August 26, 2019 10:38

Remove outdated usages of Experimental, Evolving annotations

cf9a54e

Restore @unstable

8615ce3

srowen force-pushed the SPARK-28855 branch from 93305d2 to 8615ce3 Compare August 26, 2019 15:42

srowen added 2 commits August 27, 2019 09:47

Revert changes to structured streaming annotations

4dbed45

Revert more

6e9bb61

srowen closed this in eb037a8 Sep 1, 2019

srowen deleted the SPARK-28855 branch September 3, 2019 20:12

HeartSaVioR mentioned this pull request Dec 30, 2019

[SPARK-29348][SQL] Add observable Metrics for Streaming queries #26127

Closed

[SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations #25558

[SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations #25558

Uh oh!

Conversation

srowen commented Aug 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tgravescs commented Aug 22, 2019

Uh oh!

SparkQA commented Aug 22, 2019

Uh oh!

SparkQA commented Aug 22, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun commented Aug 22, 2019

Uh oh!

SparkQA commented Aug 22, 2019

Uh oh!

srowen commented Aug 22, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cloud-fan commented Aug 23, 2019

Uh oh!

srowen commented Aug 23, 2019

Uh oh!

SparkQA commented Aug 26, 2019

Uh oh!

srowen commented Aug 26, 2019

Uh oh!

zsxwing commented Aug 26, 2019

Uh oh!

srowen commented Aug 26, 2019

Uh oh!

zsxwing commented Aug 26, 2019

Uh oh!

srowen commented Aug 27, 2019

Uh oh!

SparkQA commented Aug 27, 2019

Uh oh!

SparkQA commented Aug 31, 2019

Uh oh!

SparkQA commented Aug 31, 2019

Uh oh!

SparkQA commented Sep 1, 2019

Uh oh!

srowen commented Sep 1, 2019

Uh oh!

dongjoon-hyun commented Sep 1, 2019

srowen commented Aug 22, 2019 •

edited

Loading