Conversation

@srowen
Member

@srowen srowen commented Aug 22, 2019

What changes were proposed in this pull request?

The Experimental and Evolving annotations are both (like Unstable) used to express that an API may change. However, there are many things in the code that have been marked that way since even Spark 1.x. Per the dev@ thread, anything introduced at or before Spark 2.3.0 is pretty much 'stable', in that it would not change without a deprecation cycle. Therefore I'd like to remove most of these annotations, and remove the :: Experimental :: scaladoc tag too, and likewise for Python and R.
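
For illustration, here's a minimal sketch of the before/after shape these removals take throughout the diff (the class name is hypothetical):

```scala
import org.apache.spark.annotation.Experimental

// Before: an API unchanged since 1.x/2.x, still tagged experimental.
/**
 * :: Experimental ::
 * Some long-stable public API.
 * @since 2.0.0
 */
@Experimental
class SomeOldApiBefore

// After: the annotation and the ":: Experimental ::" scaladoc tag are
// both dropped; an unannotated API follows the normal deprecation cycle.
/**
 * Some long-stable public API.
 * @since 2.0.0
 */
class SomeOldApiAfter
```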

The changes below can be summarized as:

  • Generally, anything introduced at or before Spark 2.3.0 is no longer marked as Evolving or Experimental
  • Obviously experimental items like DSv2, Barrier mode, ExperimentalMethods are untouched
  • I did unmark a few MLlib classes introduced in 2.4, as I am quite confident they're not going to change (e.g. KolmogorovSmirnovTest, PowerIterationClustering)

It's a big change to review, so I'd suggest scanning the list of changed files for any area that seems like it should remain partly experimental, and examining those.

Why are the changes needed?

Many of these annotations are incorrect; the APIs are de facto stable. Leaving them also makes legitimate usages of the annotations less meaningful.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

@srowen srowen requested a review from dongjoon-hyun August 22, 2019 14:11
@srowen srowen self-assigned this Aug 22, 2019
* @since 1.3.0
*/
@Experimental
@Unstable
Contributor

does it make sense to have it Unstable? It has not changed for over 2 years...

Member Author

Agree, I wasn't sure about removing Unstable, but it seems like some of them can clearly go. Maybe anything not changed in a long time, as you say.

* Concrete implementation of a [[BaseSessionStateBuilder]].
*/
@Experimental
@Unstable
Contributor

Again, not sure it makes sense for this to be unstable...

* about the resource. Please refer to [[org.apache.spark.resource.ResourceInformation]] for
* specifics.
*/
@Evolving
Contributor

This was just added in 3.0; it should be left.

Member Author

Oops, right. I see it's part of GPU resources.
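
For context, a hedged sketch of the new 3.0 resource API being discussed; the "gpu" resource name is an assumption and depends on cluster configuration:

```scala
import org.apache.spark.TaskContext

// Inside a running task: look up the accelerator addresses assigned to
// this task, or None if no "gpu" resource was configured.
def assignedGpuAddresses(): Option[Array[String]] =
  TaskContext.get().resources().get("gpu").map(_.addresses)
```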

* batch so that Spark can access the data row by row. Instance of it is meant to be reused during
* the entire data loading process.
*/
@Evolving
Contributor

With the new changes to Columnar introduced in 3.0, I wonder if we shouldn't leave this evolving.

Member Author

I'll revert the changes to Columnar*
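
For reference, a minimal sketch of the row-by-row access this scaladoc describes, assuming an already-populated ColumnarBatch with an int column at ordinal 0:

```scala
import org.apache.spark.sql.vectorized.ColumnarBatch

// Sum the first column by iterating the batch as rows. The InternalRow
// returned by the iterator is reused across calls, so don't retain it.
def sumFirstIntColumn(batch: ColumnarBatch): Long = {
  var total = 0L
  val rows = batch.rowIterator()
  while (rows.hasNext) {
    val row = rows.next()
    if (!row.isNullAt(0)) total += row.getInt(0)
  }
  total
}
```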


/**
* :: Experimental ::
* :: ::
Contributor

remove empty line?

@tgravescs
Contributor

Those changes LGTM. I only skimmed, mostly looking for the ones I'm familiar with.

@SparkQA

SparkQA commented Aug 22, 2019

Test build #109576 has finished for PR 25558 at commit 50bfcd6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 22, 2019

Test build #109587 has finished for PR 25558 at commit c4732c9.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

* @since 2.4.0
*/
@Experimental
@deprecated("Please use 'org.apache.spark.sql.avro.functions.from_avro' instead.", "3.0.0")
Member

Although this was added in 2.4.0, the removal looks okay because this API is already deprecated.

cc @gengliangwang

Member

+1 with removing the annotation here.
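
For anyone landing here later, a hedged sketch of the replacement named in the deprecation message; `df` (a DataFrame with an Avro binary column) and `avroSchemaJson` are assumed to exist:

```scala
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

// Parse an Avro-encoded binary column with the non-deprecated function.
val parsed = df.select(from_avro(col("value"), avroSchemaJson).as("event"))
```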


import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.annotation._
import org.apache.spark.annotation.{DeveloperApi, Experimental, Stable, Unstable}
Member

So, is the only remaining instance `def experimental`, which is naturally Experimental?

  /**
   * :: Experimental ::
   * A collection of methods that are considered experimental, but can be used to hook into
   * the query planner for advanced functionality.
   *
   * @group basic
   * @since 1.3.0
   */
  @Experimental
  @transient
  @Unstable
  def experimental: ExperimentalMethods = sparkSession.experimental

Member Author

Pretty much. This one looks inherently experimental, so I left it.
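
A hedged sketch of why it is inherently experimental: it hands user code a hook straight into the query planner, e.g. via extraStrategies (the no-op strategy below is purely illustrative):

```scala
import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// A strategy that never matches; a real one would return custom SparkPlans.
object NoOpStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

val spark = SparkSession.builder().master("local").getOrCreate()
spark.experimental.extraStrategies = Seq(NoOpStrategy)
```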

* Concrete implementation of a [[BaseSessionStateBuilder]].
*/
@Experimental
@Unstable
Member

For SessionStateBuilder, we may want to keep @Unstable, just as HiveSessionStateBuilder.scala stays @Unstable in this PR.

cc @gatorsmile, @cloud-fan, @hvanhovell

Member Author

Yeah, if in doubt let's just leave it in, though I also don't know whether there's real doubt here, as this hasn't changed in a long while.

@dongjoon-hyun
Member

Thank you so much, @srowen. This looks good to me.
The only concern is the SessionStateBuilder class; let's wait for more comments on it from @gatorsmile, @cloud-fan, @hvanhovell.

@SparkQA

SparkQA commented Aug 22, 2019

Test build #109589 has finished for PR 25558 at commit 93305d2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member Author

srowen commented Aug 22, 2019

Uh oh, this might have uncovered some latent issue with the Python Kinesis tests:

======================================================================
ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
    kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
: java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
	at scala.collection.Iterator.find(Iterator.scala:993)
	at scala.collection.Iterator.find$(Iterator.scala:990)
	at scala.collection.AbstractIterator.find(Iterator.scala:1429)
	at scala.collection.IterableLike.find(IterableLike.scala:81)
	at scala.collection.IterableLike.find$(IterableLike.scala:80)
	at scala.collection.AbstractIterable.find(Iterable.scala:56)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
...

I'll try to investigate separately.
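
One generic diagnostic for a NoSuchMethodError like this (a plain JVM trick, not specific to this PR) is to print which jar actually supplied the conflicting class:

```scala
// If this resolves to an old aws-java-sdk jar, the method really is
// missing at runtime even though the code compiled against a newer SDK.
val cls = Class.forName("com.amazonaws.regions.Region")
println(cls.getProtectionDomain.getCodeSource.getLocation)
```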

* @since 1.3.0
*/
@Experimental
@Unstable
Contributor

This should remain unstable, as it uses the private classes Attribute and Expression.

@cloud-fan
Contributor

I've not checked all of the changes; just one note: some interfaces are marked as unstable because their signatures expose private classes. We should keep them unstable even if the interface itself hasn't changed for years.
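
A minimal sketch of the situation described here (the trait is hypothetical; CatalystScan in this diff is a real example). The signature leaks Catalyst's internal Attribute and Expression types, so the contract can change whenever Catalyst does, no matter how long the interface itself has been frozen:

```scala
import org.apache.spark.annotation.Unstable
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}

// Even if this trait never changes, callers depend on Attribute and
// Expression, which are internal and carry no compatibility promise.
@Unstable
trait ExampleCatalystScan {
  def buildScan(requiredColumns: Seq[Attribute], filters: Seq[Expression]): Unit
}
```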

@srowen
Member Author

srowen commented Aug 23, 2019

I'll put back the Unstable annotations. I'm first figuring out whether the Kinesis issues can be resolved.

@SparkQA

SparkQA commented Aug 26, 2019

Test build #109744 has finished for PR 25558 at commit 8615ce3.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member Author

srowen commented Aug 26, 2019

OK, this is always going to fail right now because it triggers the Kinesis tests, and there is some existing issue with the Kinesis tests in PySpark, which I am trying to figure out at #25559 (see also #24651). This should be an entirely non-functional change, so I'd like to merge it anyway: all of the build parts pass, and even the JVM Kinesis tests pass.

@zsxwing
Member

zsxwing commented Aug 26, 2019

I think most of the Structured Streaming APIs in the master branch are labeled correctly. Could you revert the changes to these APIs? We are waiting for DS v2 to finalize the Structured Streaming APIs, and don't want to promise stable DS v1 APIs.

@srowen
Member Author

srowen commented Aug 26, 2019

@zsxwing would you say revert the changes to all of org.apache.spark.sql.streaming.*? How about the Kinesis and DStream APIs? Yes, we should revert these if there is any doubt; I just want to make sure I don't revert things you don't have in mind.

@zsxwing
Member

zsxwing commented Aug 26, 2019

would you say revert the changes to all of org.apache.spark.sql.streaming.*?

Yep.

How about the Kinesis and DStream APIs?

I don't have a strong opinion here. We're unlikely to change them in the future.

@srowen
Member Author

srowen commented Aug 27, 2019

@zsxwing have another look. I reverted a lot of the changes to streaming. Let me know if I missed an API that should remain experimental.

@SparkQA

SparkQA commented Aug 27, 2019

Test build #109820 has finished for PR 25558 at commit 6e9bb61.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen added a commit that referenced this pull request Aug 31, 2019
… that breaks Pyspark Kinesis tests

The Pyspark Kinesis tests are failing, at least in master:

```
======================================================================
ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
    kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
  File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
: java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
	at scala.collection.Iterator.find(Iterator.scala:993)
	at scala.collection.Iterator.find$(Iterator.scala:990)
	at scala.collection.AbstractIterator.find(Iterator.scala:1429)
	at scala.collection.IterableLike.find(IterableLike.scala:81)
	at scala.collection.IterableLike.find$(IterableLike.scala:80)
	at scala.collection.AbstractIterable.find(Iterable.scala:56)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
...
```

The non-Python Kinesis tests are fine though. It turns out that this is because Pyspark tests use the output of the Spark assembly, and it pulls in `hadoop-cloud`, which in turn pulls in an old AWS Java SDK.

Per Steve Loughran (below), it seems like we can just resolve this by excluding the aws-java-sdk dependency. See the attached PR for some more detail about the debugging and other options.

See #25558 (comment)

Closes #25559 from srowen/KinesisTest.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
srowen added a commit that referenced this pull request Aug 31, 2019
… that breaks Pyspark Kinesis tests

(cherry picked from commit d5b7eed)
Signed-off-by: Sean Owen <[email protected]>
@SparkQA

SparkQA commented Aug 31, 2019

Test build #4850 has finished for PR 25558 at commit 6e9bb61.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 31, 2019

Test build #4853 has finished for PR 25558 at commit 6e9bb61.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 1, 2019

Test build #4854 has finished for PR 25558 at commit 6e9bb61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in eb037a8 Sep 1, 2019
@srowen
Member Author

srowen commented Sep 1, 2019

Merged to master

@dongjoon-hyun
Member

Thank you so much, @srowen!

@srowen srowen deleted the SPARK-28855 branch September 3, 2019 20:12
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
… that breaks Pyspark Kinesis tests

(cherry picked from commit d5b7eed)
Signed-off-by: Sean Owen <[email protected]>
@gatorsmile
Member

I went over the PR; below is a list summarizing the changes. Could you confirm whether it is complete?

  • Removal of Experimental:

    • org.apache.spark.metrics.source.CodegenMetrics
    • org.apache.spark.metrics.source.HiveCatalogMetrics
    • org.apache.spark.partial
    • org.apache.spark.rdd.combineByKeyWithClassTag
    • org.apache.spark.rdd.combineByKeyWithClassTag
    • org.apache.spark.rdd.combineByKeyWithClassTag
    • org.apache.spark.sql.from_avro [deprecated]
    • org.apache.spark.sql.to_avro [deprecated]
    • org.apache.spark.ml.classification.LinearSVC
    • org.apache.spark.ml.classification.LinearSVCModel
    • org.apache.spark.ml.classification.LogisticRegressionSummary
    • org.apache.spark.ml.classification.LogisticRegressionTrainingSummary
    • org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
    • org.apache.spark.ml.classification.BinaryLogisticRegressionTrainingSummary
    • org.apache.spark.ml.clustering.BisectingKMeansSummary
    • org.apache.spark.ml.clustering.ClusteringSummary
    • org.apache.spark.ml.clustering.GaussianMixtureSummary
    • org.apache.spark.ml.clustering.KMeansSummary
    • org.apache.spark.ml.clustering.PowerIterationClustering
    • org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    • org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
    • org.apache.spark.ml.evaluation.RegressionEvaluator
    • org.apache.spark.ml.feature.BucketedRandomProjectionLSHParams
    • org.apache.spark.ml.feature.BucketedRandomProjectionLSH
    • org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
    • org.apache.spark.ml.feature.FeatureHasher
    • org.apache.spark.ml.feature.Imputer
    • org.apache.spark.ml.feature.ImputerModel
    • org.apache.spark.ml.feature.MinHashLSHModel
    • org.apache.spark.ml.feature.MinHashLSH
    • org.apache.spark.ml.feature.RFormula
    • org.apache.spark.ml.feature.RFormulaModel
    • org.apache.spark.ml.feature.VectorSizeHint
    • org.apache.spark.ml.fpm.FPGrowth
    • org.apache.spark.ml.fpm.FPGrowthModel
    • org.apache.spark.ml.fpm.PrefixSpan
    • org.apache.spark.ml.fpm.PrefixSpan.findFrequentSequentialPatterns
    • org.apache.spark.ml.image.ImageSchema
    • org.apache.spark.ml.regression.AFTSurvivalRegression
    • org.apache.spark.ml.regression.AFTSurvivalRegressionModel
    • org.apache.spark.ml.regression.GeneralizedLinearRegression
    • org.apache.spark.ml.regression.GeneralizedLinearRegressionModel
    • org.apache.spark.ml.regression.GeneralizedLinearRegressionSummary
    • org.apache.spark.ml.regression.GeneralizedLinearRegressionTrainingSummary
    • org.apache.spark.ml.regression.LinearRegressionTrainingSummary
    • org.apache.spark.ml.regression.LinearRegressionSummary
    • org.apache.spark.ml.stat.ChiSquareTest
    • org.apache.spark.ml.stat.Correlation
    • org.apache.spark.ml.stat.Correlation.corr
    • org.apache.spark.ml.stat.KolmogorovSmirnovTest
    • org.apache.spark.ml.stat.SummaryBuilder
    • org.apache.spark.ml.stat.Summarizer
    • org.apache.spark.ml.util.PMMLUtils.loadFromString
    • pyspark.context.binaryFiles
    • pyspark.context.binaryRecords
    • pyspark.ml.fpm.FPGrowth
    • pyspark.ml.fpm.FPGrowthModel
    • pyspark.ml.fpm.PrefixSpan
    • pyspark.ml.feature.BucketedRandomProjectionLSH
    • pyspark.ml.feature.BucketedRandomProjectionLSHModel
    • pyspark.ml.feature.ChiSqSelector
    • pyspark.ml.feature.ChiSqSelectorModel
    • pyspark.ml.feature.FeatureHasher
    • pyspark.ml.feature.Imputer
    • pyspark.ml.feature.ImputerModel
    • pyspark.ml.feature.MinHashLSH
    • pyspark.ml.feature.MinHashLSHModel
    • pyspark.ml.feature.QuantileDiscretizer
    • pyspark.ml.feature.RFormula
    • pyspark.ml.feature.RFormulaModel
    • pyspark.ml.feature.VectorSizeHint
    • pyspark.ml.classification.LinearSVC
    • pyspark.ml.classification.LinearSVCModel
    • pyspark.ml.classification.LogisticRegressionSummary
    • pyspark.ml.classification.LogisticRegressionTrainingSummary
    • pyspark.ml.classification.BinaryLogisticRegressionSummary
    • pyspark.ml.classification.OneVsRest
    • pyspark.ml.classification.OneVsRestModel
    • pyspark.ml.clustering.GaussianMixtureSummary
    • pyspark.ml.clustering.KMeansSummary
    • pyspark.ml.clustering.BisectingKMeansSummary
    • pyspark.ml.clustering.PowerIterationClustering
    • pyspark.ml.stat.ChiSquareTest
    • pyspark.ml.stat.KolmogorovSmirnovTest
    • pyspark.ml.stat.Summarizer
    • pyspark.ml.stat.SummaryBuilder
    • pyspark.ml.tuning.TrainValidationSplit
    • pyspark.ml.tuning.TrainValidationSplitModel
    • pyspark.ml.evaluation.BinaryClassificationEvaluator
    • pyspark.ml.evaluation.RegressionEvaluator
    • pyspark.ml.evaluation.MulticlassClassificationEvaluator
    • pyspark.ml.evaluation.ClusteringEvaluator
    • The fitMultiple method in all the classes
    • pyspark.rdd.countApprox
    • pyspark.rdd.sumApprox
    • pyspark.rdd.meanApprox
    • pyspark.rdd.countApproxDistinct
    • pyspark.sql.functions.pandas_udf
    • pyspark.sql.GroupedData
    • pyspark.sql.Window
    • pyspark.sql.WindowSpec
    • pyspark.taskcontext
    • org.apache.spark.sql.Encoder
    • org.apache.spark.sql.expressions.javalang.typed
    • org.apache.spark.sql.Dataset.as
    • org.apache.spark.sql.Dataset.checkpoint
    • org.apache.spark.sql.Dataset.localCheckpoint
    • org.apache.spark.sql.Dataset.withWatermark
    • org.apache.spark.sql.Dataset.joinWith
    • org.apache.spark.sql.Dataset.select
    • org.apache.spark.sql.Dataset.reduce
    • org.apache.spark.sql.Dataset.groupByKey
    • org.apache.spark.sql.Dataset.filter
    • org.apache.spark.sql.Dataset.map
    • org.apache.spark.sql.Dataset.mapPartitions
    • org.apache.spark.sql.Dataset.flatMap
    • org.apache.spark.sql.KeyValueGroupedDataset.mapGroupsWithState
    • org.apache.spark.sql.KeyValueGroupedDataset.flatMapGroupsWithState
    • org.apache.spark.sql.SQLContext.listenerManager
    • org.apache.spark.sql.SQLContext.implicits
    • org.apache.spark.sql.SQLContext.createDataFrame
    • org.apache.spark.sql.SQLContext.createDataset
    • org.apache.spark.sql.SQLContext.range
    • org.apache.spark.sql.SparkSession.listenerManager
    • org.apache.spark.sql.SparkSession.streams
    • org.apache.spark.sql.SparkSession.emptyDataset
    • org.apache.spark.sql.SparkSession.createDataFrame
    • org.apache.spark.sql.SparkSession.createDataset
    • org.apache.spark.sql.SparkSession.range
    • org.apache.spark.sql.SparkSession.implicits
    • org.apache.spark.sql.catalog.createTable
    • org.apache.spark.sql.expressions.Aggregator
    • org.apache.spark.sql.internal.BaseSessionStateBuilder
    • org.apache.spark.sql.internal.createTable
    • org.apache.spark.sql.internal.SessionStateBuilder
    • org.apache.spark.sql.jdbc.JdbcDialects
    • org.apache.spark.sql.jdbc.JdbcType
    • org.apache.spark.sql.sources.StreamSourceProvider
    • org.apache.spark.sql.sources.StreamSinkProvider
    • org.apache.spark.sql.sources.CatalystScan
    • org.apache.spark.sql.util.QueryExecutionListener
    • org.apache.spark.sql.util.ExecutionListenerManager
    • org.apache.spark.sql.hive.HiveSessionStateBuilder
    • org.apache.spark.streaming.StreamingContext.getActive
    • org.apache.spark.streaming.StreamingContext.getActiveOrCreate
    • org.apache.spark.streaming.api.java.JavaMapWithStateDStream
    • org.apache.spark.streaming.api.java.mapWithState
    • org.apache.spark.streaming.dstream.MapWithStateDStream
    • org.apache.spark.streaming.dstream.PairDStreamFunctions.mapWithState
  • Removal of Evolving:

    • org.apache.spark.streaming.kinesis.KinesisInputDStream
    • org.apache.spark.streaming.kinesis.KinesisInputDStream.Builder
    • org.apache.spark.streaming.kinesis.SparkAWSCredentials
    • org.apache.spark.streaming.kinesis.SparkAWSCredentials.Builder
    • org.apache.spark.sql.types.SQLUserDefinedType
    • org.apache.spark.sql.vectorized.ArrowColumnVector
    • org.apache.spark.sql.Encoder
    • org.apache.spark.sql.types.ObjectType
    • org.apache.spark.sql.Dataset.as
    • org.apache.spark.sql.Dataset.isStreaming
    • org.apache.spark.sql.Dataset.checkpoint
    • org.apache.spark.sql.Dataset.localCheckpoint
    • org.apache.spark.sql.Dataset.withWatermark
    • org.apache.spark.sql.Dataset.joinWith
    • org.apache.spark.sql.Dataset.select
    • org.apache.spark.sql.Dataset.reduce
    • org.apache.spark.sql.Dataset.groupByKey
    • org.apache.spark.sql.Dataset.filter
    • org.apache.spark.sql.Dataset.map
    • org.apache.spark.sql.Dataset.mapPartitions
    • org.apache.spark.sql.Dataset.flatMap
    • org.apache.spark.sql.Dataset.writeStream
    • org.apache.spark.sql.ForeachWriter
    • org.apache.spark.sql.KeyValueGroupedDataset
    • org.apache.spark.sql.KeyValueGroupedDataset.mapGroupsWithState
    • org.apache.spark.sql.KeyValueGroupedDataset.flatMapGroupsWithState
    • org.apache.spark.sql.SQLContext.listenerManager
    • org.apache.spark.sql.SQLContext.implicits
    • org.apache.spark.sql.SQLContext.createDataFrame
    • org.apache.spark.sql.SQLContext.createDataset
    • org.apache.spark.sql.SQLContext.readStream
    • org.apache.spark.sql.SQLContext.range
    • org.apache.spark.sql.SQLImplicits
    • org.apache.spark.sql.SparkSession.listenerManager
    • org.apache.spark.sql.SparkSession.emptyDataset
    • org.apache.spark.sql.SparkSession.createDataFrame
    • org.apache.spark.sql.SparkSession.createDataset
    • org.apache.spark.sql.SparkSession.range
    • org.apache.spark.sql.SparkSession.readStream
    • org.apache.spark.sql.SparkSession.implicits
    • org.apache.spark.sql.util.QueryExecutionListener
    • org.apache.spark.sql.util.ExecutionListenerManager
    • org.apache.spark.sql.expressions.Aggregator
    • org.apache.spark.sql.catalog.createTable

@srowen
Member Author

srowen commented Sep 23, 2019

It looks about right at a glance. I spot-checked many of them and they look correct; nothing was missing. If you did the hard work to list them all out, I'll take your word for it. (There weren't other pull requests besides this one.)
