
Conversation

@HeartSaVioR (Contributor) commented Jun 28, 2019

What changes were proposed in this pull request?

This patch proposes moving all Trigger implementations to Triggers.scala, to avoid exposing these implementations to end users and let them deal only with the Trigger.xxx static methods. This fits the intention behind the deprecation of ProcessingTime, and we agreed to move the others without deprecation since this patch will ship in a major version (Spark 3.0.0).

How was this patch tested?

UTs modified to work with the newly introduced class.
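For context, the API shape the patch aims for can be sketched as follows. This is a simplified, self-contained mock (not actual Spark source): in Spark the implementation classes live in Triggers.scala and are private[sql], which is noted in comments here so the sketch stays runnable on its own.

```scala
// Simplified sketch of the intended surface: end users construct triggers
// only through the static factory methods on Trigger and never name the
// implementation classes directly.
sealed trait Trigger

// Stand-ins for the implementations this patch gathers into Triggers.scala;
// in Spark they are private[sql].
case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger
case object OneTimeTrigger extends Trigger

object Trigger {
  // The only entry points end users are meant to use.
  def ProcessingTime(intervalMs: Long): Trigger = ProcessingTimeTrigger(intervalMs)
  def Once(): Trigger = OneTimeTrigger
}
```

Hiding the case classes behind factories is what makes the later removal of the deprecated public ProcessingTime class possible without leaving a second user-facing way to build triggers.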

@SparkQA commented Jun 28, 2019

Test build #106991 has finished for PR 24996 at commit 20059fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ProcessingTimeExecutor(

@srowen (Member) commented Jun 30, 2019

Why introduce a new abstraction? This is what Trigger.ProcessingTime is meant to be. Just move the implementation to that class, and have the deprecated impl use it, rather than the other way around?

@HeartSaVioR (Contributor, Author) commented Jun 30, 2019

Trigger is just the user-facing API; the implementations have been placed on the Scala side, as is the case for ProcessingTime. Once and Continuous followed the same pattern.

That said, if we want a smoother migration while respecting the pattern Spark has been following, we need a new class to take over that role.

If we think we can simply remove ProcessingTime, we may not need this smoother but redundant approach: just change the case class and companion object of ProcessingTime to private[sql], and remove the deprecation.

@srowen (Member) commented Jul 1, 2019

Because ProcessingTime was deprecated in 2.2.0, I actually think it's fine and good to remove it for 3.0. Does this get simpler if you can just remove the old implementation and only use the new one? That's probably what should happen here anyway.

@HeartSaVioR (Contributor, Author):

If we decide to discontinue supporting ProcessingTime, it looks like we have two choices:

  1. Just change the scope to restrict users' access. Existing queries may fail, but that is what we expect. This requires the least change, but someone could hack around it and keep using the class. Not sure whether we care about that hacky approach, though.

  2. Remove the old implementation and use only the new one. This makes it clear that we no longer support it, since there is nothing left to hack; I'd rather not assume someone would hack the new class. More change required, but perhaps simpler than the current diff.

Which option do you prefer?

@srowen (Member) commented Jul 1, 2019

I personally favor removing the old deprecated implementation. It's probably simpler, and 3.0 is the right time to do it. I don't think this is 'critical' to continue supporting as a deprecated API, compared to many other things we've removed for similar reasons.


import org.apache.spark.internal.Logging
import org.apache.spark.sql.streaming.ProcessingTime
import org.apache.spark.sql.streaming.{ProcessingTime, Trigger}
Member:

Can you remove the ProcessingTime import?

* the query will run as fast as possible.
*/
@Evolving
private[sql] case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger {
Member:

I see; if this is basically an implementation class, I wonder if it belongs in the (unfortunately named) Triggers.scala file, which currently has only OneTimeTrigger, but at least that is just another implementation class too? No big deal.

Contributor (Author):

Once we decide to move this class to Triggers.scala, I guess ContinuousTrigger has to be moved too. No big deal for me either, so please let me know which feels cleaner to you.

Member:

I think I'd move it, to rationalize Triggers.scala and avoid another file. It doesn't matter much. I see there is ContinuousTrigger too, but I guess it belongs in the .continuous subpackage.

Contributor (Author):

Thanks for weighing in. Agreed on both points: I'll leave ContinuousTrigger as it is and move ProcessingTimeTrigger.

@SparkQA commented Jul 1, 2019

Test build #107085 has finished for PR 24996 at commit 2acc53a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR HeartSaVioR changed the title [SPARK-28199][SS] Remove usage of deprecated ProcessingTime in Spark codebase [SPARK-28199][SS] Remove deprecated ProcessingTime Jul 1, 2019
@SparkQA commented Jul 2, 2019

Test build #107087 has finished for PR 24996 at commit bc5a3fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author):

Looks like everything is sorted out. Please take another look. Thanks!

@srowen (Member) left a review comment:

CC @tdas or possibly @jose-torres

@dongjoon-hyun (Member):

Retest this please.

*/
case class ProcessingTimeExecutor(processingTime: ProcessingTime, clock: Clock = new SystemClock())
case class ProcessingTimeExecutor(
processingTime: ProcessingTimeTrigger,
Member:

Please rename the variable as well:

  • processingTime -> processingTimeTrigger.
  • private val intervalMs = processingTime.intervalMs -> private val intervalMs = processingTimeTrigger.intervalMs
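A hedged sketch of the suggested rename. The class below is a simplified stand-in with no Spark dependency (the real executor also takes a Clock and lives elsewhere); only the parameter naming matters here, and the boundary arithmetic is illustrative.

```scala
// Simplified stand-in for the executor after the suggested rename.
case class ProcessingTimeTrigger(intervalMs: Long)

case class ProcessingTimeExecutor(processingTimeTrigger: ProcessingTimeTrigger) {
  // Field renamed to match the renamed constructor parameter,
  // per the review comment above.
  private val intervalMs = processingTimeTrigger.intervalMs

  // Illustrative arithmetic: the next multiple of the interval
  // strictly after `now`.
  def nextBatchTime(now: Long): Long = now / intervalMs * intervalMs + intervalMs
}
```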

@jose-torres (Contributor):

Yeah, I'm also in favor of removing the deprecated implementation.

@dongjoon-hyun (Member) left a review comment:

I agree with the purpose of introducing a new case class name.
However, let's be clear: technically, this PR is doing a kind of renaming and changing the visibility of the existing ProcessingTime. It would be great if we could mention that change explicitly in the PR title.

@SparkQA commented Jul 3, 2019

Test build #107194 has finished for PR 24996 at commit bc5a3fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author) commented Jul 4, 2019

Let me explain my intention with this patch.

Actually, I assume the reason we deprecated ProcessingTime is that we don't want to expose implementations to the user side. (Maybe interop with Java was also considered.) That is the only valid reason I can imagine for why these were deprecated and end users were encouraged to use the static methods of Trigger instead. Not 100% sure.

If that's the reason, we should hide the other trigger implementations as well (OneTimeTrigger, ContinuousTrigger), so this patch may need to mark them as deprecated too. (And we may eventually replace them with other classes, as this patch proposes.)

I dug into the history of when it was marked as deprecated, and realized there's no explanation of the reason for the deprecation: no description, no review comment on deprecating these methods. So unfortunately the reason is still not clear.

At least based on my assumption (and the intention of my patch), this is not just renaming and changing the visibility. It ends up looking like that, but it actually "removes" the deprecated class and creates another class that applies the intention behind the deprecation. I will update the PR title to mention that a new class is introduced to replace ProcessingTime.

If my assumption is incorrect, we may need to ask ourselves why we deprecated these methods. Honestly, I can't find any other reason for doing so.

@HeartSaVioR (Contributor, Author):

FYI, the PR which marked these methods as deprecated is here: #17219

@HeartSaVioR (Contributor, Author):

cc @tcondie, who hopefully can explain the reason.
Also cc @marmbrus and @tdas, who reviewed that patch.

We may need to just guess if we can't get any more information.

@HeartSaVioR HeartSaVioR changed the title [SPARK-28199][SS] Remove deprecated ProcessingTime [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users Jul 4, 2019
@SparkQA commented Jul 4, 2019

Test build #107210 has finished for PR 24996 at commit 12655f0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) commented Jul 4, 2019

Thank you for the update.

BTW, is the failure due to a flaky test case?

[info] - query without test harness *** FAILED *** (2 seconds, 931 milliseconds)
[info]   scala.Predef.Set.apply[Int](0, 1, 2, 3).map[org.apache.spark.sql.Row, scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) was false
(ContinuousSuite.scala:226)

@HeartSaVioR (Contributor, Author) commented Jul 4, 2019

Looks like I can reproduce the same failure locally on another branch (SPARK-27254). It fails intermittently, and even when it succeeds it leaves a suspicious error log. I'll look into what's happening there.

@HeartSaVioR (Contributor, Author):

retest this, please

@SparkQA commented Jul 11, 2019

Test build #107504 has finished for PR 24996 at commit f42e3b5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author):

retest this, please

@SparkQA commented Jul 11, 2019

Test build #107539 has finished for PR 24996 at commit f42e3b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author):

@srowen @tdas Would you mind doing another round of review? Thanks in advance!

@srowen (Member) commented Jul 12, 2019

I think this is OK. We need to add release notes. I don't think there's a streaming migration guide in the docs, so we can use the Docs text field in the JIRA. Is this accurate, @HeartSaVioR?

"In Spark 3.0, the deprecated class org.apache.spark.sql.streaming.ProcessingTime has been removed. Use org.apache.spark.sql.streaming.Trigger.ProcessingTime instead. Likewise, org.apache.spark.sql.execution.streaming.continuous.ContinuousTrigger has been removed in favor of Trigger.Continuous, and org.apache.spark.sql.execution.streaming.OneTimeTrigger has been hidden in favor of Trigger.Once."
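To make the note concrete, user code changes roughly as sketched below. The MockStreamWriter is an illustrative stand-in for Spark's DataStreamWriter (not the real API); the Trigger factory shape follows the pattern discussed in this thread.

```scala
// In 2.x, users could construct the deprecated class directly, e.g.
//   df.writeStream.trigger(ProcessingTime(10000L))
// In 3.0 that class is gone, and the factory method is the only way:
//   df.writeStream.trigger(Trigger.ProcessingTime(10000L))
sealed trait Trigger
case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger

object Trigger {
  // Mirrors the static factory users are directed to; Spark also offers
  // overloads taking an interval string and a Duration.
  def ProcessingTime(intervalMs: Long): Trigger = ProcessingTimeTrigger(intervalMs)
}

// Hypothetical stand-in for DataStreamWriter: only trigger() matters here.
class MockStreamWriter {
  private var current: Option[Trigger] = None
  def trigger(t: Trigger): MockStreamWriter = { current = Some(t); this }
  def currentTrigger: Option[Trigger] = current
}
```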

@HeartSaVioR (Contributor, Author) commented Jul 12, 2019

In Spark 3.0, the deprecated class org.apache.spark.sql.streaming.ProcessingTime has been removed. Use org.apache.spark.sql.streaming.Trigger.ProcessingTime instead.

End users are always encouraged to use Trigger.xxx, so in this case we need to guide them with "Use Trigger.ProcessingTime". org.apache.spark.sql.streaming.Trigger.ProcessingTime is the new one we would like to hide from users.

I think the rest is accurate. Thanks for the nice summary!

@srowen (Member) commented Jul 12, 2019

Oh yes, I mean they should use the method Trigger.ProcessingTime(); that's not new. (The method's name is unfortunate.)

@HeartSaVioR (Contributor, Author) commented Jul 12, 2019

Yes, I confused the existing method with the new class as well; you're right.

@srowen (Member) commented Jul 14, 2019

Merged to master

@srowen srowen closed this in 7548a88 Jul 14, 2019
@HeartSaVioR (Contributor, Author):

Thanks all for the detailed review and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-28199 branch July 14, 2019 20:27
vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019
…avoid exposing these to the end users

Closes apache#24996 from HeartSaVioR/SPARK-28199.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
yiheng pushed a commit to yiheng/spark that referenced this pull request Jul 24, 2019
…e API

## What changes were proposed in this pull request?

SPARK-28199 (apache#24996) hid implementations of Triggers into `private[sql]` and encourage end users to use `Trigger.xxx` methods instead.

As I got some post review comment on apache@7548a88#r34366934 we could remove annotations which are meant to be used with public API.

## How was this patch tested?

N/A

Closes apache#25200 from HeartSaVioR/SPARK-28199-FOLLOWUP.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@cloud-fan (Contributor):

Shall we have an item in the migration guide for it?

@HeartSaVioR (Contributor, Author):

Thanks for the reminder. We didn't, since a migration guide didn't exist for SS at the time. I'll submit a PR quickly.

@srowen (Member) commented Jun 8, 2020

There's no harm in a migration guide note, I think, other than potentially overloading the guide. This is a case I would have thought release notes cover. What would you write in the migration guide? "Use the new class"?

@HeartSaVioR (Contributor, Author):

I guess the same content as the release note would be OK for the migration guide; it's just a matter of which reference users prefer. Suppose end users upgrade to Spark 3.0.0 and find their application fails to compile: which doc would they consult first? The migration guide seems to be the centralized one, so it's probably preferred over the release notes.

I have a commit but haven't yet submitted a PR. Please let me know if it makes sense to add this to the migration guide.

* the query.
*/
@Experimental
@Evolving
Member:

Shall we remove the annotations? It's private, but the annotations say it's an API.

Contributor (Author):

Oh right. These classes are no longer intended to be exposed, so we should remove the annotations. Thanks for catching it!

@HeartSaVioR (Contributor, Author) commented Jun 10, 2020:

Well... in reality that was done in #25200. Let's make sure we check the latest code (not just the code diff) when doing a post-hoc review after a long delay.

Member:

Ah, sure. Thanks :D.

@Experimental
@Evolving
case object OneTimeTrigger extends Trigger
private[sql] case object OneTimeTrigger extends Trigger
Member:

Also, let's not add private[sql], since the execution package is already private per SPARK-16964.
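The convention referenced here can be sketched as follows (illustrative, no Spark dependency): since everything under the sql.execution package is already considered internal per SPARK-16964, the explicit modifier is redundant there.

```scala
// Before, inside the sql.execution package (redundant modifier):
//   private[sql] case object OneTimeTrigger extends Trigger
// After (the package itself is treated as internal, so no modifier needed):
sealed trait Trigger
case object OneTimeTrigger extends Trigger
```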

Contributor (Author):

OK, will fix. The practice seems really easy to miss, IMHO.

cloud-fan pushed a commit that referenced this pull request Jun 9, 2020
…he SS migration guide

### What changes were proposed in this pull request?

SPARK-28199 (#24996) made the trigger related public API to be exposed only from static methods of Trigger class. This is backward incompatible change, so some users may experience compilation error after upgrading to Spark 3.0.0.

While we plan to mention the change into release note, it's good to mention the change to the migration guide doc as well, since the purpose of the doc is to collect the major changes/incompatibilities between versions and end users would refer the doc.

### Why are the changes needed?

SPARK-28199 is technically backward incompatible change and we should kindly guide the change.

### Does this PR introduce _any_ user-facing change?

Doc change.

### How was this patch tested?

N/A, as it's just a doc change.

Closes #28763 from HeartSaVioR/SPARK-28199-FOLLOWUP-doc.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Jun 9, 2020

(cherry picked from commit 8305b77)
Signed-off-by: Wenchen Fan <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jun 11, 2020
… sql.execution package

### What changes were proposed in this pull request?

This PR proposes to remove package private in classes/objects in sql.execution package, as per SPARK-16964.

### Why are the changes needed?

This is per post-hoc review comment, see #24996 (comment)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #28790 from HeartSaVioR/SPARK-28199-FOLLOWUP-apply-SPARK-16964.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jun 11, 2020

(cherry picked from commit 4afe2b1)
Signed-off-by: Dongjoon Hyun <[email protected]>
holdenk pushed a commit to holdenk/spark that referenced this pull request Jun 25, 2020

(cherry picked from commit 4afe2b1)
Signed-off-by: Dongjoon Hyun <[email protected]>