
Conversation

@HeartSaVioR (Contributor) commented Jun 28, 2019

What changes were proposed in this pull request?

This patch proposes moving all Trigger implementations to Triggers.scala, to avoid exposing these implementations to end users and let them deal only with the Trigger.xxx static methods. This fits the intention behind the deprecation of ProcessingTime, and we agreed to move the others without deprecation since this patch will ship in a major version (Spark 3.0.0).

How was this patch tested?

UTs modified to work with the newly introduced class.
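For context, the API shape the patch aims for can be sketched as follows. This is a simplified, self-contained mock (not actual Spark source): in Spark the implementation classes live in Triggers.scala and are private[sql], which is noted in comments here so the sketch stays runnable on its own.

```scala
// Simplified sketch of the intended surface: end users construct triggers
// only through the static factory methods on Trigger and never name the
// implementation classes directly.
sealed trait Trigger

// Stand-ins for the implementations this patch gathers into Triggers.scala;
// in Spark they are private[sql].
case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger
case object OneTimeTrigger extends Trigger

object Trigger {
  // The only entry points end users are meant to use.
  def ProcessingTime(intervalMs: Long): Trigger = ProcessingTimeTrigger(intervalMs)
  def Once(): Trigger = OneTimeTrigger
}
```

Hiding the case classes behind factories is what makes the later removal of the deprecated public ProcessingTime class possible without leaving a second user-facing way to build triggers.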

@SparkQA commented Jun 28, 2019

Test build #106991 has finished for PR 24996 at commit 20059fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ProcessingTimeExecutor(

@srowen (Member) commented Jun 30, 2019

Why introduce a new abstraction? This is what Trigger.ProcessingTime is meant to be. Just move the implementation to that class, and have the deprecated impl use it, rather than the other way around?

@HeartSaVioR (Contributor, Author) commented Jun 30, 2019

Trigger is just the user-facing API; the implementations have been placed on the Scala side, as is the case for ProcessingTime. Once and Continuous followed the same pattern.

That said, if we want a smoother migration while respecting the pattern Spark has been following, we need a new class to take over that role.

If we think we can simply remove ProcessingTime, we may not need this smoother but redundant approach: just change the case class and companion object of ProcessingTime to private[sql], and remove the deprecation.

@srowen (Member) commented Jul 1, 2019

Because ProcessingTime was deprecated in 2.2.0, I actually think it's fine and good to remove it for 3.0. Does this get simpler if you can just remove the old implementation and only use the new one? That's probably what should happen here anyway.

@HeartSaVioR (Contributor, Author):

If we decide to discontinue supporting ProcessingTime, it looks like we have two choices:

  1. Just change the scope to restrict users' access. Existing queries may fail, but that is what we expect. This requires the least change, but someone could hack around it and keep using the class. Not sure whether we care about that hacky approach, though.

  2. Remove the old implementation and use only the new one. This makes it clear that we no longer support it, since there is nothing left to hack; I'd rather not assume someone would hack the new class. More change required, but perhaps simpler than the current diff.

Which option do you prefer?

@srowen (Member) commented Jul 1, 2019

I personally favor removing the old deprecated implementation. It's probably simpler, and 3.0 is the right time to do it. I don't think this is 'critical' to continue supporting as a deprecated API, compared to many other things we've removed for similar reasons.


import org.apache.spark.internal.Logging
import org.apache.spark.sql.streaming.ProcessingTime
import org.apache.spark.sql.streaming.{ProcessingTime, Trigger}
Member:

Can you remove the ProcessingTime import?

* the query will run as fast as possible.
*/
@Evolving
private[sql] case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger {
Member:

I see; if this is basically an implementation class, I wonder if it belongs in the (unfortunately named) Triggers.scala file, which currently has only OneTimeTrigger, but at least that is just another implementation class too? No big deal.

Contributor (Author):

Once we decide to move this class to Triggers.scala, I guess ContinuousTrigger has to be moved too. No big deal for me either, so please let me know which feels cleaner to you.

Member:

I think I'd move it, to rationalize Triggers.scala and avoid another file. It doesn't matter much. I see there is ContinuousTrigger too, but I guess it belongs in the .continuous subpackage.

Contributor (Author):

Thanks for weighing in. Agreed on both points: I'll leave ContinuousTrigger as it is and move ProcessingTimeTrigger.

@SparkQA commented Jul 1, 2019

Test build #107085 has finished for PR 24996 at commit 2acc53a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR HeartSaVioR changed the title [SPARK-28199][SS] Remove usage of deprecated ProcessingTime in Spark codebase [SPARK-28199][SS] Remove deprecated ProcessingTime Jul 1, 2019
@SparkQA commented Jul 2, 2019

Test build #107087 has finished for PR 24996 at commit bc5a3fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author):

Looks like everything is sorted out. Please take another look. Thanks!

@srowen (Member) left a review comment:

CC @tdas or possibly @jose-torres

@dongjoon-hyun (Member):

Retest this please.

*/
case class ProcessingTimeExecutor(processingTime: ProcessingTime, clock: Clock = new SystemClock())
case class ProcessingTimeExecutor(
processingTime: ProcessingTimeTrigger,
Member:

Please rename the variable as well:

  • processingTime -> processingTimeTrigger.
  • private val intervalMs = processingTime.intervalMs -> private val intervalMs = processingTimeTrigger.intervalMs
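A hedged sketch of the suggested rename. The class below is a simplified stand-in with no Spark dependency (the real executor also takes a Clock and lives elsewhere); only the parameter naming matters here, and the boundary arithmetic is illustrative.

```scala
// Simplified stand-in for the executor after the suggested rename.
case class ProcessingTimeTrigger(intervalMs: Long)

case class ProcessingTimeExecutor(processingTimeTrigger: ProcessingTimeTrigger) {
  // Field renamed to match the renamed constructor parameter,
  // per the review comment above.
  private val intervalMs = processingTimeTrigger.intervalMs

  // Illustrative arithmetic: the next multiple of the interval
  // strictly after `now`.
  def nextBatchTime(now: Long): Long = now / intervalMs * intervalMs + intervalMs
}
```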

@jose-torres (Contributor):

Yeah, I'm also in favor of removing the deprecated implementation.

@dongjoon-hyun (Member) left a review comment:

I agree with the purpose of introducing a new case class name.
However, let's be clear: technically, this PR is doing a kind of renaming and changing the visibility of the existing ProcessingTime. It would be great if we could mention that change explicitly in the PR title.

@SparkQA commented Jul 3, 2019

Test build #107194 has finished for PR 24996 at commit bc5a3fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author) commented Jul 4, 2019

Let me explain my intention with this patch.

Actually, I assume the reason we deprecated ProcessingTime is that we don't want to expose implementations to the user side. (Maybe interop with Java was also considered.) That is the only valid reason I can imagine for why these were deprecated and end users were encouraged to use the static methods of Trigger instead. Not 100% sure.

If that's the reason, we should hide the other trigger implementations as well (OneTimeTrigger, ContinuousTrigger), so this patch may need to mark them as deprecated too. (And we may eventually replace them with other classes, as this patch proposes.)

I dug into the history of when it was marked as deprecated, and realized there's no explanation of the reason for the deprecation: no description, no review comment on deprecating these methods. So unfortunately the reason is still not clear.

At least based on my assumption (and the intention of my patch), this is not just renaming and changing the visibility. It ends up looking like that, but it actually "removes" the deprecated class and creates another class that applies the intention behind the deprecation. I will update the PR title to mention that a new class is introduced to replace ProcessingTime.

If my assumption is incorrect, we may need to ask ourselves why we deprecated these methods. Honestly, I can't find any other reason for doing so.

@HeartSaVioR (Contributor, Author):

FYI, the PR which marked these methods as deprecated is here: #17219

@HeartSaVioR (Contributor, Author):

cc @tcondie, who hopefully can explain the reason.
Also cc @marmbrus and @tdas, who reviewed that patch.

We may need to just guess if we can't get any more information.

@HeartSaVioR HeartSaVioR changed the title [SPARK-28199][SS] Remove deprecated ProcessingTime [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users Jul 4, 2019
@SparkQA commented Jul 4, 2019

Test build #107210 has finished for PR 24996 at commit 12655f0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) commented Jul 4, 2019

Thank you for the update.

BTW, is the failure due to a flaky test case?

[info] - query without test harness *** FAILED *** (2 seconds, 931 milliseconds)
[info]   scala.Predef.Set.apply[Int](0, 1, 2, 3).map[org.apache.spark.sql.Row, scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) was false
(ContinuousSuite.scala:226)

@HeartSaVioR (Contributor, Author) commented Jul 4, 2019

Looks like I can reproduce the same failure locally on another branch (SPARK-27254). It fails intermittently, and even when it succeeds it leaves a suspicious error log. I'll look into what's happening there.

@HeartSaVioR (Contributor, Author):

retest this, please

@SparkQA commented Jul 11, 2019

Test build #107504 has finished for PR 24996 at commit f42e3b5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author):

retest this, please

@SparkQA commented Jul 11, 2019

Test build #107539 has finished for PR 24996 at commit f42e3b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author):

@srowen @tdas Would you mind doing another round of review? Thanks in advance!

@srowen (Member) commented Jul 12, 2019

I think this is OK. We need to add release notes. I don't think there's a streaming migration guide in the docs, so we can use the Docs text field in the JIRA. Is this accurate, @HeartSaVioR?

"In Spark 3.0, the deprecated class org.apache.spark.sql.streaming.ProcessingTime has been removed. Use org.apache.spark.sql.streaming.Trigger.ProcessingTime instead. Likewise, org.apache.spark.sql.execution.streaming.continuous.ContinuousTrigger has been removed in favor of Trigger.Continuous, and org.apache.spark.sql.execution.streaming.OneTimeTrigger has been hidden in favor of Trigger.Once."
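To make the note concrete, user code changes roughly as sketched below. The MockStreamWriter is an illustrative stand-in for Spark's DataStreamWriter (not the real API); the Trigger factory shape follows the pattern discussed in this thread.

```scala
// In 2.x, users could construct the deprecated class directly, e.g.
//   df.writeStream.trigger(ProcessingTime(10000L))
// In 3.0 that class is gone, and the factory method is the only way:
//   df.writeStream.trigger(Trigger.ProcessingTime(10000L))
sealed trait Trigger
case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger

object Trigger {
  // Mirrors the static factory users are directed to; Spark also offers
  // overloads taking an interval string and a Duration.
  def ProcessingTime(intervalMs: Long): Trigger = ProcessingTimeTrigger(intervalMs)
}

// Hypothetical stand-in for DataStreamWriter: only trigger() matters here.
class MockStreamWriter {
  private var current: Option[Trigger] = None
  def trigger(t: Trigger): MockStreamWriter = { current = Some(t); this }
  def currentTrigger: Option[Trigger] = current
}
```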

@HeartSaVioR (Contributor, Author) commented Jul 12, 2019

In Spark 3.0, the deprecated class org.apache.spark.sql.streaming.ProcessingTime has been removed. Use org.apache.spark.sql.streaming.Trigger.ProcessingTime instead.

End users are always encouraged to use Trigger.xxx, so in this case we need to guide them with "Use Trigger.ProcessingTime". org.apache.spark.sql.streaming.Trigger.ProcessingTime is the new one we would like to hide from users.

I think the rest is accurate. Thanks for the nice summary!

@srowen (Member) commented Jul 12, 2019

Oh yes, I mean they should use the method Trigger.ProcessingTime(); that's not new. (The method's name is unfortunate.)

@HeartSaVioR (Contributor, Author) commented Jul 12, 2019

Yes, I confused the existing method with the new class as well; you're right.

@srowen (Member) commented Jul 14, 2019

Merged to master

@srowen srowen closed this in 7548a88 Jul 14, 2019
@HeartSaVioR (Contributor, Author):

Thanks all for the detailed review and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-28199 branch July 14, 2019 20:27
vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019
…avoid exposing these to the end users

Closes apache#24996 from HeartSaVioR/SPARK-28199.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
yiheng pushed a commit to yiheng/spark that referenced this pull request Jul 24, 2019
…e API

## What changes were proposed in this pull request?

SPARK-28199 (apache#24996) hid implementations of Triggers into `private[sql]` and encourage end users to use `Trigger.xxx` methods instead.

As I got some post review comment on apache@7548a88#r34366934 we could remove annotations which are meant to be used with public API.

## How was this patch tested?

N/A

Closes apache#25200 from HeartSaVioR/SPARK-28199-FOLLOWUP.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@cloud-fan (Contributor):

Shall we have an item in the migration guide for it?

@HeartSaVioR (Contributor, Author):

Thanks for the reminder. We didn't, since a migration guide didn't exist for SS at the time. I'll submit a PR quickly.

@srowen (Member) commented Jun 8, 2020

There's no harm in a migration guide note, I think, other than potentially overloading the guide. This is a case I would have thought release notes cover. What would you write in the migration guide? "Use the new class"?

@HeartSaVioR (Contributor, Author):

I guess the same content as the release note would be OK for the migration guide; it's just a matter of which reference users prefer. Suppose end users upgrade to Spark 3.0.0 and find their application fails to compile: which doc would they consult first? The migration guide seems to be the centralized one, so it's probably preferred over the release notes.

I have a commit but haven't yet submitted a PR. Please let me know if it makes sense to add this to the migration guide.

* the query.
*/
@Experimental
@Evolving
Member:

Shall we remove the annotations? It's private, but the annotations say it's an API.

Contributor (Author):

Oh right. These classes are no longer intended to be exposed, so we should remove the annotations. Thanks for catching it!

@HeartSaVioR (Contributor, Author) commented Jun 10, 2020:

Well... in reality that was done in #25200. Let's make sure we check the latest code (not just the code diff) when doing a post-hoc review after a long delay.

Member:

Ah, sure. Thanks :D.

@Experimental
@Evolving
case object OneTimeTrigger extends Trigger
private[sql] case object OneTimeTrigger extends Trigger
Member:

Also, let's not add private[sql], since the execution package is already private per SPARK-16964.
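The convention referenced here can be sketched as follows (illustrative, no Spark dependency): since everything under the sql.execution package is already considered internal per SPARK-16964, the explicit modifier is redundant there.

```scala
// Before, inside the sql.execution package (redundant modifier):
//   private[sql] case object OneTimeTrigger extends Trigger
// After (the package itself is treated as internal, so no modifier needed):
sealed trait Trigger
case object OneTimeTrigger extends Trigger
```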

Contributor (Author):

OK, will fix. The practice seems really easy to miss, IMHO.

cloud-fan pushed a commit that referenced this pull request Jun 9, 2020
…he SS migration guide

### What changes were proposed in this pull request?

SPARK-28199 (#24996) made the trigger related public API to be exposed only from static methods of Trigger class. This is backward incompatible change, so some users may experience compilation error after upgrading to Spark 3.0.0.

While we plan to mention the change into release note, it's good to mention the change to the migration guide doc as well, since the purpose of the doc is to collect the major changes/incompatibilities between versions and end users would refer the doc.

### Why are the changes needed?

SPARK-28199 is technically backward incompatible change and we should kindly guide the change.

### Does this PR introduce _any_ user-facing change?

Doc change.

### How was this patch tested?

N/A, as it's just a doc change.

Closes #28763 from HeartSaVioR/SPARK-28199-FOLLOWUP-doc.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Jun 9, 2020

(cherry picked from commit 8305b77)
Signed-off-by: Wenchen Fan <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jun 11, 2020
… sql.execution package

### What changes were proposed in this pull request?

This PR proposes to remove package private in classes/objects in sql.execution package, as per SPARK-16964.

### Why are the changes needed?

This is per post-hoc review comment, see #24996 (comment)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #28790 from HeartSaVioR/SPARK-28199-FOLLOWUP-apply-SPARK-16964.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jun 11, 2020

(cherry picked from commit 4afe2b1)
Signed-off-by: Dongjoon Hyun <[email protected]>
holdenk pushed a commit to holdenk/spark that referenced this pull request Jun 25, 2020

(cherry picked from commit 4afe2b1)
Signed-off-by: Dongjoon Hyun <[email protected]>