-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-28199][SS] Move Trigger implementations to Triggers.scala and avoid exposing these to the end users #24996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #106991 has finished for PR 24996 at commit
|
|
Why introduce a new abstraction? This is what |
|
Trigger is just an exposing user side API, and implementation has been placed on Scala side like how That said, if we would like to proceed smoother migration with respecting the pattern what Spark has been doing, we need a new class which takes the role instead. If we think we can remove ProcessingTime, we may not need this smoother but redundant approach. Just change case class and object of ProcessingTime to |
|
Because |
|
When we decide to discontinue supporting
Which option do you feel better? |
|
I personally favor removing the old deprecated implementation. It's probably simpler, and 3.0 is the right time to do it. I don't think this is 'critical' to continue supporting as a deprecated API, compared to many other things we've removed for similar reasons. |
|
|
||
| import org.apache.spark.internal.Logging | ||
| import org.apache.spark.sql.streaming.ProcessingTime | ||
| import org.apache.spark.sql.streaming.{ProcessingTime, Trigger} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the ProcessingTime import?
...src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
Show resolved
Hide resolved
| * the query will run as fast as possible. | ||
| */ | ||
| @Evolving | ||
| private[sql] case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, if this is basically an implementation class, I wonder if it belongs in the (unfortunately named) Triggers.scala class, which only now has OneTimeTrigger, but at least is just another implementation class too? No big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we decide to move this class to Triggers.scala, I guess ContinuousTrigger has to be moved too. No big deal for me too, so please let me know which feels cleaner for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd move it, to rationalize Triggers.scala and avoid another file. It doesn't matter much. I see there is ContinuousTrigger too but I guess it belongs in the .continuous subpackage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your voice on this. Agreed on both points, I'll leave ContinuousTrigger as it is, and move ProcessingTimeTrigger.
|
Test build #107085 has finished for PR 24996 at commit
|
|
Test build #107087 has finished for PR 24996 at commit
|
|
Looks like everything sorted out. Please take a look again. Thanks! |
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC @tdas or possibly @jose-torres
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Triggers.scala
Show resolved
Hide resolved
|
Retest this please. |
| */ | ||
| case class ProcessingTimeExecutor(processingTime: ProcessingTime, clock: Clock = new SystemClock()) | ||
| case class ProcessingTimeExecutor( | ||
| processingTime: ProcessingTimeTrigger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename the variable together.
processingTime->processingTimeTrigger.private val intervalMs = processingTime.intervalMs->private val intervalMs = processingTimeTrigger.intervalMs
|
Yeah, I'm also in favor of removing the deprecated implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the purpose of introducing a new case class name.
However, let's be clear. Technically, this PR is doing a kind of renaming and changing the visibility of the existing ProcessingTime. It would be great if we can mention that change explicitly in the PR title.
|
Test build #107194 has finished for PR 24996 at commit
|
|
Let me explain my intention on the patch. Actually I assume the reason we deprecated If that's the reason, we should hide other trigger implementations as well (OneTimeTrigger, ContinuousTrigger) so this patch may need to also mark them as deprecated. (And we may eventually replace them with other classes like this patch proposes.) I dug the history when it's marked as deprecated, and realize there's no explanation about the reason of deprecation. No description, no review comment on deprecating these methods. So unfortunately the reason is still not clear. At least based on my assumption (and my intention of the patch), it is not just renaming and changing the visibility. It finally ends up looking like that, but in fact it "removes" the deprecated class, and create the other class with applying the intention of deprecation. I will update the title of PR to include If my assumption is incorrect, we may need to ask to ourselves why we deprecated these methods. Honestly I can't find any other reasons of doing that. |
|
FYI, the PR which marked these methods as deprecated is here: #17219 |
|
Test build #107210 has finished for PR 24996 at commit
|
|
Thank you for update. BTW, is the failure due to a flaky test case? |
|
Looks like I could see same failure locally by on other branch (SPARK-27254). It fails intermittently, but even it succeeds it leaves suspicious error log. I'll see what is happening there. |
|
retest this, please |
|
Test build #107504 has finished for PR 24996 at commit
|
|
retest this, please |
|
Test build #107539 has finished for PR 24996 at commit
|
|
I think this is OK. We need to add releases notes. I don't think there's a streaming migration guide in the docs, so we can use Docs text in the JIRA. Is this accurate @HeartSaVioR ? "In Spark 3.0, the deprecated class |
End users are always encouraged to use I think remaining is accurate. Thanks for the nice summary! |
|
Oh yes, I mean they should use the method |
|
Yes I also confused with existing method and new class, you're right. |
|
Merged to master |
|
Thanks all for the detailed review and merging! |
…avoid exposing these to the end users ## What changes were proposed in this pull request? This patch proposes moving all Trigger implementations to `Triggers.scala`, to avoid exposing these implementations to the end users and let end users only deal with `Trigger.xxx` static methods. This fits the intention of deprecation of `ProcessingTIme`, and we agree to move others without deprecation as this patch will be shipped in major version (Spark 3.0.0). ## How was this patch tested? UTs modified to work with newly introduced class. Closes apache#24996 from HeartSaVioR/SPARK-28199. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Sean Owen <[email protected]>
…e API ## What changes were proposed in this pull request? SPARK-28199 (apache#24996) hid implementations of Triggers into `private[sql]` and encourage end users to use `Trigger.xxx` methods instead. As I got some post review comment on apache@7548a88#r34366934 we could remove annotations which are meant to be used with public API. ## How was this patch tested? N/A Closes apache#25200 from HeartSaVioR/SPARK-28199-FOLLOWUP. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
|
shall we have an item in the migration guide for it? |
|
Thanks for reminding. We didn't do as migration guide didn't exist for SS. I'll submit a PR quickly. |
|
There's no harm in a migration guide note, I think, other than potentially overloading it. This is a case I would have thought release notes cover. What would you write in the migration guide? "use the new class"? |
|
I guess same content with release note would be OK for migration guide - that's just a matter of preference of references. Suppose end users upgrade to Spark 3.0.0 and find their application fail to compile, which doc they would find for the first time? Migration guide looks to be the centralized one, so maybe preferred over release note. I have a commit but yet submitted a PR. Please let me know if it makes sense to add it in migration guide. |
| * the query. | ||
| */ | ||
| @Experimental | ||
| @Evolving |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we remove the annotations? it's private but the annotations say it's an API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right. These classes are now not intended to expose so should remove annotations. Thanks for finding it out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well... in reality that was done in #25200. Let's make sure we check the latest code (not the code diff) while doing post-hoc review after long delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sure. Thanks :D.
| @Experimental | ||
| @Evolving | ||
| case object OneTimeTrigger extends Trigger | ||
| private[sql] case object OneTimeTrigger extends Trigger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, let's don't have private[sql] since execution package is already private per SPARK-16964
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK will fix. The practice seems to be really easy to miss IMHO though.
…he SS migration guide ### What changes were proposed in this pull request? SPARK-28199 (#24996) made the trigger related public API to be exposed only from static methods of Trigger class. This is backward incompatible change, so some users may experience compilation error after upgrading to Spark 3.0.0. While we plan to mention the change into release note, it's good to mention the change to the migration guide doc as well, since the purpose of the doc is to collect the major changes/incompatibilities between versions and end users would refer the doc. ### Why are the changes needed? SPARK-28199 is technically backward incompatible change and we should kindly guide the change. ### Does this PR introduce _any_ user-facing change? Doc change. ### How was this patch tested? N/A, as it's just a doc change. Closes #28763 from HeartSaVioR/SPARK-28199-FOLLOWUP-doc. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…he SS migration guide ### What changes were proposed in this pull request? SPARK-28199 (#24996) made the trigger related public API to be exposed only from static methods of Trigger class. This is backward incompatible change, so some users may experience compilation error after upgrading to Spark 3.0.0. While we plan to mention the change into release note, it's good to mention the change to the migration guide doc as well, since the purpose of the doc is to collect the major changes/incompatibilities between versions and end users would refer the doc. ### Why are the changes needed? SPARK-28199 is technically backward incompatible change and we should kindly guide the change. ### Does this PR introduce _any_ user-facing change? Doc change. ### How was this patch tested? N/A, as it's just a doc change. Closes #28763 from HeartSaVioR/SPARK-28199-FOLLOWUP-doc. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 8305b77) Signed-off-by: Wenchen Fan <[email protected]>
… sql.execution package ### What changes were proposed in this pull request? This PR proposes to remove package private in classes/objects in sql.execution package, as per SPARK-16964. ### Why are the changes needed? This is per post-hoc review comment, see #24996 (comment) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #28790 from HeartSaVioR/SPARK-28199-FOLLOWUP-apply-SPARK-16964. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
… sql.execution package ### What changes were proposed in this pull request? This PR proposes to remove package private in classes/objects in sql.execution package, as per SPARK-16964. ### Why are the changes needed? This is per post-hoc review comment, see #24996 (comment) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #28790 from HeartSaVioR/SPARK-28199-FOLLOWUP-apply-SPARK-16964. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 4afe2b1) Signed-off-by: Dongjoon Hyun <[email protected]>
… sql.execution package ### What changes were proposed in this pull request? This PR proposes to remove package private in classes/objects in sql.execution package, as per SPARK-16964. ### Why are the changes needed? This is per post-hoc review comment, see apache#24996 (comment) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes apache#28790 from HeartSaVioR/SPARK-28199-FOLLOWUP-apply-SPARK-16964. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 4afe2b1) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This patch proposes moving all Trigger implementations to
Triggers.scala, to avoid exposing these implementations to the end users and let end users only deal withTrigger.xxxstatic methods. This fits the intention of deprecation ofProcessingTIme, and we agree to move others without deprecation as this patch will be shipped in major version (Spark 3.0.0).How was this patch tested?
UTs modified to work with newly introduced class.