Update WindowedDStream.scala #390
Update the message of the Exception thrown when windowDuration is not a multiple of parent.slideDuration.
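For readers without the diff at hand, the kind of check this description refers to can be sketched as below. This is a minimal, self-contained illustration and not the code in WindowedDStream.scala: `Millis`, `WindowCheck`, `validateWindow`, and the message wording are hypothetical stand-ins, and `IllegalArgumentException` is chosen here only for the sketch.

```scala
// Hypothetical stand-in for a streaming duration; this is NOT Spark's Duration class.
final case class Millis(value: Long) {
  require(value > 0, "duration must be positive")

  // True when this duration is an exact whole-number multiple of `that`.
  def isMultipleOf(that: Millis): Boolean = value % that.value == 0

  override def toString: String = s"$value ms"
}

object WindowCheck {
  // Fails fast, with both values in the message, when the window cannot be
  // covered exactly by a whole number of parent batches.
  def validateWindow(windowDuration: Millis, parentSlideDuration: Millis): Unit = {
    if (!windowDuration.isMultipleOf(parentSlideDuration)) {
      throw new IllegalArgumentException(
        s"The window duration of the windowed DStream ($windowDuration) must be a multiple " +
        s"of the slide duration of the parent DStream ($parentSlideDuration)")
    }
  }

  def main(args: Array[String]): Unit = {
    // 3000 ms is a multiple of 1000 ms, so this passes silently.
    validateWindow(Millis(3000), Millis(1000))

    // 2500 ms is not a multiple of 1000 ms, so the check throws.
    try validateWindow(Millis(2500), Millis(1000))
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

Including both durations in the message lets a user see which setting to adjust without reading the source.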
Can one of the admins verify this patch?
Jenkins test this please.
good catch
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
I merged this and put it into 1.0 and 0.9.
update the content of Exception when windowDuration is not multiple of parent.slideDuration
Author: baishuo(白硕) <[email protected]>
Closes #390 from baishuo/windowdstream and squashes the following commits:
533c968 [baishuo(白硕)] Update WindowedDStream.scala
(cherry picked from commit aa8bb11)
Signed-off-by: Patrick Wendell <[email protected]>

update the content of Exception when windowDuration is not multiple of parent.slideDuration
Author: baishuo(白硕) <[email protected]>
Closes #390 from baishuo/windowdstream and squashes the following commits:
533c968 [baishuo(白硕)] Update WindowedDStream.scala

update the content of Exception when windowDuration is not multiple of parent.slideDuration
Author: baishuo(白硕) <[email protected]>
Closes #390 from baishuo/windowdstream and squashes the following commits:
533c968 [baishuo(白硕)] Update WindowedDStream.scala
Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala
thank you @pwendell
@baishuo mind closing this? For some reason the auto-close didn't work.
no problem @pwendell