@baishuo
Contributor

@baishuo baishuo commented Apr 11, 2014

Update the content of the exception thrown when windowDuration is not a multiple of parent.slideDuration.
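For readers unfamiliar with the check being discussed, here is a minimal, Spark-free sketch of this kind of validation. `Millis`, `WindowCheck`, and `requireWindowIsMultiple` are hypothetical names introduced for illustration, and the message wording is an assumption, not the text changed in WindowedDStream.scala.

```scala
// Hypothetical stand-in for a streaming duration type; not Spark's own class.
final case class Millis(value: Long) {
  require(value > 0, "duration must be positive")

  // True when this duration is an exact multiple of `that`.
  def isMultipleOf(that: Millis): Boolean = value % that.value == 0

  override def toString: String = s"$value ms"
}

object WindowCheck {
  // Hypothetical helper: rejects a window duration that is not a multiple
  // of the parent stream's slide duration, and reports both values.
  def requireWindowIsMultiple(windowDuration: Millis, parentSlideDuration: Millis): Unit = {
    if (!windowDuration.isMultipleOf(parentSlideDuration)) {
      throw new IllegalArgumentException(
        s"The window duration ($windowDuration) must be a multiple of " +
          s"the slide duration of the parent stream ($parentSlideDuration)")
    }
  }
}
```

Under these assumptions, `WindowCheck.requireWindowIsMultiple(Millis(4000), Millis(2000))` returns normally, while a 3000 ms window over a 2000 ms parent slide throws. The point of the message is that it names the window duration, which is the value that failed the check.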

@AmplabJenkins

Can one of the admins verify this patch?

@pwendell
Contributor

Jenkins test this please.

@pwendell
Contributor

good catch

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14054/

@pwendell
Contributor

I merged this and put it into 1.0 and 0.9.

baishuo added a commit that referenced this pull request Apr 12, 2014
update the content of Exception when windowDuration is not multiple of parent.slideDuration

Author: baishuo(白硕) <[email protected]>

Closes #390 from baishuo/windowdstream and squashes the following commits:

533c968 [baishuo(白硕)] Update WindowedDStream.scala
(cherry picked from commit aa8bb11)

Signed-off-by: Patrick Wendell <[email protected]>
baishuo added a commit that referenced this pull request Apr 12, 2014
update the content of Exception when windowDuration is not multiple of parent.slideDuration

Author: baishuo(白硕) <[email protected]>

Closes #390 from baishuo/windowdstream and squashes the following commits:

533c968 [baishuo(白硕)] Update WindowedDStream.scala
baishuo added a commit that referenced this pull request Apr 12, 2014
update the content of Exception when windowDuration is not multiple of parent.slideDuration

Author: baishuo(白硕) <[email protected]>

Closes #390 from baishuo/windowdstream and squashes the following commits:

533c968 [baishuo(白硕)] Update WindowedDStream.scala

Conflicts:

	streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala
@baishuo
Contributor Author

baishuo commented Apr 12, 2014

thank you @pwendell

@pwendell
Contributor

@baishuo mind closing this? For some reason the auto-close didn't work.

@baishuo
Contributor Author

baishuo commented Apr 13, 2014

no problem @pwendell

@baishuo baishuo closed this Apr 13, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
update the content of Exception when windowDuration is not multiple of parent.slideDuration

Author: baishuo(白硕) <[email protected]>

Closes apache#390 from baishuo/windowdstream and squashes the following commits:

533c968 [baishuo(白硕)] Update WindowedDStream.scala
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…lated metrics resulting in potentially inaccurate data (apache#390)

### What changes were proposed in this pull request?
This PR fixes the missing shuffle-write metrics in SortShuffleWriter, which can result in inaccurate data.

### Why are the changes needed?
When the shuffle writer is SortShuffleWriter, it does not use SQLShuffleWriteMetricsReporter to update metrics, so when AQE obtains runtime statistics, the rowCount it gets is 0.

Some optimization rules rely on rowCount statistics, such as `EliminateLimits`. Because rowCount is 0, the rule removes the limit operator, and the query then returns results without the limit applied.

https://github.com/apache/spark/blob/59d5946cfd377e9203ccf572deb34f87fab7510c/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L168-L172

https://github.com/apache/spark/blob/59d5946cfd377e9203ccf572deb34f87fab7510c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L2067-L2070
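The failure mode described above can be sketched with a toy plan model. Everything in this block (`Plan`, `Stage`, `Limit`, `ToyOptimizer`) is hypothetical and greatly simplified; it is not Spark's Catalyst API, and it only assumes that a limit is dropped when the child's reported row bound does not exceed the limit.

```scala
// Toy logical plan: every node exposes an optional upper bound on its row count.
sealed trait Plan { def maxRows: Option[Long] }

// A finished stage: `actualRows` is what it really produces,
// `reportedRowCount` is what its runtime statistics claim.
final case class Stage(actualRows: Seq[Int], reportedRowCount: Option[Long]) extends Plan {
  def maxRows: Option[Long] = reportedRowCount
}

final case class Limit(n: Int, child: Plan) extends Plan {
  def maxRows: Option[Long] =
    Some(child.maxRows.fold(n.toLong)(r => math.min(r, n.toLong)))
}

object ToyOptimizer {
  // Drop a Limit when the child claims to produce no more than `n` rows.
  def eliminateLimits(plan: Plan): Plan = plan match {
    case Limit(n, child) if child.maxRows.exists(_ <= n) => eliminateLimits(child)
    case Limit(n, child) => Limit(n, eliminateLimits(child))
    case other => other
  }

  // Evaluate the toy plan against the rows the stage actually holds.
  def execute(plan: Plan): Seq[Int] = plan match {
    case Stage(rows, _) => rows
    case Limit(n, child) => execute(child).take(n)
  }
}
```

In this sketch, `Limit(2, Stage(rows, Some(5L)))` keeps its limit, whereas `Limit(2, Stage(rows, Some(0L)))` has the limit removed because the stage wrongly reports zero rows, so executing the rewritten plan returns every row.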

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Production environment verification.

**master metrics**
<img width="296" alt="image" src="https://github.com/apache/spark/assets/3898450/dc9b6e8a-93ec-4f59-a903-71aa5b11962c">

**PR metrics**

<img width="276" alt="image" src="https://github.com/apache/spark/assets/3898450/2d73b773-2dcc-4d23-81de-25dcadac86c1">

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#46459 from cxzl25/SPARK-48037-3.5.

Authored-by: sychen <[email protected]>

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: sychen <[email protected]>