[SPARK-21258][SQL] Fix WindowExec complex object aggregation with spilling #18470

hvanhovell · 2017-06-29T22:08:29Z

What changes were proposed in this pull request?

WindowExec currently stores complex objects (UnsafeRow, UnsafeArrayData, UnsafeMapData, UTF8String) improperly during aggregation: the buffer used by GeneratedMutableProjections keeps a reference to the actual input data instead of a copy of it. Things go wrong when the input object (or its backing bytes) is reused for something else. This can happen in window functions once the operator starts spilling to disk: when reading back the spill files, the UnsafeSorterSpillReader reuses the buffer to which the UnsafeRow points, leading to weird corruption scenarios. Note that this only happens for aggregate functions that preserve (parts of) their input, for example FIRST, LAST, MIN & MAX.

This was not seen before because the spilling logic did not perform actual spills as often and instead used an in-memory page. That page was not cleaned up during window processing, which ensured that unsafe objects pointed to their own dedicated memory location. This changed with #16909; since that PR, Spark spills more eagerly.
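The failure mode described above can be reproduced outside Spark with a small Java sketch. It is an illustration under stated assumptions, not Spark code: ByteView, SpillReader and ReferenceKeepingMax are hypothetical stand-ins for a UTF8String-like view into a byte buffer, a spill reader, and a MAX aggregate that stores a reference to its input. When every record gets its own bytes (roughly what the old in-memory page guaranteed), keeping a reference is harmless; when the reader decodes every record into one shared buffer, the stored "maximum" is silently overwritten.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class SpillReuseDemo {
  public static void main(String[] args) {
    List<String> records = Arrays.asList("pear", "apple", "fig");
    // Each record owns its bytes (like rows kept in an in-memory page): correct result.
    System.out.println("fresh buffers: " + ReferenceKeepingMax.run(new SpillReader(records, false))); // pear
    // The reader overwrites one shared buffer (like reading back a spill file): corrupted result.
    System.out.println("reused buffer: " + ReferenceKeepingMax.run(new SpillReader(records, true)));  // figle
  }
}

/** Hypothetical stand-in for a UTF8String-like view: a length-delimited window into a byte[]. */
final class ByteView {
  final byte[] base;
  final int length;

  ByteView(byte[] base, int length) {
    this.base = base;
    this.length = length;
  }

  @Override
  public String toString() {
    return new String(base, 0, length, StandardCharsets.UTF_8);
  }
}

/** Hypothetical spill reader; with reuseBuffer=true every record is decoded into the same array. */
final class SpillReader {
  private final List<String> records;
  private final boolean reuseBuffer;
  private final byte[] shared = new byte[64]; // assumption: every record fits in 64 bytes
  private int pos = 0;

  SpillReader(List<String> records, boolean reuseBuffer) {
    this.records = records;
    this.reuseBuffer = reuseBuffer;
  }

  boolean hasNext() {
    return pos < records.size();
  }

  ByteView next() {
    byte[] bytes = records.get(pos++).getBytes(StandardCharsets.UTF_8);
    if (!reuseBuffer) {
      return new ByteView(bytes, bytes.length); // dedicated memory per record
    }
    System.arraycopy(bytes, 0, shared, 0, bytes.length); // overwrite the shared buffer in place
    return new ByteView(shared, bytes.length);
  }
}

/** MAX that stores a reference to its input instead of a copy (the buggy pattern). */
final class ReferenceKeepingMax {
  static String run(SpillReader reader) {
    ByteView max = null;
    while (reader.hasNext()) {
      ByteView v = reader.next();
      if (max == null || v.toString().compareTo(max.toString()) > 0) {
        max = v; // keeps a pointer into the reader's buffer, which later records may overwrite
      }
    }
    return max == null ? null : max.toString();
  }
}
```

With the shared buffer, "apple" is compared against a view that already reads "appl", and the final bytes "fig" partially overwrite "apple", so the stored maximum reads back as "figle" instead of "pear".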

This PR provides a surgical fix because we are close to releasing Spark 2.2. The change simply makes sure that no object reuse can occur, at the expense of a little performance. We will follow up with a more subtle solution at a later point.
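The general idea behind ruling out object reuse is a defensive copy: the aggregation buffer must own the data it stores rather than point at an input object the producer will mutate. The Java sketch below is a hypothetical illustration of that idea, not the change made in this PR: MutableRow, ReusingIterator and CopyOnStoreDemo are invented names, and the iterator deliberately returns the same row instance for every element, similar to how a reader may reuse one UnsafeRow. On real Spark rows, the analogous operation would be a copy method such as UnsafeRow.copy().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class CopyOnStoreDemo {
  public static void main(String[] args) {
    List<Integer> input = Arrays.asList(7, 3, 9, 1);
    System.out.println("FIRST by reference: " + first(new ReusingIterator(input), false)); // 1 (wrong)
    System.out.println("FIRST with copy:    " + first(new ReusingIterator(input), true));  // 7
    System.out.println("MAX by reference:   " + max(new ReusingIterator(input), false));   // 1 (wrong)
    System.out.println("MAX with copy:      " + max(new ReusingIterator(input), true));    // 9
  }

  /** FIRST: keep the first row seen; copyOnStore decides whether the buffer owns its data. */
  static int first(Iterator<MutableRow> rows, boolean copyOnStore) {
    MutableRow buffer = null;
    while (rows.hasNext()) {
      MutableRow row = rows.next();
      if (buffer == null) {
        buffer = copyOnStore ? row.copy() : row;
      }
    }
    if (buffer == null) throw new NoSuchElementException("empty input");
    return buffer.value;
  }

  /** MAX: replace the buffer whenever a larger value arrives. */
  static int max(Iterator<MutableRow> rows, boolean copyOnStore) {
    MutableRow buffer = null;
    while (rows.hasNext()) {
      MutableRow row = rows.next();
      if (buffer == null || row.value > buffer.value) {
        buffer = copyOnStore ? row.copy() : row;
      }
    }
    if (buffer == null) throw new NoSuchElementException("empty input");
    return buffer.value;
  }
}

/** Hypothetical mutable row with a single int field. */
final class MutableRow {
  int value;

  MutableRow copy() {
    MutableRow r = new MutableRow();
    r.value = value;
    return r;
  }
}

/** Hypothetical iterator that overwrites and returns one shared row (object reuse). */
final class ReusingIterator implements Iterator<MutableRow> {
  private final Iterator<Integer> source;
  private final MutableRow shared = new MutableRow();

  ReusingIterator(List<Integer> values) {
    this.source = values.iterator();
  }

  @Override
  public boolean hasNext() {
    return source.hasNext();
  }

  @Override
  public MutableRow next() {
    shared.value = source.next(); // mutate the single shared row in place
    return shared;
  }
}
```

Copying on every store costs an allocation per buffer update, which matches the trade-off stated above: correctness first, at the expense of a little performance.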

How was this patch tested?

Added a regression test to DataFrameWindowFunctionsSuite.

gatorsmile · 2017-06-29T22:20:39Z

LGTM @liancheng @cloud-fan

liancheng · 2017-06-29T22:20:52Z

LGTM pending Jenkins.

SparkQA · 2017-06-29T23:50:27Z

Test build #78932 has finished for PR 18470 at commit a41335f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-06-29T23:52:45Z

Test build #78933 has finished for PR 18470 at commit 7e30ae5.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-06-30T00:05:25Z

retes this please

cloud-fan · 2017-06-30T00:13:12Z

retest this please

cloud-fan · 2017-06-30T00:13:16Z

LGTM

SparkQA · 2017-06-30T02:27:59Z

Test build #78941 has finished for PR 18470 at commit 7e30ae5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

[SPARK-21258][SQL] Fix WindowExec complex object aggregation with spilling

Author: Herman van Hovell
Closes #18470 from hvanhovell/SPARK-21258.
(cherry picked from commit e2f32ee)
Signed-off-by: Wenchen Fan

cloud-fan · 2017-06-30T04:35:21Z

thanks, merging to master/2.2/2.1!

zzcclp · 2017-06-30T06:43:05Z

Hi @cloud-fan, @hvanhovell, after merging this PR into branch-2.1, there are some compilation errors:
1. value WINDOW_EXEC_BUFFER_SPILL_THRESHOLD is not a member of object org.apache.spark.sql.internal.SQLConf
2. overloaded method value json with alternatives: (jsonRDD: org.apache.spark.rdd.RDD[String])org.apache.spark.sql.DataFrame <and> (jsonRDD: org.apache.spark.api.java.JavaRDD[String])org.apache.spark.sql.DataFrame <and> (paths: String*)org.apache.spark.sql.DataFrame <and> (path: String)org.apache.spark.sql.DataFrame cannot be applied to (org.apache.spark.sql.Dataset[String])

cloud-fan · 2017-06-30T06:45:15Z

Ah, the spilling logic is not in 2.1; let me revert it. Sorry for the trouble.

hvanhovell added 2 commits June 29, 2017 23:56

Fix WindowExec complex object preserving aggregation in combination with spilling.

a41335f

Add comment

7e30ae5

asfgit closed this in e2f32ee Jun 30, 2017
