
Conversation

@wangyum (Member) commented Apr 28, 2018:

What changes were proposed in this pull request?

This PR unifies getSizePerRow into a single helper, since the same per-row size computation is duplicated in several places. For example:

  1. LocalRelation.scala#L80
  2. SizeInBytesOnlyStatsPlanVisitor.scala#L36
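
The unification target can be sketched as follows. This is a simplified reading of the new helper, assuming it lives in EstimationUtils and takes the plan's output attributes (the real method may carry additional parameters, e.g. column statistics, and requires spark-catalyst on the classpath):

```scala
import org.apache.spark.sql.catalyst.expressions.Attribute

object EstimationUtils {
  // Per-row size estimate: 8 bytes of row-object overhead plus the
  // default size of each output column. The "+ 8" matches the
  // hand-rolled call sites this PR replaces.
  def getSizePerRow(attributes: Seq[Attribute]): Long = {
    8 + attributes.map(_.dataType.defaultSize).sum
  }
}
```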

How was this patch tested?

Existing tests.

@SparkQA commented Apr 28, 2018:

Test build #89955 has finished for PR 21189 at commit cd41538.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
 * Used to query the data that has been written into a [[MemorySinkV2]].
 */
case class MemoryPlanV2(sink: MemorySinkV2, override val output: Seq[Attribute]) extends LeafNode {
  private val sizePerRow = output.map(_.dataType.defaultSize).sum
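
Following the pattern used elsewhere in this diff, the hand-rolled sum above would presumably be rewritten as (a sketch, not the verbatim patch):

```scala
// Delegate to the centralized helper instead of summing defaultSize inline.
private val sizePerRow = EstimationUtils.getSizePerRow(output)
```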
@gatorsmile (Member) commented Apr 30, 2018:

A contributor replied:

I wouldn't think it's possible.

  sink.addBatch(1, 4 to 6)
  plan.invalidateStatsCache()
- assert(plan.stats.sizeInBytes === 24)
+ assert(plan.stats.sizeInBytes === 72)
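
One way to read why the expected value moves from 24 to 72, assuming the sink holds six IntegerType rows in total (an earlier batch plus `4 to 6`) and that IntegerType.defaultSize is 4 bytes — both assumptions, not stated in the thread:

```scala
// Hypothetical arithmetic behind the updated test expectation.
val defaultSize = 4                              // IntegerType.defaultSize, in bytes
val rowCount = 6                                 // assumed total rows in the sink
val oldEstimate = rowCount * defaultSize         // no per-row overhead counted
val newEstimate = rowCount * (defaultSize + 8)   // 8-byte row-object overhead added
```

Under these assumptions the only difference is the 8-byte per-row object overhead the memory stream previously omitted.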
A member replied:

MemorySinkV2 is mainly for testing. I think the stats changes will not impact anything, right? @tdas @jose-torres

A contributor replied:

It shouldn't impact anything, but abstractly it seems strange that this unification would cause the stats to change? What are we doing differently to cause this, and how confident are we this won't happen to production sinks?

A contributor replied:

It seems we previously forgot to count the per-row object overhead (8 bytes) in the memory stream.

A contributor replied:

SGTM then

- val childRowSize = p.child.output.map(_.dataType.defaultSize).sum + 8
- val outputRowSize = p.output.map(_.dataType.defaultSize).sum + 8
+ val childRowSize = EstimationUtils.getSizePerRow(p.child.output)
+ val outputRowSize = EstimationUtils.getSizePerRow(p.output)
A member replied:

@cloud-fan
@cloud-fan (Contributor) replied:
LGTM

@SparkQA commented May 8, 2018:

Test build #90365 has finished for PR 21189 at commit f72084e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor) commented:

thanks, merging to master!

@asfgit asfgit closed this in 487faf1 May 8, 2018