
Conversation

@wangyum (Member) commented Apr 28, 2018:

What changes were proposed in this pull request?

This PR unifies getSizePerRow into a single helper, since the same per-row size computation is duplicated in several places. For example:

  1. LocalRelation.scala#L80
  2. SizeInBytesOnlyStatsPlanVisitor.scala#L36
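
The unification target can be sketched as follows. This is a simplified reading of the new helper, assuming it lives in EstimationUtils and takes the plan's output attributes (the real method may carry additional parameters, e.g. column statistics, and requires spark-catalyst on the classpath):

```scala
import org.apache.spark.sql.catalyst.expressions.Attribute

object EstimationUtils {
  // Per-row size estimate: 8 bytes of row-object overhead plus the
  // default size of each output column. The "+ 8" matches the
  // hand-rolled call sites this PR replaces.
  def getSizePerRow(attributes: Seq[Attribute]): Long = {
    8 + attributes.map(_.dataType.defaultSize).sum
  }
}
```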

How was this patch tested?

Existing tests.

@SparkQA commented Apr 28, 2018:

Test build #89955 has finished for PR 21189 at commit cd41538.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
 * Used to query the data that has been written into a [[MemorySinkV2]].
 */
case class MemoryPlanV2(sink: MemorySinkV2, override val output: Seq[Attribute]) extends LeafNode {
  private val sizePerRow = output.map(_.dataType.defaultSize).sum
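
Following the pattern used elsewhere in this diff, the hand-rolled sum above would presumably be rewritten as (a sketch, not the verbatim patch):

```scala
// Delegate to the centralized helper instead of summing defaultSize inline.
private val sizePerRow = EstimationUtils.getSizePerRow(output)
```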
@gatorsmile (Member) commented Apr 30, 2018:

A contributor replied:

I wouldn't think it's possible.

  sink.addBatch(1, 4 to 6)
  plan.invalidateStatsCache()
- assert(plan.stats.sizeInBytes === 24)
+ assert(plan.stats.sizeInBytes === 72)
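
One way to read why the expected value moves from 24 to 72, assuming the sink holds six IntegerType rows in total (an earlier batch plus `4 to 6`) and that IntegerType.defaultSize is 4 bytes — both assumptions, not stated in the thread:

```scala
// Hypothetical arithmetic behind the updated test expectation.
val defaultSize = 4                              // IntegerType.defaultSize, in bytes
val rowCount = 6                                 // assumed total rows in the sink
val oldEstimate = rowCount * defaultSize         // no per-row overhead counted
val newEstimate = rowCount * (defaultSize + 8)   // 8-byte row-object overhead added
```

Under these assumptions the only difference is the 8-byte per-row object overhead the memory stream previously omitted.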
A member replied:

MemorySinkV2 is mainly for testing. I think the stats changes will not impact anything, right? @tdas @jose-torres

A contributor replied:

It shouldn't impact anything, but abstractly it seems strange that this unification would cause the stats to change? What are we doing differently to cause this, and how confident are we this won't happen to production sinks?

A contributor replied:

It seems we previously forgot to count the per-row object overhead (8 bytes) in the memory stream.

A contributor replied:

SGTM then

- val childRowSize = p.child.output.map(_.dataType.defaultSize).sum + 8
- val outputRowSize = p.output.map(_.dataType.defaultSize).sum + 8
+ val childRowSize = EstimationUtils.getSizePerRow(p.child.output)
+ val outputRowSize = EstimationUtils.getSizePerRow(p.output)
A member replied:

@cloud-fan
@cloud-fan (Contributor) replied:
LGTM

@SparkQA commented May 8, 2018:

Test build #90365 has finished for PR 21189 at commit f72084e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor) commented:

thanks, merging to master!

@asfgit asfgit closed this in 487faf1 May 8, 2018