Conversation

@ala
Contributor

@ala ala commented Feb 16, 2017

What changes were proposed in this pull request?

The Range operator was modified to produce the "recordsRead" metric instead of "generated rows". The tests were updated and partially moved to SQLMetricsSuite.

How was this patch tested?

Unit tests.

}
}

def run(df: DataFrame): List[(Long, Long, Long)] = {
Contributor

Document what the Long, Long, Long are for?
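One way to act on this comment is to spell out the tuple layout in a Scaladoc comment on the method that returns it. The sketch below is only an illustration: `TupleDocSketch` and `toTuples` are hypothetical names, and placing `recordsRead` as the first field is an assumption (only `shuffleRecordsRead` and `sumMaxOutputRows` are visible in the diff context in this conversation).

```scala
import scala.collection.mutable.HashMap

object TupleDocSketch {
  // Assumed shape: `recordsRead` as the first field is a guess, not taken from the diff.
  final case class MetricsResult(
      var recordsRead: Long = 0L,
      var shuffleRecordsRead: Long = 0L,
      var sumMaxOutputRows: Long = 0L)

  /**
   * Returns one tuple per stage, ordered by stage id. Each tuple is
   * (recordsRead, shuffleRecordsRead, sumMaxOutputRows).
   */
  def toTuples(byStage: HashMap[Int, MetricsResult]): List[(Long, Long, Long)] =
    byStage.toList.sortBy(_._1).map { case (_, r) =>
      (r.recordsRead, r.shuffleRecordsRead, r.sumMaxOutputRows)
    }
}
```

A small case class with named fields would make the documentation unnecessary, at the cost of changing the signature the tests already rely on.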

stageIdToMetricsResult = HashMap.empty[Int, MetricsResult]
}

def getResults(): List[(Long, Long, Long)] = {
Contributor

Here too: document the Long, Long, Long.

@rxin
Contributor

rxin commented Feb 16, 2017

cc @hvanhovell if you have a min to review this ...

@SparkQA

SparkQA commented Feb 16, 2017

Test build #73004 has finished for PR 16960 at commit 10a53a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 16, 2017

Test build #73005 has finished for PR 16960 at commit 088556b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


object InputOutputMetricsHelper {
  private class InputOutputMetricsListener extends SparkListener {
    private case class MetricsResult(
Contributor

Nit: add space

var shuffleRecordsRead: Long = 0L,
var sumMaxOutputRows: Long = 0L)

private[this] var stageIdToMetricsResult = HashMap.empty[Int, MetricsResult]
Contributor

Make this val.
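The suggestion works because a mutable `HashMap` can be changed through its own methods, so the reference itself never needs reassigning. A minimal sketch, assuming the reset logic is switched from assigning a fresh map to clearing the existing one (`StageCounters` and `countsByStage` are hypothetical names used only for illustration):

```scala
import scala.collection.mutable.HashMap

class StageCounters {
  // `val` fixes the reference; the map it points to is still mutable.
  private[this] val countsByStage = HashMap.empty[Int, Long]

  // Adds `n` to the running count for the given stage.
  def add(stageId: Int, n: Long): Unit =
    countsByStage(stageId) = countsByStage.getOrElse(stageId, 0L) + n

  // With a val, reset clears the map in place instead of assigning a new one.
  def reset(): Unit = countsByStage.clear()

  // Immutable copy for callers, so internal state is not exposed.
  def snapshot: Map[Int, Long] = countsByStage.toMap
}
```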

def run(df: DataFrame): List[(Long, Long, Long)] = {
  val spark = df.sparkSession
  val sparkContext = spark.sparkContext
  val listener = new InputOutputMetricsListener()
Contributor

Use try...finally here
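The point of the request is that a registered listener should be removed even when the measured action throws, so it cannot leak into later tests. The sketch below shows the pattern in isolation. `ListenerRegistry` and `WithListener` are hypothetical stand-ins written for this example and are not Spark's API.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a context that holds listeners; not Spark's API.
final class ListenerRegistry[L] {
  private val listeners = ArrayBuffer.empty[L]
  def add(l: L): Unit = listeners += l
  def remove(l: L): Unit = listeners -= l
  def size: Int = listeners.size
}

object WithListener {
  // Registers `listener`, evaluates `body`, and always unregisters the
  // listener afterwards, whether `body` returns normally or throws.
  def apply[L, A](registry: ListenerRegistry[L], listener: L)(body: => A): A = {
    registry.add(listener)
    try body
    finally registry.remove(listener)
  }
}
```

In the helper under review, the same shape would presumably wrap the code that runs the DataFrame, with the listener removal placed in the `finally` block.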

}

override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = synchronized {
  val res = stageIdToMetricsResult.getOrElseUpdate(taskEnd.stageId, { MetricsResult() })
Contributor

Nit: remove curly braces
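The braces are redundant because the second parameter of `getOrElseUpdate` on a mutable map is passed by name, so the default expression is evaluated only when the key is missing, with or without a block around it. A minimal sketch, with `Counter` and `bump` as hypothetical names:

```scala
import scala.collection.mutable.HashMap

object GetOrElseUpdateSketch {
  final case class Counter(var n: Long = 0L)

  // Looks up the counter for `stageId`, creating it on first use, then increments it.
  def bump(byStage: HashMap[Int, Counter], stageId: Int): Counter = {
    // No braces needed: `Counter()` is a by-name argument.
    val c = byStage.getOrElseUpdate(stageId, Counter())
    c.n += 1
    c
  }
}
```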

spark.read.parquet(dir).createOrReplaceTempView("pqS")

val res3 = InputOutputMetricsHelper.run(
  spark.range(0, 30).repartition(3).crossJoin(sql("select * from pqS")).repartition(2).toDF()
Contributor

This is hard to reason about. Could you add a few lines of documentation?

@hvanhovell
Contributor

LGTM - pending jenkins.

@SparkQA

SparkQA commented Feb 17, 2017

Test build #73057 has finished for PR 16960 at commit 70fe843.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Feb 18, 2017

Merging in master.

@asfgit asfgit closed this in b486ffc Feb 18, 2017
liancheng pushed a commit to liancheng/spark that referenced this pull request Mar 17, 2017
The Range was modified to produce "recordsRead" metric instead of "generated rows". The tests were updated and partially moved to SQLMetricsSuite.

Unit tests.

Author: Ala Luszczak <[email protected]>

Closes apache#16960 from ala/range-records-read.

(cherry picked from commit b486ffc)
Signed-off-by: Reynold Xin <[email protected]>
@jaceklaskowski
Contributor

I think that the commit has left numGeneratedRows metrics off, hasn't it? (it was added in #16829)

@ala
Contributor Author

ala commented May 10, 2017

True. There are a couple of lines that should have been removed with this change but were left behind. numGeneratedRows should be gone.

@jaceklaskowski
Contributor

I'll have a look at this this week and send a PR unless you beat me to it :) Thanks @ala!

@ala
Contributor Author

ala commented May 11, 2017

Thanks @jaceklaskowski - it's already done: #17939
