[SPARK-8861][SQL]Add accumulators to SparkPlan operations #7590

feynmanliang · 2015-07-22T07:04:41Z

Prototype implementations of metrics accumulator updates are included in a LeafNode (LocalTableScan) and a UnaryNode (basicOperators#Project). I will extend this to other SparkPlan operators after initial review.

Notes for reviewers:

Accumulator updates are currently achieved by prematurely calling actions, which is undesirable. Would it be better to instrument everything except LeafNodes?
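To make the concern in the note above concrete, here is a minimal sketch of the difference between counting rows by consuming the input up front and counting them as a side effect of a lazy transformation. It does not use Spark: `Counter` is a hypothetical stand-in for an accumulator, and `countEagerly` / `countLazily` are illustrative names, not code from this patch.

```scala
import java.util.concurrent.atomic.AtomicLong

object LazyRowCounting {
  // Hypothetical stand-in for an accumulator; not Spark's Accumulator class.
  final class Counter {
    private val n = new AtomicLong(0L)
    def add(v: Long): Unit = { n.addAndGet(v); () }
    def value: Long = n.get
  }

  // Eager: materializes the whole input just to count it, which is
  // analogous to triggering an action before the consumer asks for rows.
  def countEagerly[T](rows: Iterator[T], counter: Counter): Iterator[T] = {
    val buffered = rows.toVector
    counter.add(buffered.size.toLong)
    buffered.iterator
  }

  // Lazy: the counter is bumped only as the downstream consumer pulls rows.
  def countLazily[T](rows: Iterator[T], counter: Counter): Iterator[T] =
    rows.map { row => counter.add(1L); row }
}
```

With the lazy variant the counter stays at zero until something downstream iterates, so no extra pass over the data is needed.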


SparkQA · 2015-07-22T07:09:04Z

Test build #38050 has finished for PR 7590 at commit 1a4c9de.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class Accumulator[T](

SparkQA · 2015-07-22T17:30:15Z

Test build #38095 has finished for PR 7590 at commit 9629ce0.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class Accumulator[T](

JoshRosen · 2015-07-22T17:39:42Z

core/src/main/scala/org/apache/spark/SparkContext.scala

In order to preserve binary compatibility, you need to add an overload which does not accept the internal flag.

[error] * method accumulator(java.lang.Object,java.lang.String,org.apache.spark.AccumulatorParam)org.apache.spark.Accumulator in class org.apache.spark.SparkContext does not have a correspondent in new version
[error] filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.accumulator")

In order to avoid ambiguity, Scala does not allow you to overload a method if any of the overloads define default arguments. As a result, I think that we might want to avoid making any user-facing changes to the SparkContext and create a private[spark] internalAccumulator() method instead.

Makes sense, thanks Josh.

SparkQA · 2015-07-22T19:55:35Z

Test build #38099 has finished for PR 7590 at commit 1849fe3.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2015-07-22T20:16:27Z

core/src/main/scala/org/apache/spark/Accumulators.scala

Should we make this constructor private[spark] so that users can't set internal = true?
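The two review suggestions above (a separate `private[spark] internalAccumulator()` method and a restricted constructor) can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the code in this patch: `Acc` and `Context` are hypothetical stand-ins for `Accumulator` and `SparkContext`, the implicit `AccumulatorParam` is omitted, and `private[AccumulatorSketch]` plays the role of `private[spark]` so that the example compiles without a package declaration.

```scala
object AccumulatorSketch {
  // Hypothetical stand-in for Accumulator. The constructor is restricted so
  // that code outside this scope cannot pass internal = true.
  final class Acc[T] private[AccumulatorSketch] (
      val initial: T,
      val name: Option[String],
      val internal: Boolean)

  // Hypothetical stand-in for SparkContext.
  class Context {
    // Existing public signatures are left untouched, so callers compiled
    // against the old version still find the methods they link to.
    def accumulator[T](initial: T): Acc[T] =
      new Acc(initial, None, internal = false)

    def accumulator[T](initial: T, name: String): Acc[T] =
      new Acc(initial, Some(name), internal = false)

    // New functionality lives under a separate, restricted name instead of
    // an extra parameter on the public methods.
    private[AccumulatorSketch] def internalAccumulator[T](initial: T, name: String): Acc[T] =
      new Acc(initial, Some(name), internal = true)
  }

  // Code inside the restricted scope (an operator, say) can still create
  // internal accumulators.
  object ProjectOperator {
    def numTuples(ctx: Context): Acc[Long] =
      ctx.internalAccumulator(0L, "numTuples")
  }
}
```

Because the public methods keep their original signatures, a binary-compatibility check has nothing new to flag on them, and user code has no way to request an internal accumulator.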

SparkQA · 2015-07-23T00:38:25Z

Test build #38120 has finished for PR 7590 at commit 1d5fb86.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2015-07-28T02:29:26Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

This is called for every row. Could you save accumulators("numTuples").asInstanceOf[Accumulator[Long]] as a variable and reuse it?
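A minimal sketch of the suggested change, assuming the operator keeps its metrics in a map keyed by name. `LongCounter`, `accumulators`, and `project` here are hypothetical stand-ins so that the example runs without Spark; only the `"numTuples"` key comes from the comment above.

```scala
object HoistLookup {
  // Hypothetical stand-in for Accumulator[Long].
  final class LongCounter {
    private var total = 0L
    def add(v: Long): Unit = total += v
    def value: Long = total
  }

  // Metrics stored untyped by name, so every read needs a lookup and a cast.
  val accumulators: Map[String, Any] = Map("numTuples" -> new LongCounter)

  def project(rows: Iterator[Int]): Iterator[Int] = {
    // Look up and cast once, outside the per-row closure.
    val numTuples = accumulators("numTuples").asInstanceOf[LongCounter]
    rows.map { row =>
      numTuples.add(1L) // per-row work is now just the increment
      row * 2           // placeholder for the actual projection
    }
  }
}
```

The map lookup and the cast then happen once per call to `project` rather than once per row.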

…n-instrumentation * apache/master: (113 commits)
- [SPARK-8196][SQL] Fix null handling & documentation for next_day.
- [SPARK-9373][SQL] follow up for StructType support in Tungsten projection.
- [SPARK-9402][SQL] Remove CodegenFallback from Abs / FormatNumber.
- [SPARK-8919] [DOCUMENTATION, MLLIB] Added @since tags to mllib.recommendation
- [EC2] Cosmetic fix for usage of spark-ec2 --ebs-vol-num option
- [SPARK-9394][SQL] Handle parentheses in CodeFormatter.
- Closes apache#6836 since Round has already been implemented.
- [SPARK-9335] [STREAMING] [TESTS] Make sure the test stream is deleted in KinesisBackedBlockRDDSuite
- [MINOR] [SQL] Support mutable expression unit test with codegen projection
- [SPARK-9373][SQL] Support StructType in Tungsten projection
- [SPARK-8828] [SQL] Revert SPARK-5680
- Fixed a test failure.
- [SPARK-9395][SQL] Create a SpecializedGetters interface to track all the specialized getters.
- [SPARK-8195] [SPARK-8196] [SQL] udf next_day last_day
- [SPARK-8882] [STREAMING] Add a new Receiver scheduling mechanism
- [SPARK-9386] [SQL] Feature flag for metastore partition pruning
- [SPARK-9230] [ML] Support StringType features in RFormula
- [SPARK-9385] [PYSPARK] Enable PEP8 but disable installing pylint.
- [SPARK-4352] [YARN] [WIP] Incorporate locality preferences in dynamic allocation requests
- [SPARK-9385] [HOT-FIX] [PYSPARK] Comment out Python style check
- ...

feynmanliang · 2015-07-28T18:31:34Z

@andrewor14 Test was failing because cacheTable was creating SparkPlans and I'm not sure that my fix is appropriate. I know you're working on something larger involving this instrumentation so do you mind checking to make sure this PR is useful? If not, I can close.

SparkQA · 2015-07-28T18:59:59Z

Test build #38745 has finished for PR 7590 at commit d5b070e.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-07-28T19:39:41Z

Test build #38732 has finished for PR 7590 at commit 8b9a36c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class Abs(child: Expression) extends UnaryExpression with ExpectsInputTypes

SparkQA · 2015-07-28T22:48:10Z

Test build #38764 has finished for PR 7590 at commit a5455b1.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2015-07-28T22:57:34Z

@feynmanliang I think the changes here are useful. We do want to thread the memory statistics to the SQL operators as well. However, my parallel changes may conflict with this patch.

feynmanliang · 2015-07-29T00:52:25Z

@andrewor14 Ok, I have some other things to do for QA, so I will hold off on this until you merge.

zsxwing · 2015-07-30T04:31:26Z

@feynmanliang Thanks for your PR. However, I resolved SPARK-8861 in #7774 in a similar way. Could you close this one?

Add accumulators to SparkPlan and initial impls

1a4c9de

Prototype implementations included in a LeafNode (LocalTableScan) and a UnaryNode (basicOperators.project). Will extend to other SparkPlan ops after initial review.

feynmanliang changed the title ~~[SPARK-8856][SQL]Add accumulators to SparkPlan and initial impls~~ [SPARK-8861][SQL]Add accumulators to SparkPlan and initial impls Jul 22, 2015

feynmanliang changed the title ~~[SPARK-8861][SQL]Add accumulators to SparkPlan and initial impls~~ [SPARK-8861][SQL]Add accumulators to SparkPlan operations Jul 22, 2015

Feynman Liang added 2 commits July 22, 2015 10:11

Make accumulators map only instantiate once

0c89d22

Make accumulators doc more accurate

9629ce0

JoshRosen reviewed Jul 22, 2015

Move internal accumulator into a private SparkContext method

1849fe3

andrewor14 reviewed Jul 22, 2015

Transform instead of drain iterator

1d5fb86

zsxwing reviewed Jul 28, 2015

Fix failing tests

d5b070e

Remove println

a5455b1

feynmanliang closed this Jul 30, 2015

feynmanliang deleted the SPARK-8856-SparkPlan-instrumentation branch August 4, 2015 22:49


5 participants