
Conversation

@feynmanliang
Contributor

Prototype implementations of metrics accumulator updates are included in a LeafNode (LocalTableScan) and a UnaryNode (basicOperators#Project). I will extend this to other SparkPlan ops after initial review.

Notes for reviewers:

  • Accumulator updates are currently achieved by prematurely calling actions, which is undesirable. Perhaps it would be better to instrument everything except LeafNodes?
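A rough sketch of the alternative being hinted at (all names here are illustrative stand-ins, not the actual SparkPlan API): instead of forcing an action early just to flush accumulator updates, the update can live inside the operator's output iterator, so counts accrue only when a downstream action actually consumes the rows.

```scala
// Illustrative sketch only -- LongAcc stands in for Spark's Accumulator[Long].
class LongAcc { var value = 0L; def add(v: Long): Unit = value += v }

object LazyInstrumentation {
  val numTuples = new LongAcc

  // Wrap an operator's output iterator; no action is triggered here,
  // because Iterator.map is lazy.
  def instrument[T](rows: Iterator[T]): Iterator[T] =
    rows.map { row => numTuples.add(1L); row }

  def main(args: Array[String]): Unit = {
    val wrapped = instrument(Iterator("a", "b", "c"))
    println(numTuples.value) // 0 -- nothing consumed yet
    val out = wrapped.toList // consuming the iterator drives the updates
    println(numTuples.value) // 3
  }
}
```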

@feynmanliang feynmanliang changed the title [SPARK-8856][SQL]Add accumulators to SparkPlan and initial impls [SPARK-8861][SQL]Add accumulators to SparkPlan and initial impls Jul 22, 2015
@SparkQA

SparkQA commented Jul 22, 2015

Test build #38050 has finished for PR 7590 at commit 1a4c9de.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Accumulator[T](

@feynmanliang feynmanliang changed the title [SPARK-8861][SQL]Add accumulators to SparkPlan and initial impls [SPARK-8861][SQL]Add accumulators to SparkPlan operations Jul 22, 2015
@SparkQA

SparkQA commented Jul 22, 2015

Test build #38095 has finished for PR 7590 at commit 9629ce0.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Accumulator[T](

Contributor

In order to preserve binary compatibility, you need to add an overload which does not accept the internal flag.

[error]  * method accumulator(java.lang.Object,java.lang.String,org.apache.spark.AccumulatorParam)org.apache.spark.Accumulator in class org.apache.spark.SparkContext does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.accumulator")

In order to avoid ambiguity, Scala does not allow you to overload a method if any of the overloads define default arguments. As a result, I think that we might want to avoid making any user-facing changes to the SparkContext and create a private[spark] internalAccumulator() method instead.
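A compile-level sketch of the constraint described above (the object and signatures are simplified stand-ins for SparkContext.accumulator, not the real Spark API):

```scala
// Simplified stand-in for SparkContext; not the real Spark API.
object Ctx {
  // Scala rejects overload pairs where one defines default arguments:
  //   def accumulator[T](v: T, name: String, internal: Boolean = false): T = v
  //   def accumulator[T](v: T, name: String): T = v
  // error, roughly: "multiple overloaded alternatives of method accumulator
  // define default arguments"

  // The suggested shape: leave the public signature untouched...
  def accumulator[T](v: T, name: String): T =
    internalAccumulator(v, name, internal = false)

  // ...and route internal callers through a separate non-public method
  // (private[spark] in Spark itself; plain `private` here so this
  // snippet compiles standalone).
  private def internalAccumulator[T](v: T, name: String, internal: Boolean): T = v
}
```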

Contributor Author

Makes sense, thanks Josh.

@SparkQA

SparkQA commented Jul 22, 2015

Test build #38099 has finished for PR 7590 at commit 1849fe3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

should we make this constructor private[spark] so users can't set internal = true?
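A minimal sketch of that suggestion (the class shape is assumed; `private[spark]` is shown as a plain private constructor so the snippet compiles outside Spark's package structure):

```scala
// Toy stand-in for Spark's Accumulator: the constructor that can set
// `internal` is not public, so user code can never pass internal = true.
class Acc[T] private (val initialValue: T, val name: String, val internal: Boolean)

object Acc {
  // Public factory: user-created accumulators are always internal = false.
  // (The companion object can still call the private constructor.)
  def apply[T](initialValue: T, name: String): Acc[T] =
    new Acc(initialValue, name, internal = false)
}
```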

Contributor

+1

Contributor Author

OK.

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38120 has finished for PR 7590 at commit 1d5fb86.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

This is called for every row. Could you save accumulators("numTuples").asInstanceOf[Accumulator[Long]] as a variable and reuse it?
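The suggestion, sketched with a toy accumulator map (the map name and key come from the comment above; LongAcc stands in for Accumulator[Long]):

```scala
// Illustrative stand-in for Spark's Accumulator[Long].
class LongAcc { var value = 0L; def add(v: Long): Unit = value += v }

object HoistedLookup {
  val accumulators: Map[String, Any] = Map("numTuples" -> new LongAcc)

  def process(rows: Iterator[Int]): List[Int] = {
    // Resolve the map lookup and the cast once, outside the per-row path...
    val numTuples = accumulators("numTuples").asInstanceOf[LongAcc]
    // ...so the work done for every row is just the add.
    rows.map { row => numTuples.add(1L); row }.toList
  }
}
```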

Contributor Author

Thanks.

…n-instrumentation

* apache/master: (113 commits)
  [SPARK-8196][SQL] Fix null handling & documentation for next_day.
  [SPARK-9373][SQL] follow up for StructType support in Tungsten projection.
  [SPARK-9402][SQL] Remove CodegenFallback from Abs / FormatNumber.
  [SPARK-8919] [DOCUMENTATION, MLLIB] Added @since tags to mllib.recommendation
  [EC2] Cosmetic fix for usage of spark-ec2 --ebs-vol-num option
  [SPARK-9394][SQL] Handle parentheses in CodeFormatter.
  Closes apache#6836 since Round has already been implemented.
  [SPARK-9335] [STREAMING] [TESTS] Make sure the test stream is deleted in KinesisBackedBlockRDDSuite
  [MINOR] [SQL] Support mutable expression unit test with codegen projection
  [SPARK-9373][SQL] Support StructType in Tungsten projection
  [SPARK-8828] [SQL] Revert SPARK-5680
  Fixed a test failure.
  [SPARK-9395][SQL] Create a SpecializedGetters interface to track all the specialized getters.
  [SPARK-8195] [SPARK-8196] [SQL] udf next_day last_day
  [SPARK-8882] [STREAMING] Add a new Receiver scheduling mechanism
  [SPARK-9386] [SQL] Feature flag for metastore partition pruning
  [SPARK-9230] [ML] Support StringType features in RFormula
  [SPARK-9385] [PYSPARK] Enable PEP8 but disable installing pylint.
  [SPARK-4352] [YARN] [WIP] Incorporate locality preferences in dynamic allocation requests
  [SPARK-9385] [HOT-FIX] [PYSPARK] Comment out Python style check
  ...
@feynmanliang
Contributor Author

@andrewor14 The test was failing because cacheTable was creating SparkPlans, and I'm not sure my fix is appropriate. I know you're working on something larger involving this instrumentation, so do you mind checking whether this PR is still useful? If not, I can close it.

@SparkQA

SparkQA commented Jul 28, 2015

Test build #38745 has finished for PR 7590 at commit d5b070e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 28, 2015

Test build #38732 has finished for PR 7590 at commit 8b9a36c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Abs(child: Expression) extends UnaryExpression with ExpectsInputTypes

@SparkQA

SparkQA commented Jul 28, 2015

Test build #38764 has finished for PR 7590 at commit a5455b1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

@feynmanliang I think the changes here are useful. We do want to thread the memory statistics to the SQL operators as well. However, my parallel changes may conflict with this patch.

@feynmanliang
Contributor Author

@andrewor14 Ok, I have some other things to do for QA so I will hold off on this until you merge.


@zsxwing
Member

zsxwing commented Jul 30, 2015

@feynmanliang Thanks for your PR. However, I resolved SPARK-8861 in #7774 in a similar way. Could you close this one?

@feynmanliang feynmanliang deleted the SPARK-8856-SparkPlan-instrumentation branch August 4, 2015 22:49