[SPARK-8861][SQL]Add accumulators to SparkPlan operations #7590
Prototype implementations included in a LeafNode (LocalTableScan) and a UnaryNode (basicOperators.project). Will extend to other SparkPlan ops after initial review.
Test build #38050 has finished for PR 7590 at commit
Test build #38095 has finished for PR 7590 at commit
In order to preserve binary compatibility, you need to add an overload which does not accept the internal flag.
[error] * method accumulator(java.lang.Object,java.lang.String,org.apache.spark.AccumulatorParam)org.apache.spark.Accumulator in class org.apache.spark.SparkContext does not have a correspondent in new version
[error] filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.accumulator")
In order to avoid ambiguity, Scala does not allow you to overload a method if any of the overloads define default arguments. As a result, I think that we might want to avoid making any user-facing changes to the SparkContext and create a private[spark] internalAccumulator() method instead.
Makes sense, thanks Josh.
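The suggestion above can be sketched as follows. This is a toy stand-in and not Spark's actual code: the enclosing object `spark` plays the role of the package, the `Accumulator` and `SparkContext` classes are simplified placeholders, the implicit `AccumulatorParam` is omitted, and `newMetric` is a hypothetical helper. Only the names `accumulator` and `internalAccumulator` come from the discussion.

```scala
// Toy stand-in for Spark's classes; simplified and hypothetical.
object spark {
  final class Accumulator[T](
      val initialValue: T,
      val name: Option[String],
      val internal: Boolean)

  class SparkContext {
    // The existing public method keeps its exact parameter list, so its
    // binary signature does not change and no default argument is added.
    def accumulator[T](initialValue: T, name: String): Accumulator[T] =
      new Accumulator(initialValue, Some(name), internal = false)

    // Separate entry point for internal accumulators, visible only
    // inside the enclosing `spark` scope.
    private[spark] def internalAccumulator[T](initialValue: T, name: String): Accumulator[T] =
      new Accumulator(initialValue, Some(name), internal = true)
  }

  // Hypothetical helper: code inside `spark` (standing in for Spark's own
  // operators) is allowed to call the restricted method.
  def newMetric(sc: SparkContext, name: String): Accumulator[Long] =
    sc.internalAccumulator(0L, name)
}
```

With this shape, user code outside `spark` can still call `accumulator` as before, while a call to `internalAccumulator` from outside would be rejected by the compiler.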
Test build #38099 has finished for PR 7590 at commit
should we make this constructor private[spark] so users can't set internal = true?
+1
OK.
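A minimal sketch of the constructor restriction discussed in this thread, under the same assumptions as any toy model: the object `spark` stands in for the package, and the class shape and the `internalAccumulator` helper are hypothetical simplifications rather than Spark's real definitions.

```scala
object spark {
  // The primary constructor carries the `internal` flag and is restricted
  // to the enclosing `spark` scope.
  class Accumulator[T] private[spark] (
      val initialValue: T,
      val name: Option[String],
      val internal: Boolean) {

    // Public auxiliary constructor: user code can only create
    // accumulators with internal = false.
    def this(initialValue: T, name: Option[String]) =
      this(initialValue, name, false)
  }

  // Hypothetical helper inside `spark` that may use the restricted constructor.
  def internalAccumulator[T](initialValue: T, name: String): Accumulator[T] =
    new Accumulator(initialValue, Some(name), true)
}
```

From outside `spark`, `new spark.Accumulator(0L, Some("rows"))` compiles, whereas passing a third `internal` argument would not, because the three-argument constructor is not accessible there.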
Test build #38120 has finished for PR 7590 at commit
This is called for every row. Could you save accumulators("numTuples").asInstanceOf[Accumulator[Long]] as a variable and reuse it?
Thanks.
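The per-row concern raised above can be illustrated with a small sketch. The `Accumulator` class here is a hypothetical minimal stand-in (Spark's real class differs), and the row data and the upper-casing "projection" are made up for illustration; only the `accumulators("numTuples").asInstanceOf[Accumulator[Long]]` expression comes from the review comment.

```scala
import scala.collection.mutable

object CachedAccumulatorExample {
  // Hypothetical minimal accumulator, not Spark's real class.
  final class Accumulator[T](var value: T)(add: (T, T) => T) {
    def +=(term: T): Unit = { value = add(value, term) }
  }

  val accumulators: mutable.Map[String, Accumulator[_]] =
    mutable.Map("numTuples" -> new Accumulator[Long](0L)(_ + _))

  // Look up and cast once, outside the per-row closure.
  val numTuples: Accumulator[Long] =
    accumulators("numTuples").asInstanceOf[Accumulator[Long]]

  // Made-up rows; the closure below runs once per row.
  val projected: List[String] = Iterator("a", "b", "c").map { row =>
    numTuples += 1L // no map lookup or cast inside the loop
    row.toUpperCase
  }.toList
}
```

The closure captures the already-cast `numTuples` reference, so the hash lookup and the cast happen once per operator instead of once per row.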
…n-instrumentation

* apache/master: (113 commits)
  [SPARK-8196][SQL] Fix null handling & documentation for next_day.
  [SPARK-9373][SQL] follow up for StructType support in Tungsten projection.
  [SPARK-9402][SQL] Remove CodegenFallback from Abs / FormatNumber.
  [SPARK-8919] [DOCUMENTATION, MLLIB] Added @since tags to mllib.recommendation
  [EC2] Cosmetic fix for usage of spark-ec2 --ebs-vol-num option
  [SPARK-9394][SQL] Handle parentheses in CodeFormatter.
  Closes apache#6836 since Round has already been implemented.
  [SPARK-9335] [STREAMING] [TESTS] Make sure the test stream is deleted in KinesisBackedBlockRDDSuite
  [MINOR] [SQL] Support mutable expression unit test with codegen projection
  [SPARK-9373][SQL] Support StructType in Tungsten projection
  [SPARK-8828] [SQL] Revert SPARK-5680
  Fixed a test failure.
  [SPARK-9395][SQL] Create a SpecializedGetters interface to track all the specialized getters.
  [SPARK-8195] [SPARK-8196] [SQL] udf next_day last_day
  [SPARK-8882] [STREAMING] Add a new Receiver scheduling mechanism
  [SPARK-9386] [SQL] Feature flag for metastore partition pruning
  [SPARK-9230] [ML] Support StringType features in RFormula
  [SPARK-9385] [PYSPARK] Enable PEP8 but disable installing pylint.
  [SPARK-4352] [YARN] [WIP] Incorporate locality preferences in dynamic allocation requests
  [SPARK-9385] [HOT-FIX] [PYSPARK] Comment out Python style check
  ...
@andrewor14 Test was failing because
Test build #38745 has finished for PR 7590 at commit
Test build #38732 has finished for PR 7590 at commit
Test build #38764 has finished for PR 7590 at commit
@feynmanliang I think the changes here are useful. We do want to thread the memory statistics to the SQL operators as well. However, my parallel changes may conflict with this patch.
@andrewor14 Ok, I have some other things to do for QA so I will hold off on
@feynmanliang Thanks for your PR. However, I resolved SPARK-8861 in #7774 in a similar way. Could you close this one?
Prototype implementations of metrics accumulator update included in a LeafNode (LocalTableScan) and a UnaryNode (basicOperators#Project). Will extend to other SparkPlan ops after initial review.
Notes for reviewers:
LeafNodes?