SPARK-1597: Add a version of reduceByKey that takes the Partitioner as a... #550

techaddict · 2014-04-25T12:48:09Z

... second argument

Most of our shuffle methods can take a Partitioner or a number of partitions as their second argument, but for some reason reduceByKey takes the Partitioner as its first argument (see http://spark.apache.org/docs/0.9.1/api/core/#org.apache.spark.rdd.PairRDDFunctions).
This patch deprecates that version and adds one where the Partitioner is the second argument.
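A minimal, self-contained sketch of the overload set described above, for readers without a Spark build at hand. Partitioner, HashPartitioner, ToyPairs, and ReduceByKeyDemo are hypothetical stand-ins written for illustration (the real classes live in org.apache.spark and operate on distributed RDDs); only the argument order of the three reduceByKey overloads is meant to mirror Spark 0.9.

```scala
// Hypothetical stand-ins for org.apache.spark.Partitioner and HashPartitioner.
abstract class Partitioner {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

class HashPartitioner(val numPartitions: Int) extends Partitioner {
  // Non-negative modulo of the key's hash code (null keys are not handled here).
  def getPartition(key: Any): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }
}

// Toy local "pair collection" whose reduceByKey overloads follow the Spark 0.9 argument order.
class ToyPairs[K, V](val data: Seq[(K, V)]) {
  // Partitioner first, function second.
  def reduceByKey(partitioner: Partitioner, func: (V, V) => V): IndexedSeq[Map[K, V]] = {
    // Merge all values per key with func, then bucket each key into the partition the partitioner picks.
    val reduced: Map[K, V] =
      data.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(func) }
    (0 until partitioner.numPartitions).map { p =>
      reduced.filter { case (k, _) => partitioner.getPartition(k) == p }
    }
  }

  // Function first, number of partitions second.
  def reduceByKey(func: (V, V) => V, numPartitions: Int): IndexedSeq[Map[K, V]] =
    reduceByKey(new HashPartitioner(numPartitions), func)

  // Function only; the default of 2 partitions is arbitrary for this sketch.
  def reduceByKey(func: (V, V) => V): IndexedSeq[Map[K, V]] =
    reduceByKey(func, 2)
}

object ReduceByKeyDemo {
  val pairs = new ToyPairs(Seq("a" -> 1L, "b" -> 2L, "a" -> 3L))
  // Placeholder syntax compiles in both positions without parameter types.
  val byPartitioner = pairs.reduceByKey(new HashPartitioner(2), _ + _)
  val byCount = pairs.reduceByKey(_ + _, 2)
}
```

Both demo calls use _ + _ without type annotations. Because the Partitioner and the Int sit in different argument positions, each call leaves exactly one applicable two-argument overload, so the compiler knows the expected function type when it types the literal.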


techaddict · 2014-04-25T12:51:27Z

With this change, we'll need to specify the parameter types of the function passed to reduceByKey, i.e. reduceByKey((x: Long, y: Long) => x + y, 10) instead of reduceByKey(_ + _, 10).
For a detailed discussion of the compiler issue causing this, see
https://groups.google.com/forum/#!topic/scala-user/Qhd3vJ2rAWM

@mateiz IMHO we should leave the method as it is, since requiring explicit parameter types would make user code ugly.
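To reproduce the problem without Spark, the following sketch (again with hypothetical stand-in classes; ProposedPairs and ProposedDemo are illustrative names) puts the function first in both two-argument overloads, as this patch proposed. Only the explicitly typed call is left active. The placeholder call is commented out because, with the Scala 2 compilers Spark used at the time, both overloads survive the compiler's shape-based pre-selection, so the function literal must be typed without an expected type and compilation fails with a "missing parameter type" error. Newer compiler versions may behave differently.

```scala
// Hypothetical minimal stand-ins; only their types matter for overload resolution here.
abstract class Partitioner { def numPartitions: Int }
class HashPartitioner(val numPartitions: Int) extends Partitioner

// The overload set proposed in this PR: the function comes first in both two-argument overloads.
class ProposedPairs[K, V](val data: Seq[(K, V)]) {
  // Local reduction only; the partitioner is accepted but ignored in this sketch.
  def reduceByKey(func: (V, V) => V, partitioner: Partitioner): Map[K, V] =
    data.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(func) }

  def reduceByKey(func: (V, V) => V, numPartitions: Int): Map[K, V] =
    reduceByKey(func, new HashPartitioner(numPartitions))
}

object ProposedDemo {
  val pairs = new ProposedPairs(Seq("a" -> 1L, "b" -> 2L, "a" -> 3L))

  // Explicit parameter types let the compiler pick the Int overload from the argument types alone.
  val typed = pairs.reduceByKey((x: Long, y: Long) => x + y, 10)

  // Rejected by the Scala 2 compilers of the time with "missing parameter type":
  // both overloads expect a (V, V) => V first, so neither is ruled out before
  // the function literal has to be typed.
  // val untyped = pairs.reduceByKey(_ + _, 10)
}
```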

AmplabJenkins · 2014-04-25T12:52:55Z

Can one of the admins verify this patch?

mateiz · 2014-04-26T02:20:39Z

Ah, wow, I never knew that. So if one takes a Partitioner first and one takes a function, the types are inferred, but if both take a function first, they're not?

In that case we might want to change our other methods too, like cogroup and groupByKey, to take a Partitioner first. Wouldn't this problem also affect them?

mateiz · 2014-04-26T02:23:17Z

CC @rxin, @pwendell

techaddict · 2014-04-26T02:52:42Z

@mateiz I think this only applies to anonymous functions, so it doesn't affect cogroup or groupByKey.
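The point about groupByKey can be illustrated with the same kind of stand-ins: groupByKey takes no function argument, so its Int and Partitioner overloads are told apart purely by the static type of the single argument, and no lambda parameter type has to be inferred. ToyGroups, GroupDemo, and the partitioner classes below are hypothetical and only mirror the shape of Spark's overloads.

```scala
// Hypothetical stand-ins for Spark's Partitioner and HashPartitioner.
abstract class Partitioner {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

class HashPartitioner(val numPartitions: Int) extends Partitioner {
  // Non-negative modulo of the key's hash code (null keys are not handled here).
  def getPartition(key: Any): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }
}

class ToyGroups[K, V](val data: Seq[(K, V)]) {
  // Collect all values per key, then bucket each key into the partition the partitioner picks.
  def groupByKey(partitioner: Partitioner): IndexedSeq[Map[K, Seq[V]]] = {
    val grouped: Map[K, Seq[V]] =
      data.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    (0 until partitioner.numPartitions).map { p =>
      grouped.filter { case (k, _) => partitioner.getPartition(k) == p }
    }
  }

  // Overload chosen when the argument is an Int; no closure is involved.
  def groupByKey(numPartitions: Int): IndexedSeq[Map[K, Seq[V]]] =
    groupByKey(new HashPartitioner(numPartitions))
}

object GroupDemo {
  val pairs = new ToyGroups(Seq("a" -> 1L, "b" -> 2L, "a" -> 3L))
  val byCount = pairs.groupByKey(4)
  val byPartitioner = pairs.groupByKey(new HashPartitioner(4))
}
```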

rxin · 2014-04-26T06:33:30Z

streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala

This line is over 100 chars wide

@rxin I'll fix this as soon as a decision is made on whether we want to do this at all.

rxin · 2014-04-26T06:36:31Z

I never even realized we had a version of reduceByKey where the first argument is not the closure ...

rxin · 2014-04-26T07:16:43Z

I have one solution to this, although it is technically an API change, so just throwing it out there for discussion. We can remove all the numPartitions: Int arguments, and add an implicit conversion from int to HashPartitioner.
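The proposal above can be sketched as follows, assuming the numPartitions: Int overloads are removed and the conversion is placed in the Partitioner companion object, where the compiler finds it without any import at the call site. All names here (including intToHashPartitioner, ToyPairs, and ImplicitDemo) are hypothetical stand-ins, not Spark API. The sketch also makes the discoverability concern concrete: nothing at the call site reduceByKey(_ + _, 10) reveals that 10 is turned into a HashPartitioner.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for Spark's Partitioner and HashPartitioner.
abstract class Partitioner {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

class HashPartitioner(val numPartitions: Int) extends Partitioner {
  // Non-negative modulo of the key's hash code (null keys are not handled here).
  def getPartition(key: Any): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }
}

object Partitioner {
  // Hypothetical conversion: an Int n becomes a HashPartitioner with n partitions.
  // Living in the companion object puts it in implicit scope wherever a Partitioner is expected.
  implicit def intToHashPartitioner(numPartitions: Int): Partitioner =
    new HashPartitioner(numPartitions)
}

// With the Int overloads removed, reduceByKey is no longer overloaded,
// so the function literal always has an expected type.
class ToyPairs[K, V](val data: Seq[(K, V)]) {
  def reduceByKey(func: (V, V) => V, partitioner: Partitioner): IndexedSeq[Map[K, V]] = {
    val reduced: Map[K, V] =
      data.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(func) }
    (0 until partitioner.numPartitions).map { p =>
      reduced.filter { case (k, _) => partitioner.getPartition(k) == p }
    }
  }
}

object ImplicitDemo {
  val pairs = new ToyPairs(Seq("a" -> 1L, "b" -> 2L, "a" -> 3L))
  // 10 is silently converted by Partitioner.intToHashPartitioner.
  val viaInt = pairs.reduceByKey(_ + _, 10)
  val viaPartitioner = pairs.reduceByKey(_ + _, new HashPartitioner(10))
}
```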

techaddict · 2014-04-26T07:37:59Z

@rxin +1

mateiz · 2014-04-26T22:46:37Z

I'd rather not add the implicit conversion from Int to Partitioner; it would be very hard to discover on its own. Instead, maybe we can just leave this API as is. It's strange, but there's a good reason for it.

SparkQA · 2014-08-06T02:24:26Z

QA tests have started for PR 550. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17978/consoleFull

SparkQA · 2014-08-06T02:24:33Z

QA results for PR 550:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17978/consoleFull

pwendell · 2014-09-21T04:58:44Z

It sounds like the conclusion here is to close this issue then.

This commit exists to close the following pull requests on Github: Closes apache#1328 (close requested by 'pwendell') Closes apache#2314 (close requested by 'pwendell') Closes apache#997 (close requested by 'pwendell') Closes apache#550 (close requested by 'pwendell') Closes apache#1506 (close requested by 'pwendell') Closes apache#2423 (close requested by 'mengxr') Closes apache#554 (close requested by 'joshrosen')


rxin reviewed Apr 26, 2014

techaddict mentioned this pull request Apr 29, 2014

SPARK-1663. Corrections for several compile errors in streaming code examples, and updates to follow API changes #589

Closed

techaddict closed this Sep 21, 2014


6 participants