Skip to content

Conversation

@techaddict
Copy link
Contributor

... second argument

Most of our shuffle methods can take a Partitioner or a number of partitions as a second argument, but for some reason reduceByKey takes the Partitioner as a first argument: http://spark.apache.org/docs/0.9.1/api/core/#org.apache.spark.rdd.PairRDDFunctions.
Deprecated that version and added one where the Partitioner is the second argument.

…s a second argument

Most of our shuffle methods can take a Partitioner or a number of partitions as a second argument, but for some reason reduceByKey takes the Partitioner as a first argument: http://spark.apache.org/docs/0.9.1/api/core/#org.apache.spark.rdd.PairRDDFunctions.
Deprecated that version and added one where the Partitioner is the second argument.
@techaddict
Copy link
Contributor Author

We'll need to specify the parameter types for function passed to reduceByKey
reduceByKey((x: Long, y: Long) => x + y, 10) instead of reduceByKey(_ + _, 10)
For detailed discussion on compiler issue causing this,
https://groups.google.com/forum/#!topic/scala-user/Qhd3vJ2rAWM

@mateiz IMHO we should leave the method as it is, as this will make the code ugly.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@mateiz
Copy link
Contributor

mateiz commented Apr 26, 2014

Ah, wow, I never knew that. So if one takes a Partitioner first and one takes a function, the types are inferred, but if both take a function first, they're not?

In that case we might want to change our other methods too, like cogroup and groupByKey, to take a Partitioner first. Wouldn't this problem also affect them?

@mateiz
Copy link
Contributor

mateiz commented Apr 26, 2014

CC @rxin, @pwendell

@techaddict
Copy link
Contributor Author

@mateiz I think this only applies with anon function's, thus isn't affecting either cogroup or groupByKey.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is over 100 chars wide

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rxin will fix this as soon as, a decision is made over whether we want to do this or not.

@rxin
Copy link
Contributor

rxin commented Apr 26, 2014

I never even realized we had a version of reduceByKey where the first argument is not the closure ...

@rxin
Copy link
Contributor

rxin commented Apr 26, 2014

I have one solution to this, although it is technically an API change, so just throwing it out there for discussion. We can remove all the numPartitions: Int arguments, and add an implicit conversion from int to HashPartitioner.

@techaddict
Copy link
Contributor Author

@rxin +1

@mateiz
Copy link
Contributor

mateiz commented Apr 26, 2014

I'd rather not add the implicit conversion from int to partitioner, it will be very hard to discover on its own. Instead maybe we can just leave this API as is. It's strange but there's a good reason for it.

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 550. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17978/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA results for PR 550:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17978/consoleFull

@pwendell
Copy link
Contributor

It sounds like the conclusion here is to close this issue then.

@techaddict techaddict closed this Sep 21, 2014
guavuslabs-builder pushed a commit to ThalesGroup/spark that referenced this pull request Sep 21, 2014
This commit exists to close the following pull requests on Github:

Closes apache#1328 (close requested by 'pwendell')
Closes apache#2314 (close requested by 'pwendell')
Closes apache#997 (close requested by 'pwendell')
Closes apache#550 (close requested by 'pwendell')
Closes apache#1506 (close requested by 'pwendell')
Closes apache#2423 (close requested by 'mengxr')
Closes apache#554 (close requested by 'joshrosen')
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
### What changes were proposed in this pull request?

Due to a quirk in the parser, in some cases, IDENTIFIER(<funcStr>)(<arg>) is not properly recognized as a function invocation.

The change is to remove the explicit IDENTIFIER-clause rule in the function invocation grammar and instead recognize
IDENTIFIER(<arg>) within visitFunctionCall.

### Why are the changes needed?

Function invocation support for IDENTIFIER is incomplete otherwise

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added new testcases to identifier-clause.sql

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#42888 from srielau/SPARK-45132.

Lead-authored-by: srielau <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit f0b2e6d)
Signed-off-by: Wenchen Fan <[email protected]>

* fix

---------

Signed-off-by: Wenchen Fan <[email protected]>
Co-authored-by: srielau <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants