[SPARK-1701] Clarify slice vs partition in the programming guide #2305
mattf wants to merge 3 commits into apache:master
This is a partial solution to SPARK-1701 that addresses only the documentation confusion. Further work could rename the numSlices parameter across languages, taking care in Scala and Python to maintain backward compatibility for named parameters.
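To make the named-parameter concern concrete, here is a minimal Python sketch of one way a rename could keep old callers working. It is not Spark's code: `resolve_num_partitions`, the `numPartitions` keyword, and `DEFAULT_PARTITIONS` are hypothetical names chosen for illustration.

```python
import warnings

DEFAULT_PARTITIONS = 2  # arbitrary default chosen for this sketch (assumption)


def resolve_num_partitions(numPartitions=None, numSlices=None):
    """Return the partition count, still accepting the legacy numSlices keyword.

    Hypothetical helper: it only illustrates the compatibility pattern.
    """
    if numSlices is not None:
        if numPartitions is not None:
            # Both spellings given: refuse rather than guess which one wins.
            raise TypeError("pass either numPartitions or numSlices, not both")
        warnings.warn(
            "numSlices is deprecated; use numPartitions instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return numSlices
    return DEFAULT_PARTITIONS if numPartitions is None else numPartitions
```

Because the new keyword takes the position the old one had, positional callers are unaffected, and callers that write `numSlices=...` keep working but see a deprecation warning.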
QA tests have started for PR 2305 at commit
QA tests have finished for PR 2305 at commit
@JoshRosen will you take a look at this?
Sorry for not reviewing this until now; it sort of fell off my radar.
Maybe the "Note:" should mention that in some places we still say numSlices (for backwards compatibility with earlier versions of Spark) and that "slices" should be considered a synonym for "partitions". Many places already use numPartitions, etc., so we may want to emphasize that this discrepancy only occurs in a few places.
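As a plain-Python illustration of why the two words can be treated as synonyms, the sketch below splits a collection into `numSlices` contiguous chunks, and each chunk is what the guide elsewhere calls a partition. This is an assumption-laden sketch, not Spark's implementation; `slice_collection` is a hypothetical name.

```python
def slice_collection(data, numSlices):
    """Split data into numSlices contiguous chunks (hypothetical helper).

    Each returned chunk plays the role of one partition.
    """
    if numSlices < 1:
        raise ValueError("numSlices must be positive")
    n = len(data)
    # Integer arithmetic on the boundaries keeps chunk sizes within one element
    # of each other and covers every element exactly once.
    return [
        data[i * n // numSlices:(i + 1) * n // numSlices]
        for i in range(numSlices)
    ]


partitions = slice_collection(list(range(10)), 3)
# partitions == [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Asking for three slices yields three partitions, so the number passed as numSlices is the partition count.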
thanks for the feedback. i've changed the language to be more in line with your suggestion.
QA tests have started for PR 2305 at commit
QA tests have finished for PR 2305 at commit
i'm getting HTTP 503 from jenkins, but i'm gonna go out on a limb and say this doc change didn't break the unit tests.
I think that Jenkins might have crashed or restarted overnight, but it seems to be working now. This looks good to me, so I'm going to merge it. Feel free to open similar PRs for other documentation improvements / clarifications, since these types of edits are really helpful.
…nd code?)

## What changes were proposed in this pull request?

Came across the term "slice" when running some Spark Scala code. A Google search indicated that "slices" and "partitions" refer to the same thing; see:

- [This issue](https://issues.apache.org/jira/browse/SPARK-1701)
- [This pull request](apache#2305)
- [This StackOverflow answer](http://stackoverflow.com/questions/23436640/what-is-the-difference-between-an-rdd-partition-and-a-slice) and [this one](http://stackoverflow.com/questions/24269495/what-are-the-differences-between-slices-and-partitions-of-rdds)

This pull request therefore fixes the occurrence of "slice" I came across. Nonetheless, [it would appear](https://github.com/apache/spark/search?utf8=%E2%9C%93&q=slice&type=) there are still many references to "slice/slices", so I thought I'd raise this pull request to address the issue (sorry if this is the wrong place, I'm not too familiar with raising Apache issues).

## How was this patch tested?

(Not tested locally - only a minor exception message change.)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: asmith26 <asmith26@users.noreply.github.com>

Closes apache#17565 from asmith26/master.