[SPARK-7264][ML] Parallel lapply for sparkR #12426
Closed
17 commits
bd73c5b work (thunterdb)
0ca1094 documentation and fixes (thunterdb)
0643df2 style issue (thunterdb)
0299d8b comments addressed (thunterdb)
745a103 jsonify the other parameters (thunterdb)
a824d90 style (thunterdb)
cc86264 initial commit (thunterdb)
1df83cb not unlisting (thunterdb)
651954f Merge remote-tracking branch 'upstream/master' into 7264 (thunterdb)
2f7c60f adding a simple test (thunterdb)
1a2daaf comments (thunterdb)
2ad7b89 Merge remote-tracking branch 'upstream/master' into 7264 (thunterdb)
a97f4df Merge remote-tracking branch 'upstream/master' into 7264 (thunterdb)
6aa61d2 changes (thunterdb)
2433f25 no fancy text (thunterdb)
9ca6e15 Merge remote-tracking branch 'upstream/master' into 7264 (thunterdb)
378b437 comments (thunterdb)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -141,3 +141,9 @@ test_that("sparkJars sparkPackages as comma-separated strings", { |
| | | expect_that(processSparkJars(f), not(gives_warning())) |
| | | expect_match(processSparkJars(f), f) |
| | | }) |
| | | + |
| | | + test_that("spark.lapply should perform simple transforms", { |
| | | + sc <- sparkR.init() |
| | | + doubled <- spark.lapply(sc, 1:10, function(x) { 2 * x }) |
| | | + expect_equal(doubled, as.list(2 * 1:10)) |
| | | + }) |
Contributor
It would be good to add a test where we capture some environment variables and/or use a package. We should also update https://github.com/apache/spark/blob/master/docs/sparkr.md, but I guess we can open another JIRA for that.
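A minimal sketch of the kind of test suggested here, in which the applied function captures a variable from its enclosing environment. The names `multiplier` and `scale_by_multiplier` are hypothetical, and base `lapply` is used as a local stand-in so the snippet runs without a Spark context. In the SparkR suite the call would presumably follow the `spark.lapply(sc, list, func)` form used in the diff above.

```r
# Hypothetical names: `multiplier` and `scale_by_multiplier` are not from the PR.
# The function below is a closure: it reads `multiplier` from the
# environment in which it was defined, not from its arguments.
multiplier <- 3
scale_by_multiplier <- function(x) { multiplier * x }

# Local stand-in using base lapply. Assumption: in the SparkR test this
# would be spark.lapply(sc, 1:10, scale_by_multiplier), as in the diff.
scaled <- lapply(1:10, scale_by_multiplier)

# The captured value must be visible inside the applied function.
stopifnot(identical(scaled, as.list(3 * 1:10)))
```

Running the same function through the distributed call would additionally check that the captured environment is shipped to the workers, which the local stand-in cannot show.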
I guess people could get confused about when to call this versus the newly proposed `dapply` (#12493). Perhaps we need to explain this more and check for `class(list)` in the event someone passes a Spark `DataFrame` to this function.
`dapply` and `spark.lapply` have different semantics. There is no need to check `class(list)` here, as a DataFrame can be treated as a list of columns. `parallelize()` will issue a warning for a DataFrame here: https://github.com/apache/spark/blob/master/R/pkg/R/context.R#L110
It actually fails here instead https://github.com/apache/spark/blob/master/R/pkg/R/context.R#L116
Spark `DataFrame` is not `is.data.frame`.
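For reference, the "list of columns" behaviour discussed in this thread can be checked on a local base R `data.frame`. This is only a sketch with a hypothetical data frame `df`. It says nothing about how a Spark `DataFrame` is handled, which per the comment above does not pass `is.data.frame`.

```r
# Hypothetical local data frame; this is base R, not a Spark DataFrame.
df <- data.frame(a = 1:3, b = c(10, 20, 30))

# A base R data.frame is both a data frame and a list of columns.
stopifnot(is.data.frame(df), is.list(df))

# lapply therefore iterates over columns, not rows.
col_sums <- lapply(df, sum)
stopifnot(identical(names(col_sums), c("a", "b")))
stopifnot(col_sums$a == 6, col_sums$b == 60)
```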