[SPARK-17961][SparkR][SQL] Add storageLevel to DataFrame for SparkR #15516
|
Test build #67078 has finished for PR 15516 at commit
|
4be3e5f to
75ff834
Compare
|
Test build #67080 has finished for PR 15516 at commit
|
R/pkg/R/DataFrame.R
Outdated
Change this to storageLevel, to match the method name.
@felixcheung
I'm a little confused here: the method name is already storageLevel. Does it need to change to something else, or is there documentation that needs updating that I forgot?
this should be
@rdname storageLevel instead of
@rdname storageLevel-methods
So the output of this doesn't say "MEMORY_AND_DISK"? Should we include that in addition to "StorageLevel(disk, memory, deserialized, 1 replicas)"? It might be confusing to set "MEMORY_AND_DISK" and get "StorageLevel(disk, memory, deserialized, 1 replicas)" back.
Good suggestion, I'll update the code later. Thanks!
75ff834 to
c11dcce
Compare
|
@felixcheung code updated. Thanks!
|
Test build #67203 has finished for PR 15516 at commit
|
R/pkg/R/utils.R
Outdated
useOffHeap <- callJMethod(levelObj, "useOffHeap")
deserialized <- callJMethod(levelObj, "deserialized")
replication <- callJMethod(levelObj, "replication")
if (!useDisk && !useMemory && !useOffHeap && !deserialized && replication == 1) {
Hardcoding the variations in R could be hard to maintain and could easily get out of sync. Is there any other way to do this?
Python seems to be able to get the enum name as a string.
Python has its own StorageLevel class, and the Python-side storageLevel code also has a duplicated-code problem.
Adding an R-side StorageLevel class would make the code more complex, and it does not seem to help with the duplicated-code problem.
What do you think?
Also, about the R-side string constants: is there a better way to avoid duplicated literal constants such as "MEMORY_AND_DISK" in the code? Do we need to define global variables, such as
MEMORY_AND_DISK_CONSTANT <- "MEMORY_AND_DISK" ?
And if we go this way, where should that definition live?
I see. A class in R wouldn't help much in this case.
You could use a lookup table instead. Check out https://github.com/apache/spark/blob/master/R/pkg/R/types.R and how it is used.
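As a rough illustration of the lookup-table idea (a sketch only, not the code that was merged in this PR), the flag combinations can live in one table that maps them to short names, so each name is spelled out exactly once. The names `storage_levels` and `storage_level_name` are hypothetical, and only a few levels are listed; the flag values follow Spark's `StorageLevel` definitions. In SparkR the five values would come from the `callJMethod` calls shown in the diff above; here they are passed in directly so the example runs in plain R.

```r
# Hypothetical lookup table: one row per named storage level (subset only).
storage_levels <- data.frame(
  name         = c("NONE", "DISK_ONLY", "MEMORY_ONLY", "MEMORY_AND_DISK"),
  useDisk      = c(FALSE, TRUE, FALSE, TRUE),
  useMemory    = c(FALSE, FALSE, TRUE, TRUE),
  useOffHeap   = c(FALSE, FALSE, FALSE, FALSE),
  deserialized = c(FALSE, FALSE, TRUE, TRUE),
  replication  = c(1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the short name whose flags all match,
# or NA_character_ when the combination is not in the table.
storage_level_name <- function(useDisk, useMemory, useOffHeap,
                               deserialized, replication) {
  hit <- storage_levels$useDisk == useDisk &
    storage_levels$useMemory == useMemory &
    storage_levels$useOffHeap == useOffHeap &
    storage_levels$deserialized == deserialized &
    storage_levels$replication == replication
  if (any(hit)) storage_levels$name[hit][1] else NA_character_
}

# Flags for disk + memory, deserialized, one replica.
print(storage_level_name(TRUE, TRUE, FALSE, TRUE, 1L))
```

Adding a level then means adding one row to the table rather than another `if` branch, which is the maintenance concern raised above.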
|
Test build #67342 has started for PR 15516 at commit |
|
I think you've committed a jar file by accident.
|
@felixcheung |
bedc93f to
5af4a07
Compare
|
Test build #67370 has finished for PR 15516 at commit
|
|
I see. I understand the constraint here. I'd hold for a bit to see if anyone else has any thought on this? Also, I'd think it would be useful to output both the short name + long description (from toString) or similar. Perhaps later on we could deprecate the |
|
@felixcheung @yanboliang Thanks!
|
Test build #67487 has finished for PR 15516 at commit
|
|
Did you see my earlier comment here: #15516 (comment)?
|
@felixcheung update rdname, |
|
Test build #67563 has finished for PR 15516 at commit
|
aa56467 to
1977591
Compare
|
Test build #67565 has finished for PR 15516 at commit
|
|
Merged to master.
## What changes were proposed in this pull request?

Add storageLevel to DataFrame for SparkR. This is similar to this PR: apache#13780, but in R I do not make a class for `StorageLevel`; instead I add a method `storageToString`.

## How was this patch tested?

Test added.

Author: WeichenXu <[email protected]>

Closes apache#15516 from WeichenXu123/storageLevel_df_r.
What changes were proposed in this pull request?
Add storageLevel to DataFrame for SparkR.
This is similar to this PR: #13780
but in R I do not make a class for StorageLevel; instead I add a method storageToString.
How was this patch tested?
Test added.