[SPARK-17961][SparkR][SQL] Add storageLevel to DataFrame for SparkR #15516

WeichenXu123 · 2016-10-17T17:10:46Z

What changes were proposed in this pull request?

Add `storageLevel` to DataFrame for SparkR.
This is similar to this PR: #13780,

but in R I do not create a class for `StorageLevel`;
instead I add a method `storageToString`.

How was this patch tested?

Test added.

SparkQA · 2016-10-17T17:14:26Z

Test build #67078 has finished for PR 15516 at commit 4be3e5f.

This patch fails some tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-10-17T18:28:23Z

Test build #67080 has finished for PR 15516 at commit 75ff834.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2016-10-18T18:31:49Z

R/pkg/R/DataFrame.R

Change this storageLevel- rdname to match the method name.

@felixcheung
I'm a little confused here: the method name is already storageLevel. Does it need to change to something else, or is there a doc that needs updating that I forgot?

This should be `@rdname storageLevel` instead of `@rdname storageLevel-methods`.

felixcheung · 2016-10-18T18:33:19Z

R/pkg/inst/tests/testthat/test_sparkSQL.R

So the output of this doesn't say "MEMORY_AND_DISK"? Should we have that in addition to "StorageLevel(disk, memory, deserialized, 1 replicas)"? It might be confusing to set "MEMORY_AND_DISK" and get "StorageLevel(disk, memory, deserialized, 1 replicas)" back.

Good suggestion; I'll update the code later. Thanks!

WeichenXu123 · 2016-10-19T16:01:50Z

@felixcheung Code updated, thanks!

SparkQA · 2016-10-19T16:35:28Z

Test build #67203 has finished for PR 15516 at commit c11dcce.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2016-10-19T21:15:45Z

R/pkg/R/utils.R

+  useOffHeap <- callJMethod(levelObj, "useOffHeap")
+  deserialized <- callJMethod(levelObj, "deserialized")
+  replication <- callJMethod(levelObj, "replication")
+  if (!useDisk && !useMemory && !useOffHeap && !deserialized && replication == 1) {


Hardcoding the variations in R could be hard to maintain and could easily get out of sync. Is there another way to do this?
Python seems to be able to get the enum name as a string.

Python has its own StorageLevel class, and the Python-side storageLevel code has the same duplicated-code problem.
Making an R-side StorageLevel class would make the code more complex and doesn't seem to help with the duplication.
What do you think?

Also, about the R-side string constants: is there a better way to avoid duplicating literal constants such as "MEMORY_AND_DISK" in the code? Do we need to define some global variables, such as
MEMORY_AND_DISK_CONSTANT <- "MEMORY_AND_DISK"?
If we go that way, where should that definition live?

I see. A class in R wouldn't help much in this case.
You could use a lookup table; check out https://github.com/apache/spark/blob/master/R/pkg/R/types.R and how it is used.
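A minimal sketch of the lookup-table idea in base R: each named Spark storage level is stored once as a row of its five flags (useDisk, useMemory, useOffHeap, deserialized, replication), so the short name is found by matching a row instead of a chain of hardcoded if/else conditions. The names `storageLevelTable` and `storageLevelShortName` are hypothetical (not SparkR's API), and the flag values are assumed to mirror the constants in Scala's StorageLevel object.

```r
# Hypothetical lookup table (illustrative, not SparkR's implementation): one row per
# named storage level, with flag values assumed to match Scala's StorageLevel.
storageLevelTable <- data.frame(
  name = c("NONE", "DISK_ONLY", "DISK_ONLY_2",
           "MEMORY_ONLY", "MEMORY_ONLY_2",
           "MEMORY_ONLY_SER", "MEMORY_ONLY_SER_2",
           "MEMORY_AND_DISK", "MEMORY_AND_DISK_2",
           "MEMORY_AND_DISK_SER", "MEMORY_AND_DISK_SER_2",
           "OFF_HEAP"),
  useDisk      = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE,
                   TRUE, TRUE, TRUE, TRUE, TRUE),
  useMemory    = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
                   TRUE, TRUE, TRUE, TRUE, TRUE),
  useOffHeap   = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
                   FALSE, FALSE, FALSE, FALSE, TRUE),
  deserialized = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,
                   TRUE, TRUE, FALSE, FALSE, FALSE),
  replication  = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Return the short name whose five flags all match, or NULL if no row matches.
storageLevelShortName <- function(useDisk, useMemory, useOffHeap,
                                  deserialized, replication) {
  tbl <- storageLevelTable
  idx <- which(tbl$useDisk == useDisk &
               tbl$useMemory == useMemory &
               tbl$useOffHeap == useOffHeap &
               tbl$deserialized == deserialized &
               tbl$replication == replication)
  if (length(idx) == 1) tbl$name[idx] else NULL
}

storageLevelShortName(TRUE, TRUE, FALSE, TRUE, 1L)  # returns "MEMORY_AND_DISK"
```

Inside SparkR the five arguments would presumably come from the `callJMethod(levelObj, ...)` calls in the snippet quoted above. A combination with no matching row (for example a custom replication factor) returns NULL, so a caller can fall back to the JVM's own description.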

SparkQA · 2016-10-21T17:02:29Z

Test build #67342 has started for PR 15516 at commit bedc93f.

felixcheung · 2016-10-21T19:55:54Z

I think you've committed a jar file by accident.

WeichenXu123 · 2016-10-22T02:58:38Z

@felixcheung
I removed the unrelated jar file.
As for the string lookup table: there doesn't seem to be a mapping relationship between these string constants here, so I think keeping the code as it is is fine, with no need to add a lookup table.

SparkQA · 2016-10-22T03:37:56Z

Test build #67370 has finished for PR 15516 at commit 5af4a07.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2016-10-24T22:48:59Z

I see. I understand the constraint here. I'd hold off for a bit to see if anyone else has any thoughts on this.

Also, I think it would be useful to output both the short name and the long description (from toString),
e.g.

MEMORY_ONLY_SER - Serialized 1x Replicated

or similar. Perhaps later on we could deprecate the MEMORY_AND_DISK-style short names, as in Scala or Python.
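A sketch of the combined "short name - long description" output suggested above, assuming the long description is whatever the JVM StorageLevel's toString returns (the format quoted earlier in this review, e.g. "StorageLevel(disk, memory, deserialized, 1 replicas)"). `describeStorageLevel` is a hypothetical helper name, not SparkR's API; it prefixes the short name when one is known and otherwise returns the long description alone.

```r
# Hypothetical helper (illustrative name, not SparkR's API): combine a short name
# such as "MEMORY_AND_DISK" with the long description taken from toString.
describeStorageLevel <- function(shortName, fullInfo) {
  if (is.null(shortName) || !nzchar(shortName)) {
    # No recognised short name (e.g. a custom level): return the long form only.
    fullInfo
  } else {
    paste(shortName, "-", fullInfo)
  }
}

# Assumed input: in SparkR this would presumably come from
# callJMethod(levelObj, "toString"); here it is hardcoded for illustration.
fullInfo <- "StorageLevel(disk, memory, deserialized, 1 replicas)"
describeStorageLevel("MEMORY_AND_DISK", fullInfo)
# returns "MEMORY_AND_DISK - StorageLevel(disk, memory, deserialized, 1 replicas)"
```

Taking the long form from the JVM rather than rebuilding it in R avoids a second copy of the formatting logic, which is the kind of duplication that could drift out of sync.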

WeichenXu123 · 2016-10-25T03:59:57Z

@felixcheung @yanboliang thanks!

SparkQA · 2016-10-25T04:39:58Z

Test build #67487 has finished for PR 15516 at commit cbbeb2d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2016-10-25T20:17:14Z

Did you see my earlier comment here: #15516 (comment)?

WeichenXu123 · 2016-10-26T06:08:12Z

@felixcheung Updated the rdname; I also updated the unpersist method's rdname along the way.

SparkQA · 2016-10-26T06:12:53Z

Test build #67563 has finished for PR 15516 at commit aa56467.

This patch fails some tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-10-26T07:14:10Z

Test build #67565 has finished for PR 15516 at commit 1977591.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2016-10-26T20:27:38Z

merged to master.

## What changes were proposed in this pull request?
Add storageLevel to DataFrame for SparkR. This is similar to this PR: apache#13780, but in R I do not make a class for `StorageLevel`; instead I add a method `storageToString`.
## How was this patch tested?
Test added.
Author: WeichenXu <[email protected]>
Closes apache#15516 from WeichenXu123/storageLevel_df_r.

WeichenXu123 changed the title ~~[SPARK-17961][SparkR][SQL] Add storageLevel to Dataset for SparkR~~ [SPARK-17961][SparkR][SQL] Add storageLevel to DataFrame for SparkR Oct 17, 2016

WeichenXu123 force-pushed the storageLevel_df_r branch from 4be3e5f to 75ff834 October 17, 2016 17:26

felixcheung reviewed Oct 18, 2016


update

c11dcce

WeichenXu123 force-pushed the storageLevel_df_r branch from 75ff834 to c11dcce October 19, 2016 16:00

felixcheung reviewed Oct 19, 2016


update testcase

5af4a07

WeichenXu123 force-pushed the storageLevel_df_r branch from bedc93f to 5af4a07 October 22, 2016 02:59

update

cbbeb2d

update.

1977591

WeichenXu123 force-pushed the storageLevel_df_r branch from aa56467 to 1977591 October 26, 2016 06:39

asfgit closed this in fb0a8a8 Oct 26, 2016

WeichenXu123 deleted the storageLevel_df_r branch November 19, 2016 14:13


3 participants
