Conversation

@WeichenXu123 (Contributor) commented Oct 17, 2016

What changes were proposed in this pull request?

Add storageLevel to DataFrame for SparkR.
This is similar to this PR: #13780

In R I do not make a class for StorageLevel; instead, I add a method storageToString.

How was this patch tested?

Test added.
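For illustration only, a minimal sketch of what such a test could look like in SparkR's testthat style; the session setup, the data, and the expected string pattern are assumptions, since the exact output format of storageLevel() is discussed further down in this thread:

```r
library(SparkR)
library(testthat)

# Hypothetical test sketch (not necessarily the test added by this PR).
# Assumes a local Spark can be started, e.g. with SPARK_HOME set.
sparkR.session(master = "local[1]")

df <- createDataFrame(faithful)
persist(df, "MEMORY_AND_DISK")

# Expect the chosen level to appear somewhere in the returned string;
# only a loose pattern match is used because the exact format is still
# being discussed below.
expect_true(grepl("MEMORY_AND_DISK", storageLevel(df)))

unpersist(df)
sparkR.session.stop()
```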

@WeichenXu123 changed the title from [SPARK-17961][SparkR][SQL] Add storageLevel to Dataset for SparkR to [SPARK-17961][SparkR][SQL] Add storageLevel to DataFrame for SparkR on Oct 17, 2016
@SparkQA commented Oct 17, 2016

Test build #67078 has finished for PR 15516 at commit 4be3e5f.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 17, 2016

Test build #67080 has finished for PR 15516 at commit 75ff834.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Change this to storageLevel, to match the method name.

Contributor Author

@felixcheung
I'm a little confused here: the method name is storageLevel. Does it need to change to something else, or is there a doc somewhere that I need to update but forgot?

Member

This should be @rdname storageLevel instead of @rdname storageLevel-methods.

Member

So the output of this doesn't say "MEMORY_AND_DISK"? Should we have that in addition to "StorageLevel(disk, memory, deserialized, 1 replicas)"? It might be confusing to set "MEMORY_AND_DISK" and get "StorageLevel(disk, memory, deserialized, 1 replicas)" back.
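To make the concern concrete, a hypothetical SparkR session; storageLevel() is the method this PR adds, and the quoted output is the JVM StorageLevel toString form mentioned above:

```r
# Hypothetical round trip: persist with a short name, read the level back.
df <- createDataFrame(faithful)
persist(df, "MEMORY_AND_DISK")

storageLevel(df)
# If only the JVM toString were returned, a user who asked for
# "MEMORY_AND_DISK" would see something like:
# "StorageLevel(disk, memory, deserialized, 1 replicas)"
```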

Contributor Author

Good suggestion, I'll update the code later. Thanks!

@WeichenXu123 (Contributor Author)

@felixcheung Code updated. Thanks!

@SparkQA commented Oct 19, 2016

Test build #67203 has finished for PR 15516 at commit c11dcce.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

R/pkg/R/utils.R (Outdated)

```r
# Query the remaining fields of the JVM StorageLevel object
# (useDisk and useMemory are read on earlier lines of this diff).
useOffHeap <- callJMethod(levelObj, "useOffHeap")
deserialized <- callJMethod(levelObj, "deserialized")
replication <- callJMethod(levelObj, "replication")
# All flags FALSE with replication 1 corresponds to StorageLevel NONE.
if (!useDisk && !useMemory && !useOffHeap && !deserialized && replication == 1) {
```
Member

Hardcoding the variations in R could be hard to maintain or easily get out of sync. Is there another way to do this? Python seems to be able to get the enum name as a string.

Contributor Author

Python has its own StorageLevel class, and the Python-side storageLevel code has the same duplicated-code problem. Making an R-side StorageLevel class could make the code more complex and doesn't seem to help solve the duplication either. What do you think?

Contributor Author

Also, about the R-side string constants: is there a better way to avoid duplicating literal constants such as "MEMORY_AND_DISK" in the code? Should we define global variables, e.g. MEMORY_AND_DISK_CONSTANT <- "MEMORY_AND_DISK"? And if so, where should that definition go?

Member

I see. A class in R wouldn't help much in this case.
You could use a look-up table; check out https://github.com/apache/spark/blob/master/R/pkg/R/types.R and how it is used.
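For what it's worth, a rough sketch of what such a look-up table could look like, loosely following the types.R pattern; the table name, the helper, and its placement are all hypothetical, not the code that was merged:

```r
# Hypothetical look-up table: short storage level name ->
# (useDisk, useMemory, useOffHeap, deserialized, replication).
STORAGE_LEVELS <- list(
  "NONE"            = c(FALSE, FALSE, FALSE, FALSE, 1),
  "DISK_ONLY"       = c(TRUE,  FALSE, FALSE, FALSE, 1),
  "MEMORY_ONLY"     = c(FALSE, TRUE,  FALSE, TRUE,  1),
  "MEMORY_ONLY_SER" = c(FALSE, TRUE,  FALSE, FALSE, 1),
  "MEMORY_AND_DISK" = c(TRUE,  TRUE,  FALSE, TRUE,  1),
  "OFF_HEAP"        = c(TRUE,  TRUE,  TRUE,  FALSE, 1)
)

# Resolve a JVM StorageLevel object back to a short name, if one matches.
# callJMethod is SparkR's internal JVM bridge, as already used in utils.R.
storageLevelName <- function(levelObj) {
  flags <- c(callJMethod(levelObj, "useDisk"),
             callJMethod(levelObj, "useMemory"),
             callJMethod(levelObj, "useOffHeap"),
             callJMethod(levelObj, "deserialized"),
             callJMethod(levelObj, "replication"))
  for (name in names(STORAGE_LEVELS)) {
    if (all(flags == STORAGE_LEVELS[[name]])) {
      return(name)
    }
  }
  NA_character_
}
```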

@SparkQA commented Oct 21, 2016

Test build #67342 has started for PR 15516 at commit bedc93f.

@felixcheung (Member)

I think you've committed a jar file by accident.

@WeichenXu123 (Contributor Author)

@felixcheung
Removed the unrelated jar file.
About the string look-up table: there doesn't seem to be a mapping relationship between these string constants, so I think keeping the code as it is now is fine; there is no need to add a look-up table.

@SparkQA commented Oct 22, 2016

Test build #67370 has finished for PR 15516 at commit 5af4a07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member) commented Oct 24, 2016

I see. I understand the constraint here. I'd hold off for a bit to see if anyone else has thoughts on this.

Also, I think it would be useful to output both the short name and the long description (from toString), e.g.

MEMORY_ONLY_SER - Serialized 1x Replicated

or similar. Perhaps later on we could deprecate the MEMORY_AND_DISK-style short names, as in Scala or Python.
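As a rough illustration of the combined output being suggested (the helper name and the exact formatting below are assumptions, not the merged code):

```r
# Hypothetical helper combining the short name with the JVM toString output,
# yielding e.g. "MEMORY_ONLY_SER - StorageLevel(memory, 1 replicas)".
storageLevelShortAndLong <- function(shortName, levelObj) {
  paste0(shortName, " - ", callJMethod(levelObj, "toString"))
}
```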

@WeichenXu123 (Contributor Author)

@felixcheung @yanboliang Thanks!

@SparkQA commented Oct 25, 2016

Test build #67487 has finished for PR 15516 at commit cbbeb2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member)

Did you get my earlier comment here: #15516 (comment)?

@WeichenXu123 (Contributor Author)

@felixcheung Updated the rdname; the unpersist-methods rdname was also updated along the way.

@SparkQA commented Oct 26, 2016

Test build #67563 has finished for PR 15516 at commit aa56467.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 26, 2016

Test build #67565 has finished for PR 15516 at commit 1977591.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member)

Merged to master.

@asfgit closed this in fb0a8a8 Oct 26, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
## What changes were proposed in this pull request?

Add storageLevel to DataFrame for SparkR.
This is similar to this PR: apache#13780

In R I do not make a class for `StorageLevel`; instead, I add a method `storageToString`.

## How was this patch tested?

Test added.

Author: WeichenXu <[email protected]>

Closes apache#15516 from WeichenXu123/storageLevel_df_r.
@WeichenXu123 deleted the storageLevel_df_r branch November 19, 2016 14:13
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017