[SPARK-6608] [SQL] Makes DataFrame.rdd a lazy val #5265

liancheng · 2015-03-30T12:37:12Z

Before 1.3.0, SchemaRDD.id served as a unique identifier for each SchemaRDD. In 1.3.0, unlike SchemaRDD, DataFrame is no longer an RDD, and DataFrame.rdd is a function that returns a new RDD instance on every call. Making DataFrame.rdd a lazy val should bring the unique identifier back.
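To illustrate the difference described above, here is a minimal sketch that does not use Spark at all. `FakeRDD`, `FrameWithDef`, and `FrameWithLazyVal` are hypothetical stand-ins, not Spark classes; the only assumption carried over from the description is that each new RDD instance gets its own id.

```scala
import java.util.concurrent.atomic.AtomicInteger

object LazyValRddSketch {
  private val nextId = new AtomicInteger(0)

  // Hypothetical stand-in for an RDD: every new instance receives a fresh id.
  final class FakeRDD { val id: Int = nextId.getAndIncrement() }

  // Mirrors the 1.3.0 behaviour described above: a method that builds a new RDD per call.
  final class FrameWithDef { def rdd: FakeRDD = new FakeRDD }

  // Mirrors the proposed change: the RDD is built once, on first access, then reused.
  final class FrameWithLazyVal { lazy val rdd: FakeRDD = new FakeRDD }

  def main(args: Array[String]): Unit = {
    val a = new FrameWithDef
    println(a.rdd.id == a.rdd.id) // false: two calls, two distinct instances

    val b = new FrameWithLazyVal
    println(b.rdd.id == b.rdd.id) // true: both accesses return the same instance
  }
}
```

With the `def` variant, any logic keyed on the RDD's id sees a different id on each access; with the `lazy val` variant the id is stable for the lifetime of the enclosing object.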

liancheng · 2015-03-30T12:37:49Z

@rxin Is there a good reason why DataFrame.rdd has to be a function?

SparkQA · 2015-03-30T12:42:37Z

Test build #29399 has started for PR 5265 at commit 7f37d21.

petro-rudenko · 2015-03-30T13:06:56Z

+1 for this, since, for example, the caching logic in the ml package doesn't work properly.

SparkQA · 2015-03-30T14:05:14Z

Test build #29399 has finished for PR 5265 at commit 7f37d21.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

AmplabJenkins · 2015-03-30T14:05:17Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29399/

liancheng · 2015-03-30T14:50:17Z

@petro-rudenko Oops, that's a good catch!

rxin · 2015-03-31T04:52:12Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala

Can you update the doc to say that the RDD is memoized, i.e. once it has been called, changing the Spark SQL configuration won't change the plan anymore?

Thanks, that's a good point.
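The memoization caveat raised in this review thread can be sketched in plain Scala. `Conf`, `Frame`, `plan`, and `shufflePartitions` are hypothetical names used only for illustration and are not Spark's API; the sketch only assumes that the lazily computed value reads a mutable setting when it is first evaluated.

```scala
object MemoizedPlanSketch {
  // Hypothetical mutable setting standing in for a Spark SQL configuration value.
  final class Conf(var shufflePartitions: Int)

  final class Frame(conf: Conf) {
    // Evaluated once, on first access; the result is cached for the lifetime of this Frame.
    lazy val plan: String = s"plan(partitions=${conf.shufflePartitions})"
  }

  def main(args: Array[String]): Unit = {
    val conf = new Conf(200)
    val df = new Frame(conf)

    val first = df.plan          // forces evaluation with partitions=200
    conf.shufflePartitions = 10  // a later configuration change...

    println(first == df.plan)     // true: the memoized value is not recomputed
    println(new Frame(conf).plan) // plan(partitions=10): a new Frame picks up the change
  }
}
```

This is the trade-off the doc update is meant to call out: a stable identity comes at the cost of the value no longer tracking configuration changes made after first access.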

SparkQA · 2015-03-31T15:28:18Z

Test build #29484 has started for PR 5265 at commit 7500968.

SparkQA · 2015-03-31T16:52:02Z

Test build #29484 has finished for PR 5265 at commit 7500968.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

AmplabJenkins · 2015-03-31T16:52:05Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29484/

rxin · 2015-03-31T17:26:20Z

LGTM.

liancheng · 2015-04-01T13:37:18Z

Merged to master. Thanks for the review!

Makes DataFrame.rdd a lazy val

7f37d21

rxin reviewed Mar 31, 2015

Updates javadoc

7500968

asfgit closed this in d36c5fc Apr 1, 2015

liancheng deleted the spark-6608 branch April 1, 2015 13:37

cloud-fan mentioned this pull request Apr 28, 2015

[SPARK-7158] [SQL] Fix bug of cached data cannot be used in collect() after cache() #5714

Closed

5 participants