@liancheng
Contributor

Before 1.3.0, SchemaRDD.id served as a unique identifier for each SchemaRDD. In 1.3.0, DataFrame, unlike SchemaRDD, is no longer an RDD, and DataFrame.rdd is actually a function that returns a new RDD instance on every call. Making DataFrame.rdd a lazy val should bring the unique identifier back.
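
To make the difference concrete, here is a minimal sketch with no Spark dependency. `IdGenerator`, `FakeRDD`, `FrameWithDef` and `FrameWithLazyVal` are hypothetical stand-ins invented for illustration, not Spark classes; the only assumption carried over from the description above is that each new RDD instance gets its own id.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical id source: hands out a fresh id for every new instance.
object IdGenerator {
  private val counter = new AtomicInteger(0)
  def next(): Int = counter.getAndIncrement()
}

// Hypothetical stand-in for an RDD; only the per-instance id matters here.
final class FakeRDD {
  val id: Int = IdGenerator.next()
}

// `def`: every access builds a new instance, so the id differs on each call.
final class FrameWithDef {
  def rdd: FakeRDD = new FakeRDD
}

// `lazy val`: the instance is built on first access and reused afterwards.
final class FrameWithLazyVal {
  lazy val rdd: FakeRDD = new FakeRDD
}

object RddIdentityDemo {
  def main(args: Array[String]): Unit = {
    val withDef = new FrameWithDef
    val withLazyVal = new FrameWithLazyVal
    println(withDef.rdd.id == withDef.rdd.id)         // false
    println(withLazyVal.rdd.id == withLazyVal.rdd.id) // true
  }
}
```

With the `lazy val` variant, the id can serve as a stable handle for the frame's underlying RDD, which is the property the description above wants to restore.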

@liancheng
Contributor Author

@rxin Is there a good reason why DataFrame.rdd has to be a function?

@SparkQA

SparkQA commented Mar 30, 2015

Test build #29399 has started for PR 5265 at commit 7f37d21.

@petro-rudenko
Contributor

+1 for this. For example, the caching logic in the ml package doesn't work properly at the moment.

@SparkQA

SparkQA commented Mar 30, 2015

Test build #29399 has finished for PR 5265 at commit 7f37d21.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29399/

@liancheng
Contributor Author

@petro-rudenko Oops, that's a good catch!

Contributor

Can you update the doc to say that the RDD is memoized, i.e. once it has been called, changing the Spark SQL configuration won't change the plan anymore?
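
A minimal sketch of that memoization behavior, again without Spark. `FakeConf`, `FakeFrame`, and the `partitions` setting are hypothetical names chosen for illustration; the string stands in for a physical plan and is not how Spark represents one.

```scala
// Hypothetical stand-in for a mutable SQL configuration.
final class FakeConf {
  var partitions: Int = 200
}

// Hypothetical stand-in for a DataFrame whose plan depends on the configuration.
final class FakeFrame(conf: FakeConf) {
  // Planned once, on first access, from whatever the configuration holds at that moment.
  lazy val rdd: String = s"plan(partitions=${conf.partitions})"
}

object MemoizedPlanDemo {
  def main(args: Array[String]): Unit = {
    val conf = new FakeConf
    val df = new FakeFrame(conf)

    val first = df.rdd       // forces planning with partitions = 200
    conf.partitions = 10     // a later configuration change ...

    println(df.rdd == first)          // true: ... does not affect the memoized plan
    println(new FakeFrame(conf).rdd)  // plan(partitions=10): a new frame sees the new setting
  }
}
```

In this sketch the only way to pick up the changed setting is to build a new frame, which is the caveat the doc comment would need to spell out.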

Contributor Author

Thanks, that's a good point.

@SparkQA

SparkQA commented Mar 31, 2015

Test build #29484 has started for PR 5265 at commit 7500968.

@SparkQA

SparkQA commented Mar 31, 2015

Test build #29484 has finished for PR 5265 at commit 7500968.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29484/

@rxin
Contributor

rxin commented Mar 31, 2015

LGTM.

@asfgit closed this in d36c5fc on Apr 1, 2015
@liancheng deleted the spark-6608 branch on April 1, 2015 13:37

@liancheng
Contributor Author

Merged to master. Thanks for the review!
