Commit d36c5fc
[SPARK-6608] [SQL] Makes DataFrame.rdd a lazy val
[SPARK-6608] [SQL] Makes DataFrame.rdd a lazy val

Before 1.3.0, `SchemaRDD.id` worked as a unique identifier for each `SchemaRDD`. In 1.3.0, unlike `SchemaRDD`, `DataFrame` is no longer an RDD, and `DataFrame.rdd` is a method that returns a new RDD instance on every call. Making `DataFrame.rdd` a lazy val brings the unique identifier back.

Author: Cheng Lian <[email protected]>

Closes #5265 from liancheng/spark-6608 and squashes the following commits:

7500968 [Cheng Lian] Updates javadoc
7f37d21 [Cheng Lian] Makes DataFrame.rdd a lazy val
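The distinction the commit relies on can be illustrated outside of Spark. A minimal sketch (plain Scala, no Spark types; `FakeRDD`, `WithDef`, and `WithLazy` are hypothetical names standing in for `RDD` and `DataFrame`) showing why a `def` produces a new object, and hence a new id, on every access, while a `lazy val` memoizes a single instance:

```scala
// Minimal sketch, not Spark code: contrasts `def rdd` (new instance per call)
// with `lazy val rdd` (computed once, then cached), which is the change in
// SPARK-6608 that restores a stable identifier.
object LazyValDemo {
  private var nextId = 0

  // Stand-in for an RDD: each instance gets a fresh numeric id.
  class FakeRDD { val id: Int = { nextId += 1; nextId } }

  class WithDef  { def rdd: FakeRDD = new FakeRDD }      // re-evaluated every call
  class WithLazy { lazy val rdd: FakeRDD = new FakeRDD } // evaluated once, memoized

  def main(args: Array[String]): Unit = {
    val d = new WithDef
    assert(d.rdd.id != d.rdd.id)  // different instance (and id) each access

    val l = new WithLazy
    assert(l.rdd.id == l.rdd.id)  // same cached instance, stable id
    println("ok")
  }
}
```

The same memoization is why the updated javadoc warns that, once `rdd` has been accessed, later changes to query-planning configurations no longer affect it: the `lazy val` body runs only on first access.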
1 parent 0358b08 commit d36c5fc

File tree

1 file changed: +4 −2 lines changed

sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala

Lines changed: 4 additions & 2 deletions

@@ -952,10 +952,12 @@ class DataFrame private[sql](
   /////////////////////////////////////////////////////////////////////////////

   /**
-   * Returns the content of the [[DataFrame]] as an [[RDD]] of [[Row]]s.
+   * Represents the content of the [[DataFrame]] as an [[RDD]] of [[Row]]s. Note that the RDD is
+   * memoized. Once called, it won't change even if you change any query planning related Spark SQL
+   * configurations (e.g. `spark.sql.shuffle.partitions`).
    * @group rdd
    */
-  def rdd: RDD[Row] = {
+  lazy val rdd: RDD[Row] = {
     // use a local variable to make sure the map closure doesn't capture the whole DataFrame
     val schema = this.schema
     queryExecution.executedPlan.execute().map(ScalaReflection.convertRowToScala(_, schema))
