Closed
@@ -85,7 +85,16 @@ class QueryExecution(
prepareForExecution(sparkPlan)
}

- /** Internal version of the RDD. Avoids copies and has no schema */
+ /**
+ * Internal version of the RDD. Avoids copies and has no schema.
+ * Note for callers: Spark may apply various optimization including reusing object: this means
+ * the row is valid only for the iteration it is retrieved. You should avoid storing row and
+ * accessing after iteration. (Calling `collect()` is one of known bad usage.)
+ * If you want to store these rows into collection, please apply some converter or copy row
+ * which produces new object per iteration.
+ * Given QueryExecution is not a public class, end users are discouraged to use this: please
+ * use `Dataset.rdd` instead where conversion will be applied.
+ */
Contributor

@HeartSaVioR Should we point users to the `Dataset.rdd` method, where the conversion is already applied?

Contributor Author

Yeah, that's a good suggestion for end users (not Spark developers). Will add.

Member

BTW, I don't think it's an API, though. Technically, we don't have to worry about end users.

lazy val toRdd: RDD[InternalRow] = executedPlan.execute()

/**
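For readers unfamiliar with the object-reuse pitfall the new doc comment warns about, here is a minimal sketch in plain Scala with no Spark dependency. `MutableRow`, `RowReuseDemo`, and `reusingIterator` are hypothetical stand-ins invented for illustration; they are not Spark classes and do not reflect how `toRdd` is implemented. The sketch only assumes that an iterator may hand out the same mutable object on every step.

```scala
// Hypothetical stand-in for a mutable row; this is NOT Spark's InternalRow.
final class MutableRow(var value: Int) {
  // Produces an independent snapshot of the current contents.
  def copy(): MutableRow = new MutableRow(value)
}

object RowReuseDemo {
  // Returns an iterator that hands out the SAME object on every next(),
  // overwriting its contents each time (the reuse pattern the comment describes).
  def reusingIterator(values: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    values.iterator.map { v =>
      shared.value = v
      shared
    }
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(1, 2, 3)

    // Bad: the list holds three references to one shared object, so every
    // element reports whatever value was written last.
    val stored = reusingIterator(input).toList.map(_.value)

    // Good: copy each row while it is still valid for the current iteration.
    val copied = reusingIterator(input).map(_.copy()).toList.map(_.value)

    println(s"stored without copy: $stored")  // List(3, 3, 3)
    println(s"copied per iteration: $copied") // List(1, 2, 3)
  }
}
```

In Spark itself, the analogous safeguard would presumably be copying each row (for example via `InternalRow.copy()`) before storing it, or using `Dataset.rdd` as the doc comment recommends.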