Skip to content

Conversation

@HeartSaVioR
Copy link
Contributor

What changes were proposed in this pull request?

This proposes adding a note on QueryExecution.toRdd regarding Spark's internal optimization callers would need to indicate.

How was this patch tested?

This patch is a documentation change.

@SparkQA
Copy link

SparkQA commented Feb 18, 2019

Test build #102462 has finished for PR 23822 at commit 493b2cb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

Retest this, please

@HeartSaVioR
Copy link
Contributor Author

cc. @cloud-fan

@cloud-fan
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Feb 18, 2019

Test build #102469 has finished for PR 23822 at commit 493b2cb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

* accessing after iteration. (Calling `collect()` is one of known bad usage.)
* If you want to store these rows into collection, please apply some converter or copy row
* which produces new object per iteration.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HeartSaVioR Should we point the users to dataset.rdd method where the conversion is already applied ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's good suggestion for end users (not Spark developers). Will add.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I don't think it's an API though .. technically we don't have to worry about end users.

@SparkQA
Copy link

SparkQA commented Feb 18, 2019

Test build #102484 has finished for PR 23822 at commit 493b2cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* If you want to store these rows into collection, please apply some converter or copy row
* which produces new object per iteration.
* Given QueryExecution is not a public class, end users are discouraged to use this: please
* user `Dataset.rdd` instead which conversion will be applied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user -> use
which -> in which
or
which -> where ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finding. Applied.

@dilipbiswal
Copy link
Contributor

LGTM with a very minor comment.

@SparkQA
Copy link

SparkQA commented Feb 19, 2019

Test build #102490 has finished for PR 23822 at commit 2f224f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants