@BryanCutler
Member

Adding a blog post to highlight some of the work done to integrate Arrow with Spark for toPandas().

@BryanCutler
Member Author

CC @wesm @xhochy

Member

@wesm wesm left a comment

Cool, minor comments, but can push this out soon

to apply a function on grouped data using a Pandas DataFrame ([SPARK-20396][9]).
Just as Arrow helped in converting a Spark DataFrame to Pandas, it can also work in the
other direction when creating a Spark DataFrame from an existing Pandas
DataFrame ([SPARK-20791][10]). Stay tuned for more!
Member

Do you want to acknowledge the other collaborators on this work?

Member Author

yes definitely, thanks for pointing that out!

@@ -0,0 +1,149 @@
---
layout: post
title: "Spark, Meet Arrow"
Member

Maybe "Speeding up PySpark with Apache Arrow" ?

[3]: https://issues.apache.org/jira/issues/?filter=12335725&jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20text%20~%20%22arrow%22%20ORDER%20BY%20createdDate%20DESC
[4]: https://gist.github.com/wesm/0cb5531b1c2e346a0007
[5]: https://issues.apache.org/jira/browse/SPARK-13534
[6]: https://github.com/apache/arrow/blob/apache-arrow-0.4.1/site/install.md
Member

Is this version pinned on purpose?

@BryanCutler BryanCutler force-pushed the spark-blogpost-ARROW-1268 branch from 486ec3a to 1f8dffd Compare July 27, 2017 00:31
@BryanCutler
Member Author

Please take another look when you can, @wesm, and let me know if you think anything else needs changes. Thanks!

Member

@wesm wesm left a comment

+1, thanks! I will deploy and tweet out

@asfgit asfgit closed this in d76e43e Jul 27, 2017
@BryanCutler
Member Author

Thanks @wesm!

@BryanCutler BryanCutler deleted the spark-blogpost-ARROW-1268 branch November 7, 2017 23:50