
Expose spark_column from Series/Index to make it easier to work with Spark Columns. #1438

Merged: 1 commit, Apr 23, 2020

@ueshin (Collaborator) commented Apr 23, 2020

This PR exposes a spark_column property on Series/Index that returns the Spark Column representing it. Users who are familiar with Spark functions can then pass it straight to those functions.

E.g.:

>>> import databricks.koalas as ks
>>> from pyspark.sql import functions as F
>>> kdf = ks.DataFrame({'a': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0], 'b': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> kdf['greatest'] = F.greatest(kdf.a.spark_column, kdf.b.spark_column)
>>> kdf['least'] = F.least(kdf.a.spark_column, kdf.b.spark_column)
>>> kdf
     a    b  greatest  least
0  1.0  1.0       1.0    1.0
1  1.0  2.0       2.0    1.0
2  1.0  3.0       3.0    1.0
3  2.0  4.0       4.0    2.0
4  2.0  5.0       5.0    2.0
5  2.0  6.0       6.0    2.0
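F.greatest and F.least compare values across columns within each row, so the expected table can be cross-checked with a row-wise max/min. The sketch below is plain pandas, not the Koalas or Spark API, and assumes only that pandas is installed. It reproduces the same greatest and least columns from the same input data.

```python
import pandas as pd

# Same input data as the Koalas example above, built as a plain pandas DataFrame.
pdf = pd.DataFrame({'a': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
                    'b': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Row-wise max/min over columns 'a' and 'b' mirrors what
# F.greatest / F.least compute per row in Spark.
pdf['greatest'] = pdf[['a', 'b']].max(axis=1)
pdf['least'] = pdf[['a', 'b']].min(axis=1)

print(pdf)
```

Since b >= a in every row of this data, greatest equals b and least equals a, which matches the table produced by the Spark functions.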

@ueshin requested a review from @HyukjinKwon April 23, 2020 00:04
@HyukjinKwon merged commit 9d97ffc into databricks:master Apr 23, 2020
@ueshin deleted the expose_spark_column branch April 23, 2020 02:38
@itholic (Contributor) commented Apr 26, 2020

Cool !
