Expose spark_column from Series/Index to make it easier to work with Spark Columns. #1438

ueshin · 2020-04-23T00:04:49Z

This PR exposes a spark_column property that returns the Spark Column representing the Series/Index, so that users who are familiar with Spark functions can work with them more easily.

E.g.:

>>> import databricks.koalas as ks
>>> kdf = ks.DataFrame({'a': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0], 'b': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

>>> from pyspark.sql import functions as F
>>> kdf['greatest'] = F.greatest(kdf.a.spark_column, kdf.b.spark_column)
>>> kdf['least'] = F.least(kdf.a.spark_column, kdf.b.spark_column)
>>> kdf
     a    b  greatest  least
0  1.0  1.0       1.0    1.0
1  1.0  2.0       2.0    1.0
2  1.0  3.0       3.0    1.0
3  2.0  4.0       4.0    2.0
4  2.0  5.0       5.0    2.0
5  2.0  6.0       6.0    2.0

itholic · 2020-04-26T22:30:07Z

Cool!

Expose spark_column from Series/Index to make it easier to work with Spark Columns. (bd15932)

ueshin requested a review from HyukjinKwon April 23, 2020 00:04

HyukjinKwon approved these changes Apr 23, 2020

HyukjinKwon merged commit 9d97ffc into databricks:master Apr 23, 2020

ueshin deleted the expose_spark_column branch April 23, 2020 02:38
