Version 0.32.0
Koalas documentation redesign
The Koalas documentation was redesigned with a better theme, pydata-sphinx-theme. Please check out the new Koalas documentation site.
transform_batch and apply_batch
We added APIs that enable you to directly transform and apply a function against a Koalas Series or DataFrame. map_in_pandas is deprecated and has been renamed to apply_batch.
import databricks.koalas as ks

kdf = ks.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def pandas_plus(pdf):
    return pdf + 1  # should always return the same length as input.

kdf.transform_batch(pandas_plus)
import databricks.koalas as ks

kdf = ks.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def pandas_plus(pdf):
    return pdf[pdf.a > 1]  # allow arbitrary length

kdf.apply_batch(pandas_plus)
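These batch APIs are described as working against a Koalas Series as well. A minimal sketch, assuming the Series-level transform_batch is available (only the DataFrame examples appear in these notes):

import databricks.koalas as ks

kser = ks.Series([1, 2, 3])

def pandas_plus(pser):
    return pser + 1  # should always return the same length as input.

kser.transform_batch(pandas_plus)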
Please also check the Transform and apply a function page in the Koalas documentation.
Other new features and improvements
We added the following new features; a short usage sketch follows the list:
DataFrame:

SeriesGroupBy:
- unique (#1426)

Index:
- spark_column (#1438)

Series:
- spark_column (#1438)

MultiIndex:
- spark_column (#1438)
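As a rough illustration of the new accessors listed above (a minimal sketch; the sample data and outputs are assumptions, not taken from these notes):

import databricks.koalas as ks

kdf = ks.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})

# SeriesGroupBy.unique: unique values of 'b' within each group of 'a'.
kdf.groupby('a')['b'].unique()

# Series.spark_column: the underlying Spark Column, e.g. for use with Spark SQL functions.
kdf.b.spark_column

# Index.spark_column works the same way for the index.
kdf.index.spark_column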
Other improvements
- Fix from_pandas to handle the same index name as a column name. (#1419)
- Add documentation about non-Koalas APIs (#1420)
- Fix the missing keyword argument 'deep' for DataFrame.copy() (#1423)
- Fix Series.div when dividing by zero (#1412)
- Support the expand parameter when n is a positive integer in Series.str.split/rsplit. (#1432)
- Make Series.astype(bool) follow the concept of "truthy" and "falsey" (see the sketch after this list). (#1431)
- Fix incompatible behaviour with pandas for floordiv with np.nan (#1429)
- Use mapInPandas for apply_batch API in Spark 3.0 (#1440)
- Use F.datediff() for subtraction of dates as a workaround. (#1439)
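For instance, the astype(bool) change above follows Python truthiness for the underlying values. A minimal sketch of the intended semantics (sample values are assumptions, not from the release notes):

import databricks.koalas as ks

# 0 and empty strings are "falsey"; non-zero numbers and non-empty strings are "truthy".
ks.Series([0, 1, 2]).astype(bool)      # -> False, True, True
ks.Series(['', 'hello']).astype(bool)  # -> False, True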