I'm experimenting with Koalas. My pandas dataframes use a MultiIndex for both rows and columns. Such dataframes can be saved to and loaded from parquet files using PyArrow, and Koalas can successfully convert them to and from pandas. However, Koalas cannot save or load them directly to or from parquet. Having to go through pandas just to load or store the data severely limits the supported data size and largely defeats the purpose of using Koalas.
PyArrow stores the information needed to reconstruct a MultiIndex in the parquet file metadata. It would be nice for Koalas to use the same approach for better compatibility, and maybe even reuse the PyArrow library. Pointers to the PyArrow implementation:
Right now Koalas supports saving and loading a MultiIndex for rows, but it requires specifying the `index_col` parameter on every `to_parquet()`/`read_parquet()` call, which is less convenient than the PyArrow approach, where the index information travels with the file.