[DataFrame] Refactor GroupBy Methods and Implement Reindex#2101
devin-petersohn merged 23 commits into ray-project:master
Test PASSed.
python/ray/dataframe/dataframe.py
Make _block_partitions a property and move the check there.
Test PASSed.
python/ray/dataframe/dataframe.py
Move to just before the DataFrameGroupBy object is used.
python/ray/dataframe/groupby.py
You can use utils._reindex_helper to reorder the columns/rows more efficiently. Just make sure you reassign new_df.index or new_df.columns afterwards, whichever axis was reordered.
python/ray/dataframe/groupby.py
Same here for utils._reindex_helper
Remove sort_index() from the checks in this file.
python/ray/dataframe/utils.py
Move this next part to a utils function and call from within the _block_partitions property.
I can't move it to a property because it depends on axis, but I have moved it to a utils function.
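For readers following the thread, here is a minimal sketch of what an axis-dependent check might look like as a standalone utility. The name fix_blocks_dimensions, the use of plain nested lists instead of a NumPy array of object IDs, and the choice of which orientation each axis maps to are all assumptions for illustration, not the code from this PR.

```python
def fix_blocks_dimensions(blocks, axis):
    """Return `blocks` as a 2D grid (a list of rows).

    Hypothetical helper: the real utility in this PR may differ in name,
    signature, and in how `axis` maps to an orientation.
    """
    # Already 2D: nothing to do.
    if blocks and isinstance(blocks[0], list):
        return blocks
    if axis == 0:
        # Assumed convention: treat the 1D input as a single column (n x 1).
        return [[block] for block in blocks]
    # Assumed convention: treat the 1D input as a single row (1 x n).
    return [list(blocks)]


column = fix_blocks_dimensions(["b0", "b1"], axis=0)
row = fix_blocks_dimensions(["b0", "b1"], axis=1)
```

Because the result depends on the axis argument, a check like this cannot live in a zero-argument property, which is the constraint described in the reply above.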
Test PASSed.
1a05681 to ec18852
Test PASSed.
python/ray/dataframe/utils.py
@ray.remote
def _deploy_generic_func(func, *args):
I don't know that we need this. I see how you're using it, but for now I would just prefer _deploy_func like everything else and pass in a row/column partition.
python/ray/dataframe/dataframe.py
if index is not None:
    old_index = self.index
    new_blocks = np.array([_deploy_generic_func._submit(
        args=(tuple([reindex_helper, old_index, index, 1,
For the tuple([...] + block.tolist()) you can just do (...) + tuple(block.tolist()). I think that reads more clearly.
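A small sketch of the equivalence being suggested. The values are placeholders: a string stands in for the reindex_helper function, and a plain list stands in for block.tolist(); none of them are taken from the PR.

```python
# Placeholder values for illustration only.
helper = "reindex_helper"        # stands in for the function object
old_index = ["a", "b", "c"]
new_index = ["c", "a", "b"]
block_ids = ["obj0", "obj1"]     # stands in for block.tolist()

# Original form: build one list, then convert the whole thing to a tuple.
args_before = tuple([helper, old_index, new_index, 1] + block_ids)

# Suggested form: write the fixed arguments as a tuple literal and
# concatenate the converted block list onto it.
args_after = (helper, old_index, new_index, 1) + tuple(block_ids)
```

Both forms produce the same tuple; the second keeps the fixed arguments visually separate from the variable-length block list.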
Test PASSed.
Passes on private-travis. Thanks @kunalgosar!
* master:
  [DataFrame] Refactor GroupBy Methods and Implement Reindex (ray-project#2101)
  Initial Support for Airspeed Velocity (ray-project#2113)
  Use automatic memory management in Redis modules. (ray-project#1797)
  [DataFrame] Test bugfixes (ray-project#2111)
  [DataFrame] Update initializations of IndexMetadata which use outdated APIs (ray-project#2103)
* master:
  Prototype named actors. (ray-project#2129)
  Update arrow to latest master (ray-project#2100)
  [DataFrame] Speed up dtypes (ray-project#2118)
  do not fetch from dead Plasma Manager (ray-project#2116)
  [DataFrame] Refactor GroupBy Methods and Implement Reindex (ray-project#2101)
  Initial Support for Airspeed Velocity (ray-project#2113)
  Use automatic memory management in Redis modules. (ray-project#1797)
  [DataFrame] Test bugfixes (ray-project#2111)
  [DataFrame] Update initializations of IndexMetadata which use outdated APIs (ray-project#2103)
Some of the changes in this PR are:
* _block_partitions was 1D
* df.apply and df.agg