-
Notifications
You must be signed in to change notification settings - Fork 7k
[DataFrame] Fix blocking issue on _IndexMetadata passing #1965
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test PASSed. |
|
Test FAILed. |
devin-petersohn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to get numbers on the performance difference here.
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to do this on this PR?
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the bp in bp_length?
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get rid of this line since we only need the metadata now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't in this case, as discussed today. If we don't pass the columns then the metadata of the new dataframe won't have the column changes reflected. Therefore, we need to either copy the metadata and modify the copy and push the copy, or pass the new columns such that the constructor modifies the metadata object copy on its end.
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, can we get rid of this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
|
Follow-up question: Can we make it so that |
|
Reference on performance numbers (these are off the top of my head, and testing on On
|
|
Test PASSed. |
|
Test PASSed. |
devin-petersohn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look really great! Just a couple of minor quick comments.
python/ray/dataframe/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add doc """Compute widths for each partition."""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to add this fix in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actually silently fixed by the other changes, setting a new DF will implicitly change the index on the metadata object as well. Comment removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you write a comment about how to use __getitem__? We have had people using it in incorrect ways and trying to make it work for them, so this way hopefully we can avoid that.
|
Jenkins, retest this please |
|
Test PASSed. |
|
Merged, thanks @Veryku |
* magic-methods: fmt Fix IndentationError Write magic methods for SampleBatch/PartialRollout Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) add pthread linking (ray-project#1986) [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984) [DataFrame] Fix blocking issue on _IndexMetadata passing (ray-project#1965)
What do these changes do?
Fixes a performance issue on passing
_IndexMetadataobjects for applymap and related functions that don't mutate indexes.