[DataFrame] Fix blocking issue on _IndexMetadata passing #1965

p-yang · 2018-04-28T21:35:22Z

What do these changes do?

Fixes a performance issue when passing _IndexMetadata objects for applymap and related functions that don't mutate indexes.

AmplabJenkins · 2018-04-28T22:40:07Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5108/
Test PASSed.

AmplabJenkins · 2018-04-28T23:01:59Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5112/
Test FAILed.

devin-petersohn

It would be great to get numbers on the performance difference here.

devin-petersohn · 2018-04-29T04:24:03Z

python/ray/dataframe/dataframe.py

Are you planning to do this on this PR?

devin-petersohn · 2018-04-29T04:24:14Z

python/ray/dataframe/dataframe.py

What is the bp in bp_length?

devin-petersohn · 2018-04-29T04:25:10Z

python/ray/dataframe/dataframe.py

Can we get rid of this line, since we only need the metadata now?

We can't in this case, as discussed today. If we don't pass the columns, the metadata of the new dataframe won't reflect the column changes. So we need to either copy the metadata, modify the copy, and push that copy, or pass the new columns so that the constructor modifies its own copy of the metadata object.
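The two options in this reply can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: `Metadata`, `Frame`, and the three helper functions are hypothetical stand-ins for `_IndexMetadata` and the DataFrame constructor, and the real objects may behave differently.

```python
import copy


class Metadata:
    """Hypothetical stand-in for an index metadata object."""

    def __init__(self, labels):
        self.labels = list(labels)


class Frame:
    """Hypothetical stand-in for a dataframe that carries column metadata."""

    def __init__(self, data, col_metadata, columns=None):
        self.data = data
        # Work on a private copy so the parent's metadata is never mutated.
        self.col_metadata = copy.deepcopy(col_metadata)
        if columns is not None:
            # Option 2: the constructor applies the new columns to its copy.
            self.col_metadata.labels = list(columns)


def rename_without_columns(frame):
    # Neither option applied: the new frame keeps the old labels.
    return Frame(frame.data, frame.col_metadata)


def rename_by_copying(frame, new_columns):
    # Option 1: the caller copies the metadata, modifies the copy, passes it on.
    meta = copy.deepcopy(frame.col_metadata)
    meta.labels = list(new_columns)
    return Frame(frame.data, meta)


def rename_by_passing_columns(frame, new_columns):
    # Option 2: the caller passes the new columns alongside the old metadata.
    return Frame(frame.data, frame.col_metadata, columns=new_columns)


parent = Frame([[1, 2], [3, 4]], Metadata(["a", "b"]))
stale = rename_without_columns(parent)
child_a = rename_by_copying(parent, ["x", "y"])
child_b = rename_by_passing_columns(parent, ["x", "y"])
```

In this sketch both options leave the parent's labels untouched and give the child the new labels, while `stale` shows the problem described above: without the columns, the new frame's metadata still carries the old labels.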

devin-petersohn · 2018-04-29T04:25:24Z

python/ray/dataframe/dataframe.py

Same as above, can we get rid of this line?

Same as above.

devin-petersohn · 2018-04-29T04:32:16Z

Follow-up question: Can we make it so that index is cached in this PR?

p-yang · 2018-04-29T21:40:54Z

Reference on performance numbers (these are off the top of my head, and testing on c69 yielded results with high variance):

On c69 limited to 8 ray workers, 5GB int64 CSV df.isna:

With 1x6 partitions: pre-change 300ms, post-change 182ms
With 2x6 partitions: pre-change 600ms, post-change 186ms
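For reference, the speedup implied by these figures can be worked out as below. The timings are copied from the comment above; since they were quoted from memory and the runs had high variance, the ratios are only indicative.

```python
# Timings in milliseconds, as quoted in the comment: (pre-change, post-change).
timings_ms = {
    "1x6": (300, 182),
    "2x6": (600, 186),
}

# Speedup is the pre-change time divided by the post-change time.
speedups = {layout: pre / post for layout, (pre, post) in timings_ms.items()}

for layout, speedup in speedups.items():
    print(f"{layout} partitions: {speedup:.2f}x faster")
```

This gives roughly 1.65x for 1x6 partitions and 3.23x for 2x6. Post-change time stays nearly flat as the partition count doubles, while pre-change time doubles.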

AmplabJenkins · 2018-04-29T23:56:22Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5118/
Test PASSed.

AmplabJenkins · 2018-04-30T02:08:45Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5119/
Test PASSed.

devin-petersohn

These changes look really great! Just a couple of minor quick comments.

devin-petersohn · 2018-04-30T16:23:56Z

python/ray/dataframe/utils.py

Add doc """Compute widths for each partition."""

devin-petersohn · 2018-04-30T16:26:19Z

python/ray/dataframe/index_metadata.py

Are you planning to add this fix in this PR?

This was actually fixed silently by the other changes: setting a new DF implicitly changes the index on the metadata object as well. Comment removed.

devin-petersohn · 2018-04-30T16:28:23Z

python/ray/dataframe/index_metadata.py

Could you write a comment about how to use __getitem__? We have had people using it in incorrect ways and trying to make it work for them, so this way hopefully we can avoid that.

devin-petersohn · 2018-05-02T04:15:51Z

Jenkins, retest this please

AmplabJenkins · 2018-05-02T05:21:04Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5141/
Test PASSed.

devin-petersohn · 2018-05-02T06:27:58Z

Merged, thanks @Veryku

* magic-methods:
  fmt
  Fix IndentationError
  Write magic methods for SampleBatch/PartialRollout
  Clean up syntax for supported Python versions. (ray-project#1963)
  [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956)
  [DataFrame] Fix dtypes (ray-project#1930)
  keep_dims -> keepdims (ray-project#1980)
  add pthread linking (ray-project#1986)
  [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984)
  [DataFrame] Fix blocking issue on _IndexMetadata passing (ray-project#1965)

devin-petersohn reviewed Apr 29, 2018

devin-petersohn reviewed Apr 30, 2018

p-yang added 8 commits May 1, 2018 00:13

metadata passing fixes

b554561

fix flake8

3c53582

fix test failures

6734e78

overhaul indexmetadata

b86fa56

variable name change

564ad1b

optimization for building coord df

4052bb4

addressing comments

ef5207e

subtle bug fixes

7508904

p-yang force-pushed the metadata_fix branch from b9116e2 to 7508904 Compare May 1, 2018 08:26

p-yang mentioned this pull request May 1, 2018

[DataFrame] Apply() for Lists and Dicts #1973

Merged

devin-petersohn approved these changes May 1, 2018

devin-petersohn merged commit 5589426 into ray-project:master May 2, 2018

devin-petersohn mentioned this pull request May 2, 2018

[DataFrame] _Index_MetaData cannot be instantiated with an OID. #1983

Closed
