Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DataFrame.insert #1983

Merged
merged 25 commits into from
Jan 20, 2021
Merged

Implement DataFrame.insert #1983

merged 25 commits into from
Jan 20, 2021

Conversation

xinrong-meng
Copy link
Contributor

@xinrong-meng xinrong-meng commented Dec 23, 2020

ref #1929

Insert column into DataFrame at a specified location.

        >>> kdf = ks.DataFrame([1, 2, 3])
        >>> kdf.insert(0, 'x', 4)
        >>> kdf.sort_index()
           x  0
        0  4  1
        1  4  2
        2  4  3

        >>> from databricks.koalas.config import set_option, reset_option
        >>> set_option("compute.ops_on_diff_frames", True)

        >>> kdf.insert(1, 'y', [5, 6, 7])
        >>> kdf.sort_index()
           x  y  0
        0  4  5  1
        1  4  6  2
        2  4  7  3

        >>> kdf.insert(2, 'z', ks.Series([8, 9, 10]))
        >>> kdf.sort_index()
           x  y   z  0
        0  4  5   8  1
        1  4  6   9  2
        2  4  7  10  3

        >>> reset_option("compute.ops_on_diff_frames")

@xinrong-meng xinrong-meng changed the title Implement DataFrrame.insert Implement DataFrame.insert Dec 23, 2020
@xinrong-meng xinrong-meng marked this pull request as ready for review January 5, 2021 15:47
Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a more closer look after passing the tests :)

databricks/koalas/tests/test_ops_on_diff_frames.py Outdated Show resolved Hide resolved
databricks/koalas/frame.py Show resolved Hide resolved
databricks/koalas/frame.py Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Jan 12, 2021

Codecov Report

Merging #1983 (a132eb1) into master (c38c96f) will decrease coverage by 0.00%.
The diff coverage is 94.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1983      +/-   ##
==========================================
- Coverage   94.59%   94.59%   -0.01%     
==========================================
  Files          50       50              
  Lines       11194    11210      +16     
==========================================
+ Hits        10589    10604      +15     
- Misses        605      606       +1     
Impacted Files Coverage Δ
databricks/koalas/missing/frame.py 100.00% <ø> (ø)
databricks/koalas/frame.py 96.56% <94.11%> (-0.02%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c38c96f...a132eb1. Read the comment docs.

@xinrong-meng xinrong-meng requested a review from itholic January 12, 2021 21:11
Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some bugs:

>>> kdf = ks.DataFrame([1, 2, 3], index=[10,20,30])
>>> kdf.insert(1, 'z', ks.Series([8, 9, 10]))
>>> kdf
      0     z
0   NaN   8.0
1   NaN   9.0
10  1.0   NaN
2   NaN  10.0
30  3.0   NaN
20  2.0   NaN

>>> pdf = pd.DataFrame([1, 2, 3], index=[10,20,30])
>>> pdf.insert(1, 'z', pd.Series([8, 9, 10]))
>>> pdf
    0   z
10  1 NaN
20  2 NaN
30  3 NaN

or

>>> kdf = ks.DataFrame({('x', 'a'): [1, 2, 3]})
>>> kdf.insert(1, 'z', ks.Series([8, 9, 10]))
Traceback (most recent call last):
...
AssertionError: [('x', 'a'), ('z',)]

>>> pdf = pd.DataFrame({('x', 'a'): [1, 2, 3]})
>>> pdf.insert(1, 'z', pd.Series([8, 9, 10]))
>>> pdf
   x   z
   a
0  1   8
1  2   9
2  3  10

databricks/koalas/tests/test_dataframe.py Show resolved Hide resolved
databricks/koalas/frame.py Outdated Show resolved Hide resolved
kdf = kdf[columns]
self._update_internal_frame(kdf._internal)

# TODO: add frep and axis parqameter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we revert this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking kdf and self should share the same anchor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this line # TODO: add frep and axis parqameter

Copy link
Collaborator

@ueshin ueshin Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw

I was thinking kdf and self should share the same anchor.

kdf could have a different anchor after assigning the value if the value is not the same anchor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, may I get some help understand requires_same_anchor parameter of _update_internal_frame ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's False, the existing linked Series will see the updates even when the anchor changes; otherwise the link will be disconnected.
See #1592 for more detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reference!

No matter what requires_same_anchor is, the test case doesn't seem to pass https://github.com/databricks/koalas/pull/1983/files#diff-028bf26a42786beb47e6707fe34867dc720d3279ae4c942226abc3eb40f26eeaR104.
Would you give me a clue?

@xinrong-meng xinrong-meng requested a review from ueshin January 20, 2021 17:29
Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, LGTM.

databricks/koalas/tests/test_dataframe.py Outdated Show resolved Hide resolved
databricks/koalas/tests/test_ops_on_diff_frames.py Outdated Show resolved Hide resolved
@ueshin
Copy link
Collaborator

ueshin commented Jan 20, 2021

Thanks! merging.

@ueshin ueshin merged commit 8803344 into databricks:master Jan 20, 2021
@xinrong-meng
Copy link
Contributor Author

@ueshin Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants