Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tail to use the resolved copy #1942

Merged
merged 4 commits into from
Dec 2, 2020
Merged

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Dec 1, 2020

When use tail(), the result is not correct after modifying the data.

>>> kdf = ks.DataFrame({"A": [1, 2, 3]})
>>> kdf.tail(2)
   A
1  2
2  3
>>> (kdf + 1).tail(2)
   A
1  2
2  3
>>> (kdf.to_pandas() + 1).tail(2)
   A
1  3
2  4

This should resolve #1941

@itholic itholic changed the title Fix Series.to_frame() to use resolve copy Fix Series.to_frame() to resolve copy Dec 1, 2020
@itholic
Copy link
Contributor Author

itholic commented Dec 1, 2020

I'm not sure maybe should we do this only for Series.tail() when the given column name is not exists in given spark_frame ?

@codecov-io
Copy link

codecov-io commented Dec 1, 2020

Codecov Report

Merging #1942 (fda74df) into master (809ed86) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1942      +/-   ##
==========================================
- Coverage   94.16%   94.15%   -0.01%     
==========================================
  Files          41       41              
  Lines       10055    10055              
==========================================
- Hits         9468     9467       -1     
- Misses        587      588       +1     
Impacted Files Coverage Δ
databricks/koalas/series.py 97.04% <100.00%> (ø)
databricks/koalas/namespace.py 84.02% <0.00%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 809ed86...478ec20. Read the comment docs.

@ueshin
Copy link
Collaborator

ueshin commented Dec 2, 2020

The behavior described in the description is totally expected.

I guess the problem is in DataFrame.tail because it happens even there.

>>> kdf = ks.DataFrame({"A": [1, 2, 3]})
>>> kdf.tail(2)
   A
1  2
2  3
>>> (kdf + 1).tail(2)
   A
1  2
2  3
>>> (kdf.to_pandas() + 1).tail(2)
   A
1  3
2  4

@itholic
Copy link
Contributor Author

itholic commented Dec 2, 2020

@ueshin Thanks! Let me add a related tests and take a look at this again

@itholic itholic changed the title Fix Series.to_frame() to resolve copy Fix tail to resolve copy Dec 2, 2020
@HyukjinKwon HyukjinKwon changed the title Fix tail to resolve copy Fix tail to use the resolved copy Dec 2, 2020
@HyukjinKwon HyukjinKwon merged commit 5c00355 into databricks:master Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tail() cannot resolve new column name after calling apply() method
4 participants