Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DataFrame.plot.kde() (an alias of 'density') #784

Merged
merged 8 commits into from
Sep 28, 2019

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Sep 17, 2019

This PR implements kde in DataFrame. It reuses Series' implementation. at #767

like Series's kde plot, Since DataFrame's also uses MLlib's KernelDensity API to calculate KDE , so slightly different from pandas but seems good enough either:

스크린샷 2019-09-17 오후 2 24 34

스크린샷 2019-09-17 오후 2 25 18

And also kde is an alias of 'density', you can get exactly same result when you use 'density' rather than 'kde' like below:

스크린샷 2019-09-17 오후 2 27 30

스크린샷 2019-09-17 오후 2 27 09

Multiple columns examples:

스크린샷 2019-09-17 오후 2 55 35

and for each row as Series.plot.kde looks same like below:

스크린샷 2019-09-17 오후 2 57 57

@HyukjinKwon
Copy link
Member

@itholic, can you show an example with more than two columns?

@itholic
Copy link
Contributor Author

itholic commented Sep 17, 2019

@HyukjinKwon Sure, Here is
스크린샷 2019-09-17 오후 2 55 35

and for each row as Series.plot.kde looks same like below:

스크린샷 2019-09-17 오후 2 57 57

@HyukjinKwon
Copy link
Member

@itholic, looks good. Can you fix the tests?

@itholic
Copy link
Contributor Author

itholic commented Sep 21, 2019

@HyukjinKwon i'm still working on it, but these tests are still failed, i'm trying to find out solution.

@ueshin
Copy link
Collaborator

ueshin commented Sep 25, 2019

@HyukjinKwon Let me share the question here, too. #767 (comment) which might relate to this PR.

@HyukjinKwon
Copy link
Member

@itholic can you check ^? I'll check it tomorrow.

@itholic
Copy link
Contributor Author

itholic commented Sep 26, 2019

@HyukjinKwon Sure, i'll check this out. sorry to late reply (it was very busy today 😭 )

@HyukjinKwon
Copy link
Member

There seems an issue about plot comparison. plot image has to be closed for each plot but seems not. So, overlapped plot images are being compared. For instance,

we should compare:

a
b

but now we're comparing:

a1
b1

@itholic
Copy link
Contributor Author

itholic commented Sep 27, 2019

@HyukjinKwon Thanks for review !! i just resolved all that you commented.

@softagram-bot
Copy link

Softagram Impact Report for pull/784 (head commit: d8fbcb8)

⭐ Change Overview

Showing the changed files, dependency changes and the impact - click for full size
(Open in Softagram Desktop for full details)

💡 Insights

  • Co-change Alert: You modified frame.py. Often test_dataframe.py (koalas/tests) is modified at the same time.

📄 Full report

Impact Report explained. Give feedback on this report to [email protected]

@codecov-io
Copy link

codecov-io commented Sep 27, 2019

Codecov Report

Merging #784 into master will decrease coverage by 0.03%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #784      +/-   ##
==========================================
- Coverage   94.29%   94.26%   -0.04%     
==========================================
  Files          32       32              
  Lines        5911     5913       +2     
==========================================
  Hits         5574     5574              
- Misses        337      339       +2
Impacted Files Coverage Δ
databricks/koalas/plot.py 94.04% <66.66%> (-0.26%) ⬇️
databricks/koalas/frame.py 96.76% <66.66%> (-0.07%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9186870...d8fbcb8. Read the comment docs.

@HyukjinKwon
Copy link
Member

@ueshin, I am going to rewrite the plot tests soon. Let me merge this for now.

@HyukjinKwon HyukjinKwon merged commit cf4ded8 into databricks:master Sep 28, 2019
@ueshin
Copy link
Collaborator

ueshin commented Sep 28, 2019

Sounds good, thanks!

@HyukjinKwon
Copy link
Member

I made a PR #838

BTW, @itholic, can you add some examples for plot implementations in their docstrings? Basically we finished the rough implementation of all plots (except 2 in DataFrame). So it should be better to have some examples.

Can you also check if we're able to post some images? e.g. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.box.html#pandas.DataFrame.plot.box

@itholic
Copy link
Contributor Author

itholic commented Sep 28, 2019

@HyukjinKwon sure, I'm going to add & check them 😃

@itholic itholic deleted the imple_frame_kde branch September 30, 2019 01:13
@@ -243,4 +243,6 @@ specific plotting methods of the form ``DataFrame.plot.<kind>``.
DataFrame.plot.line
DataFrame.plot.pie
DataFrame.plot.scatter
DataFrame.hist
DataFrame.plot.hist
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like DataFrame.plot.hist is listed twice.
I'm not sure we should keep DataFrame.hist, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itholic can you fix this? DataFrame.hist seems in the pandas doc as an alias. Let's keep it for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, fixed in #851
Thanks for comment !!

@itholic itholic restored the imple_frame_kde branch October 1, 2019 01:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants