-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support y properly in DataFrame with non-numeric columns with plots #2172
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if test pass
be7ac69
to
15ffaba
Compare
632649e
to
15d930c
Compare
15d930c
to
7da53a7
Compare
Codecov Report
@@ Coverage Diff @@
## master #2172 +/- ##
==========================================
- Coverage 95.34% 94.44% -0.90%
==========================================
Files 60 60
Lines 13711 13723 +12
==========================================
- Hits 13073 12961 -112
- Misses 638 762 +124
Continue to review full report at Codecov.
|
…ric columns with plots ### What changes were proposed in this pull request? This PR proposes to port the fix databricks/koalas#2172. ```python ks.DataFrame({'a': [1, 2, 3], 'b':["a", "b", "c"], 'c': [4, 5, 6]}).plot(kind='hist', x='a', y='c', bins=200) ``` **Before:** ``` pyspark.sql.utils.AnalysisException: cannot resolve 'least(min(a), min(b), min(c))' due to data type mismatch: The expressions should all have the same type, got LEAST(bigint, string, bigint).; 'Aggregate [unresolvedalias(least(min(a#1L), min(b#2), min(c#3L)), Some(org.apache.spark.sql.Column$$Lambda$1556/0x0000000800d9484042fb0cc1)), unresolvedalias(greatest(max(a#1L), max(b#2), max(c#3L)), Some(org.apache.spark.sql.Column$$Lambda$1556/0x0000000800d9484042fb0cc1))] +- Project [a#1L, b#2, c#3L] +- Project [__index_level_0__#0L, a#1L, b#2, c#3L, monotonically_increasing_id() AS __natural_order__#8L] +- LogicalRDD [__index_level_0__#0L, a#1L, b#2, c#3L], false ``` **After:** ```python Figure({ 'data': [{'hovertemplate': 'variable=a<br>value=%{text}<br>count=%{y}', 'name': 'a', ... ``` ### Why are the changes needed? To match the behaviour with panadas' and allow users to set `x` and `y` in the DataFrame with non-numeric columns. ### Does this PR introduce _any_ user-facing change? No to end users since the changes is not released yet. Yes to dev as described before. ### How was this patch tested? Manually tested, added a test and tested in notebooks: ![Screen Shot 2021-06-11 at 9 11 25 PM](https://user-images.githubusercontent.com/6477701/121686038-a47a1b80-cafb-11eb-8f8e-8d968db7ebef.png) ![Screen Shot 2021-06-11 at 9 48 58 PM](https://user-images.githubusercontent.com/6477701/121688858-e22c7380-cafe-11eb-9d0a-adcbe560030f.png) Closes #32884 from HyukjinKwon/fix-hist-plot. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
Before:
After:
Notebook tests:
Can be tested here: https://mybinder.org/v2/gh/HyukjinKwon/koalas/fix-hist-plot?filepath=docs%2Fsource%2Fgetting_started%2F10min.ipynb