
Support y properly in DataFrame with non-numeric columns with plots #2172

Merged (1 commit) on Jun 11, 2021

HyukjinKwon (Member) commented Jun 11, 2021

```python
import databricks.koalas as ks

ks.DataFrame({'a': [1, 2, 3], 'b': ["a", "b", "c"], 'c': [4, 5, 6]}).plot(kind="hist", x="a", y="c", bins=200)
```

Before:

```
pyspark.sql.utils.AnalysisException: cannot resolve 'least(min(a), min(b), min(c))' due to data type mismatch: The expressions should all have the same type, got LEAST(bigint, string, bigint).;
'Aggregate [unresolvedalias(least(min(a#1L), min(b#2), min(c#3L)), Some(org.apache.spark.sql.Column$$Lambda$1556/0x0000000800d94840@42fb0cc1)), unresolvedalias(greatest(max(a#1L), max(b#2), max(c#3L)), Some(org.apache.spark.sql.Column$$Lambda$1556/0x0000000800d94840@42fb0cc1))]
+- Project [a#1L, b#2, c#3L]
   +- Project [__index_level_0__#0L, a#1L, b#2, c#3L, monotonically_increasing_id() AS __natural_order__#8L]
      +- LogicalRDD [__index_level_0__#0L, a#1L, b#2, c#3L], false
```

After:

```python
Figure({
    'data': [{'hovertemplate': 'variable=a<br>value=%{text}<br>count=%{y}',
              'name': 'a',
...
```
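The "Before" trace shows where things went wrong: to choose the histogram's bin range, the plot code aggregated `least(min(...))` and `greatest(max(...))` over every column, including the string column `b`, even though only `y="c"` was requested. Below is a minimal plain-Python sketch of the idea, assuming the fix amounts to narrowing to the requested `y` column(s), and to numeric columns, before computing that range (as pandas does). `naive_hist_range` and `hist_range` are hypothetical helpers, and a dict of lists stands in for a DataFrame; this is not Koalas' actual implementation.

```python
from numbers import Number


def naive_hist_range(columns):
    """Mimic the failing query: min/max across ALL columns, whatever their type."""
    # Like LEAST(bigint, string, bigint) in Spark, comparing int and str raises TypeError.
    lo = min(min(values) for values in columns.values())
    hi = max(max(values) for values in columns.values())
    return lo, hi


def hist_range(columns, y=None):
    """Return (min, max) over the columns a histogram of `y` would use."""
    # 1. Narrow to the requested y column(s), if any were given.
    if y is None:
        names = list(columns)
    elif isinstance(y, str):
        names = [y]
    else:
        names = list(y)
    # 2. Keep only numeric columns so a string column cannot break min/max.
    numeric = [n for n in names if all(isinstance(v, Number) for v in columns[n])]
    if not numeric:
        raise TypeError("no numeric data to plot")
    lo = min(min(columns[n]) for n in numeric)
    hi = max(max(columns[n]) for n in numeric)
    return lo, hi


data = {"a": [1, 2, 3], "b": ["a", "b", "c"], "c": [4, 5, 6]}
print(hist_range(data, y="c"))  # (4, 6)
print(hist_range(data))         # (1, 6): column "b" is skipped
```

Calling `naive_hist_range(data)` raises `TypeError`, the Python analogue of the `AnalysisException` above.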

Notebook tests:

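![Screen Shot 2021-06-11 at 9 11 25 PM](https://user-images.githubusercontent.com/6477701/121686038-a47a1b80-cafb-11eb-8f8e-8d968db7ebef.png)

![Screen Shot 2021-06-11 at 9 48 58 PM](https://user-images.githubusercontent.com/6477701/121688858-e22c7380-cafe-11eb-9d0a-adcbe560030f.png)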

Can be tested here: https://mybinder.org/v2/gh/HyukjinKwon/koalas/fix-hist-plot?filepath=docs%2Fsource%2Fgetting_started%2F10min.ipynb

itholic (Contributor) left a comment

LGTM if tests pass.

HyukjinKwon changed the title from "Support x and y properly in plots (both matplotlib and plotly)" to "Support x and y properly in DataFrame with non-numeric columns with plots (both matplotlib and plotly)" on Jun 11, 2021
HyukjinKwon force-pushed the fix-hist-plot branch 2 times, most recently from 632649e to 15d930c on June 11, 2021 12:40
HyukjinKwon changed the title from "Support x and y properly in DataFrame with non-numeric columns with plots (both matplotlib and plotly)" to "Support x and y properly in DataFrame with non-numeric columns with plots" on Jun 11, 2021

codecov-commenter commented Jun 11, 2021

Codecov Report

Merging #2172 (7da53a7) into master (eda0bb5) will decrease coverage by 0.89%.
The diff coverage is 91.66%.

```diff
@@            Coverage Diff             @@
##           master    #2172      +/-   ##
==========================================
- Coverage   95.34%   94.44%   -0.90%
==========================================
  Files          60       60
  Lines       13711    13723      +12
==========================================
- Hits        13073    12961     -112
- Misses        638      762     +124
```

| Impacted Files | Coverage Δ |
|---|---|
| databricks/koalas/plot/plotly.py | 95.95% <75.00%> (-0.89%) ⬇️ |
| ...ks/koalas/tests/plot/test_frame_plot_matplotlib.py | 100.00% <100.00%> (ø) |
| databricks/koalas/usage_logging/__init__.py | 27.27% <0.00%> (-65.29%) ⬇️ |
| databricks/koalas/usage_logging/usage_logger.py | 47.82% <0.00%> (-52.18%) ⬇️ |
| databricks/conftest.py | 93.75% <0.00%> (-6.25%) ⬇️ |
| databricks/koalas/typedef/typehints.py | 89.28% <0.00%> (-6.13%) ⬇️ |
| databricks/koalas/__init__.py | 85.36% <0.00%> (-3.66%) ⬇️ |
| databricks/koalas/tests/indexes/test_category.py | 98.21% <0.00%> (-1.79%) ⬇️ |
| databricks/koalas/tests/indexes/test_datetime.py | 98.27% <0.00%> (-1.73%) ⬇️ |
| databricks/koalas/indexes/datetimes.py | 95.91% <0.00%> (-1.37%) ⬇️ |

... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update eda0bb5...7da53a7.
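The summary line reports a 0.89% decrease while the diff table shows -0.90%. Assuming Codecov's default settings (two decimal places, rounded down), both numbers follow from the Hits/Lines counts above: the summary rounds down the exact difference, while the table subtracts the already rounded-down percentages. A small check with exact fractions (`coverage_pct` and `round_down` are hypothetical helpers, not Codecov code):

```python
import math
from fractions import Fraction


def coverage_pct(hits, lines):
    """Exact coverage percentage as a Fraction (no float rounding error)."""
    return Fraction(100 * hits, lines)


def round_down(value, places=2):
    """Round down (floor) to `places` decimals, the assumed Codecov default."""
    scale = 10 ** places
    return Fraction(math.floor(value * scale), scale)


master = coverage_pct(13073, 13711)  # ~95.3468%
pr = coverage_pct(12961, 13723)      # ~94.4473%

print(float(round_down(master)))                   # 95.34
print(float(round_down(pr)))                       # 94.44
print(float(round_down(master - pr)))              # 0.89 (summary line)
print(float(round_down(master) - round_down(pr)))  # 0.9 (table: 95.34 - 94.44)
```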

HyukjinKwon changed the title from "Support x and y properly in DataFrame with non-numeric columns with plots" to "Support y properly in DataFrame with non-numeric columns with plots" on Jun 11, 2021
HyukjinKwon merged commit f971143 into databricks:master on Jun 11, 2021
HyukjinKwon added a commit to apache/spark that referenced this pull request Jun 12, 2021
…ric columns with plots

### What changes were proposed in this pull request?

This PR proposes to port the fix from databricks/koalas#2172.

```python
import databricks.koalas as ks

ks.DataFrame({'a': [1, 2, 3], 'b':["a", "b", "c"], 'c': [4, 5, 6]}).plot(kind='hist', x='a', y='c', bins=200)
```

**Before:**

```
pyspark.sql.utils.AnalysisException: cannot resolve 'least(min(a), min(b), min(c))' due to data type mismatch: The expressions should all have the same type, got LEAST(bigint, string, bigint).;
'Aggregate [unresolvedalias(least(min(a#1L), min(b#2), min(c#3L)), Some(org.apache.spark.sql.Column$$Lambda$1556/0x0000000800d94840@42fb0cc1)), unresolvedalias(greatest(max(a#1L), max(b#2), max(c#3L)), Some(org.apache.spark.sql.Column$$Lambda$1556/0x0000000800d94840@42fb0cc1))]
+- Project [a#1L, b#2, c#3L]
   +- Project [__index_level_0__#0L, a#1L, b#2, c#3L, monotonically_increasing_id() AS __natural_order__#8L]
      +- LogicalRDD [__index_level_0__#0L, a#1L, b#2, c#3L], false
```

**After:**

```python
Figure({
    'data': [{'hovertemplate': 'variable=a<br>value=%{text}<br>count=%{y}',
              'name': 'a',
...
```

### Why are the changes needed?

To match pandas' behaviour and allow users to set `x` and `y` in a DataFrame with non-numeric columns.

### Does this PR introduce _any_ user-facing change?

No to end users, since the change has not been released yet. Yes to developers, as described above.

### How was this patch tested?

Manually tested, added a test, and tested in notebooks:

![Screen Shot 2021-06-11 at 9 11 25 PM](https://user-images.githubusercontent.com/6477701/121686038-a47a1b80-cafb-11eb-8f8e-8d968db7ebef.png)

![Screen Shot 2021-06-11 at 9 48 58 PM](https://user-images.githubusercontent.com/6477701/121688858-e22c7380-cafe-11eb-9d0a-adcbe560030f.png)

Closes #32884 from HyukjinKwon/fix-hist-plot.

Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon