KeyError when joining on index level #1770

Closed
andyvanyperenAM opened this issue Sep 14, 2020 · 3 comments · Fixed by #1771
Labels
bug Something isn't working

Comments

andyvanyperenAM commented Sep 14, 2020

When the levels of a multi-index are used as the join key, a KeyError is raised.
Such a multi-index can originate, for example, from a groupby operation in an earlier step.
pandas allows joining on index level names.

Example:

import pandas as pd
import databricks.koalas as ks

toy_pd = pd.DataFrame(columns = ['day','item','size'], data = [[5, 0, 500],[5, 0, 550],[5, 1, 1500],[5, 1, 700],[5, 1, 900],
                                                               [6, 0, 400],[6, 0, 300],[6, 0, 600], [6, 1, 800],[6, 1, 200],
                                                               [7, 0, 600],[7, 1, 700],[7, 1, 700], [7, 2, 750],[7, 2, 500]])

toy_ks1 = ks.from_pandas(toy_pd).groupby(['day','item']).agg({'size':'mean'})
toy_ks2 = ks.from_pandas(toy_pd).groupby(['day','item']).agg({'size':'mean'})
toy_ks1.join(toy_ks2, on = ['day','item'], rsuffix='r')

This results in the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-36-43ac3f9768b4> in <module>
----> 1 toy_ks1.join(toy_ks2, on = ['day','item'], rsuffix='r')

~\AppData\Local\Continuum\anaconda3\envs\spark_env\lib\site-packages\databricks\koalas\frame.py in join(self, right, on, how, lsuffix, rsuffix)
   6928             )
   6929         if on:
-> 6930             self = self.set_index(on)
   6931             join_kdf = self.merge(
   6932                 right, left_index=True, right_index=True, how=how, suffixes=(lsuffix, rsuffix)

~\AppData\Local\Continuum\anaconda3\envs\spark_env\lib\site-packages\databricks\koalas\frame.py in set_index(self, keys, drop, append, inplace)
   3182         for key in keys:
   3183             if key not in columns:
-> 3184                 raise KeyError(key)
   3185         keys = [key if isinstance(key, tuple) else (key,) for key in keys]
   3186 

KeyError: 'day'
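
The traceback ends at the `key not in columns` check in `set_index`. After a `groupby(...).agg(...)`, the grouping keys are index levels rather than columns, which is consistent with that check failing for `'day'`. A minimal sketch of this, using plain pandas only so that the snippet runs without Spark (the variable names `grouped`, `index_names`, `column_names` and `missing` are illustrative, not taken from Koalas):

```python
import pandas as pd

toy_pd = pd.DataFrame(
    columns=["day", "item", "size"],
    data=[
        [5, 0, 500], [5, 0, 550], [5, 1, 1500], [5, 1, 700], [5, 1, 900],
        [6, 0, 400], [6, 0, 300], [6, 0, 600], [6, 1, 800], [6, 1, 200],
        [7, 0, 600], [7, 1, 700], [7, 1, 700], [7, 2, 750], [7, 2, 500],
    ],
)

grouped = toy_pd.groupby(["day", "item"]).agg({"size": "mean"})

# The grouping keys end up as index level names, not as columns.
index_names = list(grouped.index.names)
column_names = list(grouped.columns)
print(index_names)   # ['day', 'item']
print(column_names)  # ['size']

# A membership test against the columns alone, like the one shown in the
# traceback, therefore reports both join keys as missing.
missing = [key for key in ["day", "item"] if key not in grouped.columns]
print(missing)       # ['day', 'item']
```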

Has anyone else run into this problem? Is it a fundamental limitation or a bug? If it is a limitation, are there possible workarounds?

Koalas version 1.2.0
OS: Windows 10
python 3.7.6
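
On the workaround question, two options seem plausible for versions without the fix. They are sketched below in plain pandas so that the snippet runs without Spark. Koalas exposes methods with the same names (`join`, `reset_index`, `set_index`), but whether they behave identically on Koalas 1.2.0 is an assumption that has not been verified here. The names `left`, `right`, `joined_on_index` and `joined_on_columns` are illustrative.

```python
import pandas as pd

toy_pd = pd.DataFrame(
    columns=["day", "item", "size"],
    data=[
        [5, 0, 500], [5, 0, 550], [5, 1, 1500], [5, 1, 700], [5, 1, 900],
        [6, 0, 400], [6, 0, 300], [6, 0, 600], [6, 1, 800], [6, 1, 200],
        [7, 0, 600], [7, 1, 700], [7, 1, 700], [7, 2, 750], [7, 2, 500],
    ],
)

left = toy_pd.groupby(["day", "item"]).agg({"size": "mean"})
right = toy_pd.groupby(["day", "item"]).agg({"size": "mean"})

# Option 1: both frames carry the same index levels, so leave out `on`
# and let the join match index against index.
joined_on_index = left.join(right, rsuffix="r")

# Option 2: turn the index levels into ordinary columns first, join those
# columns against the other frame's index, then restore the index.
joined_on_columns = (
    left.reset_index()
    .join(right, on=["day", "item"], rsuffix="r")
    .set_index(["day", "item"])
)

print(joined_on_index)
print(joined_on_columns)
```

Option 1 only applies when the join keys are exactly the index of both frames, as in this report. Option 2 is more general but adds two extra reshaping steps.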

@itholic itholic added the bug Something isn't working label Sep 14, 2020

itholic commented Sep 14, 2020

Thanks for reporting this, @andyvanyperenAM!
It looks like a bug; let me check!

@andyvanyperenAM

Thanks @itholic, I tried the bug fix in your PR and it resolves my issue!

itholic commented Sep 21, 2020

Nice, @andyvanyperenAM.
Good to hear that! 👍

ueshin pushed a commit that referenced this issue Sep 21, 2020
This should resolve #1770 

```python
>>> toy_pd = pd.DataFrame(columns = ['day','item','size'], data = [[5, 0, 500],[5, 0, 550],[5, 1, 1500],[5, 1, 700],[5, 1, 900],
... [6, 0, 400],[6, 0, 300],[6, 0, 600], [6, 1, 800],[6, 1, 200],
... [7, 0, 600],[7, 1, 700],[7, 1, 700], [7, 2, 750],[7, 2, 500]])

>>> toy_ks1 = ks.from_pandas(toy_pd).groupby(['day','item']).agg({'size':'mean'})
>>> toy_ks2 = ks.from_pandas(toy_pd).groupby(['day','item']).agg({'size':'mean'})

>>> toy_ks1.join(toy_ks2, on = ['day','item'], rsuffix='r')
                 size        sizer
day item
5   1     1033.333333  1033.333333
7   1      700.000000   700.000000
    2      625.000000   625.000000
6   1      500.000000   500.000000
7   0      600.000000   600.000000
6   0      433.333333   433.333333
5   0      525.000000   525.000000
```