iterating through groups in a for loop #2014

Closed
chogg opened this issue Jan 21, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

chogg commented Jan 21, 2021

I would like to iterate through groups in a dataframe. This is possible in pandas, but when I port this to koalas, I get an error.

import databricks.koalas as ks
import pandas as pd

pdf = pd.DataFrame({'x':range(3), 'y':['a','b','b'], 'z':['a','b','b']})

# Create a Koalas DataFrame from pandas DataFrame
df = ks.from_pandas(pdf)

for a in df.groupby('x'):
    print(a)

Here is the error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-35-d4164d1f71e0> in <module>
----> 1 for a in df.groupby('x'):
      2     print(a)

/opt/conda/lib/python3.7/site-packages/databricks/koalas/groupby.py in __getitem__(self, item)
   2630         if self._as_index and is_name_like_value(item):
   2631             return SeriesGroupBy(
-> 2632                 self._kdf._kser_for(item if is_name_like_tuple(item) else (item,)),
   2633                 self._groupkeys,
   2634                 dropna=self._dropna,

/opt/conda/lib/python3.7/site-packages/databricks/koalas/frame.py in _kser_for(self, label)
    721         Name: id, dtype: int64
    722         """
--> 723         return self._ksers[label]
    724 
    725     def _apply_series_op(self, op, should_resolve: bool = False):

KeyError: (0,)

Is this kind of group iteration possible in koalas? The koalas documentation kind of implies it is possible - https://koalas.readthedocs.io/en/latest/reference/groupby.html

I don't know why the name of the group key is getting turned into 0. groupby('x').count() does seem to work.
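
A possible explanation for the 0: the traceback goes straight from the for statement into `__getitem__`, which is consistent with the groupby object defining `__getitem__` but not `__iter__`. In that case Python falls back to the legacy sequence protocol and calls `__getitem__(0)`, `__getitem__(1)`, and so on, so the 0 would be a loop index rather than a group key. The sketch below reproduces the effect with a hypothetical stand-in class (`ColumnGroupBy` and `first_iteration_error` are made-up names for illustration, not Koalas code):

```python
class ColumnGroupBy:
    """Hypothetical stand-in for a groupby object; this is NOT the Koalas class."""

    def __init__(self, columns):
        # Maps column labels (stored as tuples) to column data.
        self._columns = columns

    def __getitem__(self, item):
        # Same shape as the lookup in the traceback: a scalar label is wrapped in a tuple.
        label = item if isinstance(item, tuple) else (item,)
        return self._columns[label]

    # No __iter__ is defined, so a for loop falls back to __getitem__(0), __getitem__(1), ...


def first_iteration_error(obj):
    """Start a for loop over obj and return the repr of the exception it raises, if any."""
    try:
        for _ in obj:
            pass
    except Exception as exc:
        return repr(exc)
    return None


grouped = ColumnGroupBy({("y",): ["a", "b", "b"], ("z",): ["a", "b", "b"]})
error = first_iteration_error(grouped)
print(error)  # KeyError((0,))
```

Under that assumption, column selection such as `grouped["y"]` works because "y" is a real label, while the for loop fails on the integer 0 that Python supplies.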

@chogg chogg changed the title from "iterating through" to "iterating through groups in a for loop" Jan 21, 2021
Author

chogg commented Jan 21, 2021

Is this related: #1770

Collaborator

ueshin commented Jan 21, 2021

@chogg Unfortunately, Koalas doesn't support group iterations so far.

@ueshin ueshin added the enhancement New feature or request label Jan 21, 2021
Author

chogg commented Jan 21, 2021

Thanks for letting me know.

@chogg chogg closed this as completed Jan 21, 2021
Author

chogg commented Jan 21, 2021

@ueshin Do you have any idea of when this might be supported? Does it take a lot of work to implement?

@chogg chogg reopened this Jan 21, 2021
Contributor

itholic commented Jan 22, 2021

Thanks for your interest in Koalas, @chogg.

Unfortunately there is no clear development plan right now.

This is because implementing __iter__ can be very dangerous, since Koalas deals with large amounts of data.

You can also find related discussions in past issues, for example here and here.
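
In the meantime, one possible workaround is to collect only the distinct key values and filter the frame once per key. The sketch below is an assumption-laden illustration, not a Koalas feature: `iter_groups` is a hypothetical helper, and it is demonstrated on a pandas DataFrame. It relies only on `drop_duplicates`, `to_numpy`, and boolean filtering, which Koalas also provides, but it has not been verified on a Koalas DataFrame here.

```python
import pandas as pd


def iter_groups(df, key):
    """Yield (key_value, sub_frame) pairs for each distinct value in column `key`.

    Hypothetical helper: only the distinct key values are materialized locally;
    each sub-frame is produced by filtering the original frame.
    """
    # Collect the distinct keys; this is assumed to be small enough to hold in memory.
    keys = sorted(df[key].drop_duplicates().to_numpy().tolist())
    for k in keys:
        # One filter per key, so the data is scanned once for every group.
        yield k, df[df[key] == k]


pdf = pd.DataFrame({'x': range(3), 'y': ['a', 'b', 'b'], 'z': ['a', 'b', 'b']})

group_sizes = [(k, len(g)) for k, g in iter_groups(pdf, 'y')]
print(group_sizes)  # [('a', 1), ('b', 2)]
```

This only makes sense when the number of distinct keys is small, because every group costs a separate pass over the data.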

Contributor

itholic commented Aug 9, 2021

Let me close this. We don't want to add this feature for now because it's too dangerous, as mentioned in #2014 (comment).

@itholic itholic closed this as completed Aug 9, 2021
@DavidCamposDSB

Hello! What's the current status of this? Is it still not under consideration?
