Implement DataFrame.itertuples #1960

Merged
merged 6 commits into from
Dec 10, 2020

@xinrong-meng xinrong-meng commented Dec 9, 2020

ref #1929

        >>> df = ks.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
        ...                   index=['dog', 'hawk'])
        >>> df
              num_legs  num_wings
        dog          4          0
        hawk         2          2
        >>> for row in df.itertuples():
        ...     print(row)
        ...
        Koalas(Index='dog', num_legs=4, num_wings=0)
        Koalas(Index='hawk', num_legs=2, num_wings=2)
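For readers without a Spark session, the shape of the output above can be approximated with the standard library alone. This is a minimal sketch, not the Koalas implementation: `iter_rows` and its `columns` / `index_labels` arguments are hypothetical names used only for illustration, while `index` and `name` mirror the parameters of `itertuples`.

```python
from collections import namedtuple
from typing import Dict, Iterator, Optional


def iter_rows(
    columns: Dict[str, list],
    index_labels: list,
    index: bool = True,
    name: Optional[str] = "Koalas",
) -> Iterator[tuple]:
    """Hypothetical helper: yield one tuple per row of a column-oriented table."""
    fields = list(columns)
    if index:
        fields.insert(0, "Index")
    # name=None yields plain tuples; otherwise rows are instances of a
    # namedtuple class whose type name is `name`.
    row_type = namedtuple(name, fields, rename=True) if name is not None else None
    for i, label in enumerate(index_labels):
        values = [col[i] for col in columns.values()]
        if index:
            values.insert(0, label)
        yield row_type(*values) if row_type is not None else tuple(values)


data = {"num_legs": [4, 2], "num_wings": [0, 2]}
rows = list(iter_rows(data, ["dog", "hawk"]))
plain = list(iter_rows(data, ["dog", "hawk"], index=False, name=None))
print(rows[0])   # Koalas(Index='dog', num_legs=4, num_wings=0)
print(plain[0])  # (4, 0)
```

With `index=False` the label is dropped from each row, and with `name=None` the rows are regular tuples rather than namedtuples.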

codecov-io commented Dec 9, 2020

Codecov Report

Merging #1960 (dff91d6) into master (01ada38) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1960   +/-   ##
=======================================
  Coverage   94.63%   94.64%           
=======================================
  Files          49       49           
  Lines       10829    10861   +32     
=======================================
+ Hits        10248    10279   +31     
- Misses        581      582    +1     
| Impacted Files | Coverage Δ |
|---|---|
| databricks/koalas/missing/frame.py | 100.00% <ø> (ø) |
| databricks/koalas/frame.py | 96.75% <100.00%> (-0.02%) ⬇️ |
| databricks/koalas/missing/series.py | 100.00% <0.00%> (ø) |
| databricks/koalas/missing/groupby.py | 100.00% <0.00%> (ø) |
| databricks/koalas/series.py | 96.88% <0.00%> (+0.02%) ⬆️ |
| databricks/koalas/groupby.py | 91.50% <0.00%> (+0.08%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -1436,6 +1436,103 @@ def extract_kv_from_spark_row(row):
s = pd.Series(v, index=columns, name=k)
yield k, s

def itertuples(self, index: bool = True, name: Optional[str] = "Pandas") -> Iterator:
Contributor Author

For the `name` parameter, shall we set the default to "Koalas"?

`name` is the type name of the returned namedtuple.

Member

+1

Contributor

+1, too
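
To illustrate the point discussed in this thread: the `name` argument only sets the type name of the namedtuple, so it changes how rows print but not their values. A small standard-library sketch (the field list and the `PandasRow` / `KoalasRow` variable names are chosen here for illustration):

```python
from collections import namedtuple

fields = ["Index", "num_legs", "num_wings"]

# The first argument of namedtuple() is the type name shown in the repr.
PandasRow = namedtuple("Pandas", fields)
KoalasRow = namedtuple("Koalas", fields)

p = PandasRow("dog", 4, 0)
k = KoalasRow("dog", 4, 0)

print(p)       # Pandas(Index='dog', num_legs=4, num_wings=0)
print(k)       # Koalas(Index='dog', num_legs=4, num_wings=0)
print(p == k)  # True: namedtuples compare as ordinary tuples
```

Code that accesses fields by attribute or position should therefore behave the same under either default; only code that inspects the class name or the printed form would notice the difference.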

@xinrong-meng xinrong-meng marked this pull request as ready for review December 10, 2020 00:11
@@ -186,6 +186,39 @@ def test_dataframe_multiindex_names_level(self):
self.assert_eq(kdf[("X", "A")].to_pandas().columns.names, pdf[("X", "A")].columns.names)
self.assert_eq(kdf[("X", "A", "Z")], pdf[("X", "A", "Z")])

def test_itertuples(self):
pdf = pd.DataFrame({"num_legs": [4, 2], "num_wings": [0, 2]}, index=["dog", "hawk"])
Member

Can we add a test with multi-index column?

Contributor Author

Sounds good! Added.

It behaved as below in pandas:

>>> pdf
origin       CA       WA
info        age children
count color             
1     black   4        0
2     brown   2        2
>>> for row in pdf.itertuples():
...     print(row)
...
Pandas(Index=(1, 'black'), _1=4, _2=0)
Pandas(Index=(2, 'brown'), _1=2, _2=2)
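
The `_1` and `_2` field names in the output above are consistent with how `collections.namedtuple` handles invalid field names when `rename=True`: any name that is not a valid identifier is replaced by an underscore plus its position. The sketch below reproduces that with the standard library only; it assumes the multi-index column labels are the tuples `("CA", "age")` and `("WA", "children")`, and it is not taken from the pandas or Koalas source.

```python
from collections import namedtuple

# Assumed multi-index column labels, one tuple per column.
column_labels = [("CA", "age"), ("WA", "children")]

# str(("CA", "age")) is not a valid identifier, so rename=True
# swaps it for a positional name such as "_1".
fields = ["Index"] + [str(label) for label in column_labels]
Row = namedtuple("Pandas", fields, rename=True)

row = Row((1, "black"), 4, 0)
print(Row._fields)  # ('Index', '_1', '_2')
print(row)          # Pandas(Index=(1, 'black'), _1=4, _2=0)
```

Because the original labels are lost, rows from a frame with multi-index columns are best accessed by position or by the `_1`, `_2` names.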

@HyukjinKwon
Member

Looks pretty good

Contributor

@itholic itholic left a comment

LGTM, except for a very trivial nit and #1960 (comment).

@xinrong-meng xinrong-meng requested a review from ueshin December 10, 2020 18:40
Collaborator

@ueshin ueshin left a comment

LGTM.

ueshin commented Dec 10, 2020

Thanks! merging.

@ueshin ueshin merged commit 02133a8 into databricks:master Dec 10, 2020
@xinrong-meng
Contributor Author

Thank you!

@itholic itholic mentioned this pull request Jul 30, 2021
5 participants