Setting index name / names for Series #1079

Merged
merged 7 commits into from
Dec 6, 2019

@itholic itholic commented Nov 26, 2019

Unlike pandas, Koalas currently ignores assignments to the index name of a Series, as shown below.

>>> pser = pd.Series([1, 2, 3, 4, 5])
>>> pser.index.name = 'koalas'
>>> pser.index.name
'koalas'

>>> kser = ks.Series([1, 2, 3, 4, 5])
>>> kser.index.name = 'koalas'
>>> kser.index.name

The same happens with a MultiIndex:

>>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
...                       ['speed', 'weight', 'length']],
...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> pser = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
>>> pser.index.names
FrozenList([None, None])
>>> pser.index.names = ['hello', 'koalas']
>>> pser.index.names
FrozenList(['hello', 'koalas'])

>>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
...                       ['speed', 'weight', 'length']],
...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
>>> kser.index.names
[None, None]
>>> kser.index.names = ['hello', 'koalas']
>>> kser.index.names
[None, None]
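
As a rough illustration of how an assignment like this can be silently dropped, here is a toy model. It is not the Koalas implementation and makes no claim about the actual cause: ToySeries, DetachedIndex and WriteThroughIndex are hypothetical names. It assumes the index object is rebuilt on every access, so a name stored only on that temporary object is lost, while a setter that writes back to the owning series keeps it.

```python
class DetachedIndex:
    """Hypothetical index that holds its own copy of the level names."""

    def __init__(self, names):
        # Plain attribute: assigning to it only changes this temporary object.
        self.names = list(names)


class WriteThroughIndex:
    """Hypothetical index whose names setter updates the owning series."""

    def __init__(self, owner):
        self._owner = owner

    @property
    def names(self):
        return list(self._owner._index_names)

    @names.setter
    def names(self, new_names):
        new_names = list(new_names)
        if len(new_names) != len(self._owner._index_names):
            raise ValueError("number of names must match number of index levels")
        # Store the names on the owner so they survive this wrapper object.
        self._owner._index_names = new_names


class ToySeries:
    """Toy stand-in for a series that builds its index wrapper lazily."""

    def __init__(self, nlevels, write_through):
        self._index_names = [None] * nlevels
        self._write_through = write_through

    @property
    def index(self):
        # A fresh wrapper is created on every access.
        if self._write_through:
            return WriteThroughIndex(self)
        return DetachedIndex(self._index_names)


lost = ToySeries(nlevels=2, write_through=False)
lost.index.names = ['hello', 'koalas']
print(lost.index.names)  # [None, None]

kept = ToySeries(nlevels=2, write_through=True)
kept.index.names = ['hello', 'koalas']
print(kept.index.names)  # ['hello', 'koalas']
```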

This PR makes these assignments work in Koalas as well:

>>> kser = ks.Series([1, 2, 3, 4, 5])
>>> kser.index.name = 'koalas'
>>> kser.index.name
'koalas'

>>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
...                       ['speed', 'weight', 'length']],
...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
>>> kser.index.names
[None, None]
>>> kser.index.names = ['hello', 'koalas']
>>> kser.index.names
['hello', 'koalas']
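
The target behavior can also be written as a small parity check against pandas. The sketch below is only an illustration: set_and_read_index_names is a hypothetical helper, not part of this PR or of either library, and only the pandas side is exercised here. It relies on nothing beyond the .index.name and .index.names attributes shown above, so the same helper could presumably be applied to a ks.Series once the assignment is supported.

```python
import pandas as pd


def set_and_read_index_names(ser, new_names):
    """Assign index level names in place, then read them back as a plain list.

    Hypothetical helper for illustration; it only uses .index.name and
    .index.names on the object it is given.
    """
    if len(new_names) == 1:
        ser.index.name = new_names[0]
    else:
        ser.index.names = list(new_names)
    return list(ser.index.names)


# Single-level index
pser = pd.Series([1, 2, 3, 4, 5])
single = set_and_read_index_names(pser, ['koalas'])

# Two-level index, built from the same levels and codes as in the examples above
midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
                             ['speed', 'weight', 'length']],
                     codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
                            [0, 1, 2, 0, 1, 2, 0, 1, 2]])
pser_multi = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
multi = set_and_read_index_names(pser_multi, ['hello', 'koalas'])

print(single)  # ['koalas']
print(multi)   # ['hello', 'koalas']
```

Converting the result to a plain list sidesteps the difference between the FrozenList that pandas returns and the list that Koalas returns in the examples above.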

@itholic itholic changed the title from "setting index name for Series" to "Setting index name / names for Series" Nov 26, 2019
codecov-io commented Nov 26, 2019

Codecov Report

Merging #1079 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master    #1079   +/-   ##
=======================================
  Coverage   95.14%   95.14%           
=======================================
  Files          35       35           
  Lines        6958     6958           
=======================================
  Hits         6620     6620           
  Misses        338      338
Impacted Files Coverage Δ
databricks/koalas/series.py 96.5% <100%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2bd7adc...8b6204c.

@@ -958,13 +958,11 @@ def rename(self, index: Union[str, Tuple[str, ...]] = None, **kwargs):
    def index(self):
        """The index (axis labels) Column of the Series.

        Currently not supported when the DataFrame has no index.

+1

@HyukjinKwon HyukjinKwon left a comment

Looks good otherwise.

@softagram-bot

Softagram Impact Report for pull/1079 (head commit: 8b6204c)

⚠️ Copy paste found

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 302, 319:

                     pd.Series([True, False], name='x'),
                     pd.Series([0, 1], name='x'),
                     pd.Series([1, 2,...(truncated 330 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 746, 887:

        midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2]...(truncated 280 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 747, 888, 924:

                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2],
                      ...(truncated 256 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 718, 921:


        # For MultiIndex
        midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
                              ['speed', 'weight', 'length']],
                             [[...(truncated 167 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 721, 747, 888:

                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2],
                      ...(truncated 117 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 170, 180:

        pdf = pd.DataFrame({
            'left':  [True, False, True, False, np.nan, np.nan, True, False, np.nan],
            'right': [True, False, False, True, True, False, n...(truncated 119 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 749, 790, 890, 926:

                              [0, 1, 2, 0, 1, 2, 0, 1, 2]])
        kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
             ...(truncated 137 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 914, 934:


        self.assert_eq(kser.pct_change(periods=-1),
                       pser.pct_change(periods=-1), almost=True)
        self.assert_eq(kser.pct_change(periods=-10...(truncated 213 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 639, 664:


        index = pd.MultiIndex.from_arrays([
            ['a', 'a', 'b', 'b'], ['c', 'd', 'e', 'f']], names=('first', 'se...(truncated 151 chars)

ℹ️ series.py: Copy paste fragment on line 1248 shared with ../frame.py:


    def to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True,
                 na_rep='NaN',...(truncated 256 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 3105, 3213:

        results = sdf.select([scol] + index_scols).take(1)
        if len(results) == 0:
           ...(truncated 409 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 4124, 4346:

        sdf = self._internal.sdf \
            .select(cols) \
            .where(reduce(lambda x, y: x & y, rows))

        if len(self._inter...(truncated 255 chars)

Now that you are on the file, it would be easier to pay back some tech. debt.

💡 Insights

  • Co-change Alert: You modified series.py. Often frame.py (databricks/koalas) is modified at the same time.


@HyukjinKwon HyukjinKwon merged commit 7844193 into databricks:master Dec 6, 2019
@itholic itholic deleted the fix_index_name branch December 10, 2019 15:21