Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement pct_change() for Series #1071

Merged
merged 6 commits into from
Dec 3, 2019

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Nov 24, 2019

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.pct_change.html#pandas.Series.pct_change

>>> kser = ks.Series([90, 91, 85])
>>> kser
0    90
1    91
2    85
Name: 0, dtype: int64

>>> kser.pct_change()
0         NaN
1    0.011111
2   -0.065934
Name: 0, dtype: float64

>>> kser.pct_change(periods=2)
0         NaN
1         NaN
2   -0.055556
Name: 0, dtype: float64
"""

@itholic
Copy link
Contributor Author

itholic commented Nov 24, 2019

i'm going to integrated the logic with DataFrame.pct_change() after this and #1051 will be merged.

@codecov-io
Copy link

codecov-io commented Nov 24, 2019

Codecov Report

Merging #1071 into master will increase coverage by <.01%.
The diff coverage is 93.93%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1071      +/-   ##
==========================================
+ Coverage    95.2%   95.21%   +<.01%     
==========================================
  Files          34       34              
  Lines        6863     6893      +30     
==========================================
+ Hits         6534     6563      +29     
- Misses        329      330       +1
Impacted Files Coverage Δ
databricks/koalas/missing/series.py 100% <ø> (ø) ⬆️
databricks/koalas/series.py 96.44% <93.93%> (-0.1%) ⬇️
databricks/koalas/config.py 98.98% <0%> (-1.02%) ⬇️
databricks/koalas/missing/frame.py 100% <0%> (ø) ⬆️
databricks/koalas/missing/indexes.py 100% <0%> (ø) ⬆️
databricks/koalas/indexes.py 96.2% <0%> (+0.14%) ⬆️
databricks/koalas/generic.py 95.92% <0%> (+0.19%) ⬆️
databricks/koalas/base.py 95.89% <0%> (+0.65%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bacd903...3e82a04. Read the comment docs.

databricks/koalas/series.py Outdated Show resolved Hide resolved
databricks/koalas/series.py Outdated Show resolved Hide resolved
databricks/koalas/series.py Outdated Show resolved Hide resolved
databricks/koalas/series.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, we might just need:

-        sdf = self._internal._sdf
-        window = Window.orderBy(F.monotonically_increasing_id()).rowsBetween(-periods, -periods)
+        scol = self._internal.scol

-        for column_name in self._internal.data_columns:
-            prev_row = F.lag(scol_for(sdf, column_name), periods).over(window)
-            sdf = sdf.withColumn(column_name, (scol_for(sdf, column_name) - prev_row) / prev_row)
-
-        internal = self._internal.copy(
-            sdf=sdf,
-            scol=scol_for(sdf, self._internal.data_columns[0]))
+        window = Window.orderBy(F.monotonically_increasing_id()).rowsBetween(-periods, -periods)
+        prev_row = F.lag(scol, periods).over(window)

-        return _col(DataFrame(internal))
+        return self._with_new_scol((scol - prev_row) / prev_row)

self.assert_eq(repr(kser.pct_change(periods=-100000000)),
repr(pser.pct_change(periods=-100000000)))
self.assert_eq(repr(kser.pct_change(periods=100000000)),
repr(pser.pct_change(periods=100000000)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these using repr()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they raise AssertionError even they looks like same result.

스크린샷 2019-12-03 오전 10 10 44

but i just changed them to use almost=True rather than repr

@itholic
Copy link
Contributor Author

itholic commented Dec 3, 2019

#1071 (review)

@ueshin It looks really cool. thanks for spending your time for this. 👍

@softagram-bot
Copy link

Softagram Impact Report for pull/1071 (head commit: 3e82a04)

⚠️ Copy paste found

ℹ️ series.py: Copy paste fragment on line 121 shared with ../frame.py:

    get_values = unsupported_function('get_values', deprecated=True)
    to_dense = unsupported_function('to_dense', deprecated=True)
 ...(truncated 199 chars)

ℹ️ test_series.py: Copy paste fragment on line 553 shared with ../test_series_plot.py:

            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 50],
        }, index=[0, 1, 3, 5, 6, 8, 9, 9, 9, 10, 10])

ℹ️ test_series.py: Copy paste fragment on line 559 shared with ../test_frame_plot.py, ../test_series_plot.py:

        bytes_data = BytesIO()
        ax.figure.savefig(bytes_data, format='png')
        bytes_data.seek(0)
        b64_data = base64.b64encode(bytes_data.read())
       ...(truncated 45 chars)

ℹ️ test_series.py: Copy paste fragment on line 50 shared with ../test_series_conversion.py:

    def pser(self):
        return pd.Series([1, 2, 3, 4, 5, 6, 7], name='x')

    @property
    def kser(self):
        return ks.from_pandas(self.pser)

    def test_series(self):

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 312, 329:

                     pd.Series([True, False], name='x'),
                     pd.Series([0, 1], name='x'),
                     pd.Series([1, 2,...(truncated 330 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 744, 903:

        midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2]...(truncated 280 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 745, 904, 925:

                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2],
                      ...(truncated 256 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 716, 922:


        # For MultiIndex
        midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
                              ['speed', 'weight', 'length']],
                             [[...(truncated 167 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 719, 745, 904:

                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2],
                      ...(truncated 117 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 827, 844:


        pser1 = pd.Series([-1, -2, -3, -4, -5], name=0)
        pser2 = pd.Series([-100, -200, -300, -400, -500], name=0)
        k...(truncated 123 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 180, 190:

        pdf = pd.DataFrame({
            'left':  [True, False, True, False, np.nan, np.nan, True, False, np.nan],
            'right': [True, False, False, True, True, False, n...(truncated 119 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 747, 788, 906, 927:

                              [0, 1, 2, 0, 1, 2, 0, 1, 2]])
        kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
             ...(truncated 137 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 915, 935:


        self.assert_eq(kser.pct_change(periods=-1),
                       pser.pct_change(periods=-1), almost=True)
        self.assert_eq(kser.pct_change(periods=-10...(truncated 213 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 649, 668:


        index = pd.MultiIndex.from_arrays([
            ['a', 'a', 'b', 'b'], ['c', 'd', 'e', 'f']], names=('first', 'se...(truncated 151 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 820, 837:

        pser1 = pd.Series([0, 1, 2, 3, 4], name=0)
        pser2 = pd.Series([100, 200, 300, 400, 500], name=0)
        kser1 = ks.from_pandas(pser1)
        kser2 = ks.from_...(truncated 69 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 721, 788:

                              [0, 1, 2, 0, 1, 2, 0, 1, 2]])
        kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 522, 532:


        pser = pd.Series([1, 2, 3], name='0',
                         index=pd.MultiIndex.from_tuples([('A', 'X'), ('A', 'Y...(truncated 128 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 102, 107:

        self.assertTrue(ks.from_pandas(a).toPandas().isnull().all())
        self.assertRaises(ValueError, lambda: ks.from_pandas(b))

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 282, 290:

        sample_lst = [1, 2, 3, 4, np.nan, 6]
        pser = pd.Series(sample_lst, name='x')
        kser = ks.Series(sample_lst, name='x')
        self.assert_eq(kser.nsm...(truncated 33 chars)

ℹ️ series.py: Copy paste fragment on line 1250 shared with ../frame.py:


    def to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True,
                 na_rep='NaN',...(truncated 256 chars)

ℹ️ series.py: Copy paste fragment on line 2017 shared with ../frame.py:

                   level: Optional[Union[int, List[int]]] = None, ascending: bool = True,
                   inplace: bool = False, kind: str = None, na_positio...(truncated 56 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 3187, 3295:

        results = sdf.select([scol] + index_scols).take(1)
        if len(results) == 0:
            raise ValueError(\"...(truncated 390 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 4241, 4448:

        sdf = self._internal.sdf \
            .select(cols) \
            .where(reduce(lambda x, y: x & y, rows))

        if len(self._inter...(truncated 255 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 3462, 4461:

            internal = _InternalFrame(sdf=sdf, index_map=[(SPARK_INDEX_NAME_FORMAT(0), None)])
            return ...(truncated 148 chars)

Now that you are on the file, it would be easier to pay back some tech. debt.

⭐ Change Overview

Showing the changed files, dependency changes and the impact - click for full size
(Open in Softagram Desktop for full details)

💡 Insights

  • Co-change Alert: You modified series.py. Often frame.py (databricks/koalas) is modified at the same time.

📄 Full report

Impact Report explained. Give feedback on this report to [email protected]

@HyukjinKwon HyukjinKwon merged commit a47f7cf into databricks:master Dec 3, 2019
@itholic itholic deleted the s_pct_change branch December 10, 2019 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants