Implement pct_change() for Series #1071

itholic · 2019-11-24T15:10:36Z

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.pct_change.html#pandas.Series.pct_change

>>> kser = ks.Series([90, 91, 85])
>>> kser
0    90
1    91
2    85
Name: 0, dtype: int64

>>> kser.pct_change()
0         NaN
1    0.011111
2   -0.065934
Name: 0, dtype: float64

>>> kser.pct_change(periods=2)
0         NaN
1         NaN
2   -0.055556
Name: 0, dtype: float64
"""

itholic · 2019-11-24T15:10:46Z

i'm going to integrated the logic with DataFrame.pct_change() after this and #1051 will be merged.

codecov-io · 2019-11-24T15:46:16Z

Codecov Report

Merging #1071 into master will increase coverage by <.01%.
The diff coverage is 93.93%.

@@            Coverage Diff             @@
##           master    #1071      +/-   ##
==========================================
+ Coverage    95.2%   95.21%   +<.01%     
==========================================
  Files          34       34              
  Lines        6863     6893      +30     
==========================================
+ Hits         6534     6563      +29     
- Misses        329      330       +1

Impacted Files	Coverage Δ
databricks/koalas/missing/series.py	`100% <ø> (ø)`	⬆️
databricks/koalas/series.py	`96.44% <93.93%> (-0.1%)`	⬇️
databricks/koalas/config.py	`98.98% <0%> (-1.02%)`	⬇️
databricks/koalas/missing/frame.py	`100% <0%> (ø)`	⬆️
databricks/koalas/missing/indexes.py	`100% <0%> (ø)`	⬆️
databricks/koalas/indexes.py	`96.2% <0%> (+0.14%)`	⬆️
databricks/koalas/generic.py	`95.92% <0%> (+0.19%)`	⬆️
databricks/koalas/base.py	`95.89% <0%> (+0.65%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bacd903...3e82a04. Read the comment docs.

databricks/koalas/series.py

ueshin

On second thought, we might just need:

-        sdf = self._internal._sdf
-        window = Window.orderBy(F.monotonically_increasing_id()).rowsBetween(-periods, -periods)
+        scol = self._internal.scol

-        for column_name in self._internal.data_columns:
-            prev_row = F.lag(scol_for(sdf, column_name), periods).over(window)
-            sdf = sdf.withColumn(column_name, (scol_for(sdf, column_name) - prev_row) / prev_row)
-
-        internal = self._internal.copy(
-            sdf=sdf,
-            scol=scol_for(sdf, self._internal.data_columns[0]))
+        window = Window.orderBy(F.monotonically_increasing_id()).rowsBetween(-periods, -periods)
+        prev_row = F.lag(scol, periods).over(window)

-        return _col(DataFrame(internal))
+        return self._with_new_scol((scol - prev_row) / prev_row)

ueshin · 2019-12-02T23:25:23Z

databricks/koalas/tests/test_series.py

+        self.assert_eq(repr(kser.pct_change(periods=-100000000)),
+                       repr(pser.pct_change(periods=-100000000)))
+        self.assert_eq(repr(kser.pct_change(periods=100000000)),
+                       repr(pser.pct_change(periods=100000000)))


Why are these using repr()?

Because they raise AssertionError even they looks like same result.

but i just changed them to use almost=True rather than repr

itholic · 2019-12-03T01:18:29Z

#1071 (review)

@ueshin It looks really cool. thanks for spending your time for this. 👍

softagram-bot · 2019-12-03T01:19:58Z

Softagram Impact Report for pull/1071 (head commit: `3e82a04`)

⚠️ Copy paste found

ℹ️ series.py: Copy paste fragment on line 121 shared with ../frame.py:

    get_values = unsupported_function('get_values', deprecated=True)
    to_dense = unsupported_function('to_dense', deprecated=True)
 ...(truncated 199 chars)

ℹ️ test_series.py: Copy paste fragment on line 553 shared with ../test_series_plot.py:

            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 50],
        }, index=[0, 1, 3, 5, 6, 8, 9, 9, 9, 10, 10])

ℹ️ test_series.py: Copy paste fragment on line 559 shared with ../test_frame_plot.py, ../test_series_plot.py:

        bytes_data = BytesIO()
        ax.figure.savefig(bytes_data, format='png')
        bytes_data.seek(0)
        b64_data = base64.b64encode(bytes_data.read())
       ...(truncated 45 chars)

ℹ️ test_series.py: Copy paste fragment on line 50 shared with ../test_series_conversion.py:

    def pser(self):
        return pd.Series([1, 2, 3, 4, 5, 6, 7], name='x')

    @property
    def kser(self):
        return ks.from_pandas(self.pser)

    def test_series(self):

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 312, 329:

                     pd.Series([True, False], name='x'),
                     pd.Series([0, 1], name='x'),
                     pd.Series([1, 2,...(truncated 330 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 744, 903:

        midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2]...(truncated 280 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 745, 904, 925:

                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2],
                      ...(truncated 256 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 716, 922:


        # For MultiIndex
        midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
                              ['speed', 'weight', 'length']],
                             [[...(truncated 167 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 719, 745, 904:

                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2],
                      ...(truncated 117 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 827, 844:


        pser1 = pd.Series([-1, -2, -3, -4, -5], name=0)
        pser2 = pd.Series([-100, -200, -300, -400, -500], name=0)
        k...(truncated 123 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 180, 190:

        pdf = pd.DataFrame({
            'left':  [True, False, True, False, np.nan, np.nan, True, False, np.nan],
            'right': [True, False, False, True, True, False, n...(truncated 119 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 747, 788, 906, 927:

                              [0, 1, 2, 0, 1, 2, 0, 1, 2]])
        kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
             ...(truncated 137 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 915, 935:


        self.assert_eq(kser.pct_change(periods=-1),
                       pser.pct_change(periods=-1), almost=True)
        self.assert_eq(kser.pct_change(periods=-10...(truncated 213 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 649, 668:


        index = pd.MultiIndex.from_arrays([
            ['a', 'a', 'b', 'b'], ['c', 'd', 'e', 'f']], names=('first', 'se...(truncated 151 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 820, 837:

        pser1 = pd.Series([0, 1, 2, 3, 4], name=0)
        pser2 = pd.Series([100, 200, 300, 400, 500], name=0)
        kser1 = ks.from_pandas(pser1)
        kser2 = ks.from_...(truncated 69 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 721, 788:

                              [0, 1, 2, 0, 1, 2, 0, 1, 2]])
        kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 522, 532:


        pser = pd.Series([1, 2, 3], name='0',
                         index=pd.MultiIndex.from_tuples([('A', 'X'), ('A', 'Y...(truncated 128 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 102, 107:

        self.assertTrue(ks.from_pandas(a).toPandas().isnull().all())
        self.assertRaises(ValueError, lambda: ks.from_pandas(b))

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 282, 290:

        sample_lst = [1, 2, 3, 4, np.nan, 6]
        pser = pd.Series(sample_lst, name='x')
        kser = ks.Series(sample_lst, name='x')
        self.assert_eq(kser.nsm...(truncated 33 chars)

ℹ️ series.py: Copy paste fragment on line 1250 shared with ../frame.py:


    def to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True,
                 na_rep='NaN',...(truncated 256 chars)

ℹ️ series.py: Copy paste fragment on line 2017 shared with ../frame.py:

                   level: Optional[Union[int, List[int]]] = None, ascending: bool = True,
                   inplace: bool = False, kind: str = None, na_positio...(truncated 56 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 3187, 3295:

        results = sdf.select([scol] + index_scols).take(1)
        if len(results) == 0:
            raise ValueError(\"...(truncated 390 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 4241, 4448:

        sdf = self._internal.sdf \
            .select(cols) \
            .where(reduce(lambda x, y: x & y, rows))

        if len(self._inter...(truncated 255 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 3462, 4461:

            internal = _InternalFrame(sdf=sdf, index_map=[(SPARK_INDEX_NAME_FORMAT(0), None)])
            return ...(truncated 148 chars)

Now that you are on the file, it would be easier to pay back some tech. debt.

⭐ Change Overview

(Open in Softagram Desktop for full details)

💡 Insights

Co-change Alert: You modified series.py. Often frame.py (databricks/koalas) is modified at the same time.

📄 Full report

Permalink: Full report for pull/1071

Impact Report explained. Give feedback on this report to [email protected]

Implement pct_change() for Series

447233e

ueshin reviewed Nov 25, 2019

View reviewed changes

databricks/koalas/series.py Outdated Show resolved Hide resolved

databricks/koalas/series.py Outdated Show resolved Hide resolved

fix

153efd1

ueshin reviewed Nov 27, 2019

View reviewed changes

databricks/koalas/series.py Outdated Show resolved Hide resolved

databricks/koalas/series.py Outdated Show resolved Hide resolved

itholic added 2 commits November 28, 2019 21:48

fix

9c0ce09

add test cases

04e68c6

ueshin reviewed Dec 2, 2019

View reviewed changes

almost

05cbc92

fix logic

3e82a04

HyukjinKwon approved these changes Dec 3, 2019

View reviewed changes

HyukjinKwon merged commit a47f7cf into databricks:master Dec 3, 2019

itholic deleted the s_pct_change branch December 10, 2019 15:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement pct_change() for Series #1071

Implement pct_change() for Series #1071

itholic commented Nov 24, 2019

itholic commented Nov 24, 2019

codecov-io commented Nov 24, 2019 •

edited

Loading

ueshin left a comment

ueshin Dec 2, 2019

itholic Dec 3, 2019

itholic commented Dec 3, 2019

softagram-bot commented Dec 3, 2019

Implement pct_change() for Series #1071

Implement pct_change() for Series #1071

Conversation

itholic commented Nov 24, 2019

itholic commented Nov 24, 2019

codecov-io commented Nov 24, 2019 • edited Loading

Codecov Report

ueshin left a comment

Choose a reason for hiding this comment

ueshin Dec 2, 2019

Choose a reason for hiding this comment

itholic Dec 3, 2019

Choose a reason for hiding this comment

itholic commented Dec 3, 2019

softagram-bot commented Dec 3, 2019

Softagram Impact Report for pull/1071 (head commit: 3e82a04)

⚠️ Copy paste found

⭐ Change Overview

💡 Insights

📄 Full report

codecov-io commented Nov 24, 2019 •

edited

Loading

Softagram Impact Report for pull/1071 (head commit: `3e82a04`)