Skip to content

Conversation

@Yikun
Copy link
Member

@Yikun Yikun commented Sep 8, 2022

What changes were proposed in this pull request?

Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby

Why are the changes needed?

Improve PS api coverage

>>> s = ps.Series([4, 3, 5, 2, 6])
>>> s.rolling(2).quantile(0.5)
0    NaN
1    3.0
2    3.0
3    2.0
4    2.0
dtype: float64

>>> s = ps.Series([2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5])
>>> s.groupby(s).rolling(3).quantile(0.5).sort_index()
2  0     NaN
   1     NaN
3  2     NaN
   3     NaN
   4     3.0
4  5     NaN
   6     NaN
   7     4.0
   8     4.0
5  9     NaN
   10    NaN
dtype: float64

>>> s = ps.Series([1, 2, 3, 4])
>>> s.expanding(2).quantile(0.5)
0    NaN
1    1.0
2    2.0
3    2.0
dtype: float64

>>> s = ps.Series([2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5])
>>> s.groupby(s).expanding(3).quantile(0.5).sort_index()
2  0     NaN
   1     NaN
3  2     NaN
   3     NaN
   4     3.0
4  5     NaN
   6     NaN
   7     4.0
   8     4.0
5  9     NaN
   10    NaN
dtype: float64

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI passed

@Yikun
Copy link
Member Author

Yikun commented Sep 8, 2022

Consider test refactor change is huge, I just spilt it into a separate patch: #37835

@Yikun Yikun changed the title [WIP][SPARK-40339][PS] Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby [SPARK-40339][PS] Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby Sep 9, 2022
@zhengruifeng zhengruifeng changed the title [SPARK-40339][PS] Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby [SPARK-40339][SPARK-40345][SPARK-40345][SPARK-40348][PS] Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby Sep 13, 2022
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Merged #37835)

@zhengruifeng zhengruifeng changed the title [SPARK-40339][SPARK-40345][SPARK-40345][SPARK-40348][PS] Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby [SPARK-40339][SPARK-40342][SPARK-40345][SPARK-40348][PS] Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby Sep 13, 2022
@Yikun Yikun marked this pull request as ready for review September 13, 2022 09:30
@HyukjinKwon
Copy link
Member

cc @zhengruifeng @xinrong-meng @itholic FYI

Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments below also applied to other functions.

Looks good otherwise.

-----
`quantile` in pandas-on-Spark are using distributed percentile approximation
algorithm unlike pandas, the result might different with pandas, also `interpolation`
parameters are not supported yet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "parameters are not supported yet" -> "parameter is not supported yet" ?

Also can we comment it as "TODO" above function definition ?

e.g.

    # TODO: support `interpolation` parameter.
    def quantile(self, quantile: float, accuracy: int = 10000) -> FrameLike:
        ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameter is not supported yet, will address!

Frankly, the interpolation almost impossible to support, because we are using distributed percentile approximation algorithm unlike pandas. So we might don't want to add the note.

@HyukjinKwon
Copy link
Member

Merged to master

@zhengruifeng
Copy link
Contributor

@Yikun I think we should also add quantile to window.rst

@Yikun
Copy link
Member Author

Yikun commented Sep 15, 2022

@HyukjinKwon @zhengruifeng Thanks

I think we should also add quantile to window.rst

Hmm, forgot again, I also noticed that ExpandingGroupby and RollingGroupby functions hasn't doc, so let me create a separate PR to cover doc:

  • all functions for ExpandingGroupby
  • all functions for RollingGroupby
  • quantile for Expanding
  • quantile for Rolling

@HyukjinKwon
Copy link
Member

Opps, just made a followup at #37890.

Actually ExpandingGroupby and RollingGroupby aren't documented in pandas side too. I don't know their intention thought. Let's make a quick followup only for Rolling.quantile and Expanding.quantile for now

@Yikun
Copy link
Member Author

Yikun commented Sep 15, 2022

@HyukjinKwon OK, thanks!

@zhengruifeng
Copy link
Contributor

Actually ExpandingGroupby and RollingGroupby aren't documented in pandas side too.

that's true, when I check the missing function in https://issues.apache.org/jira/browse/SPARK-40327, it's quite confusing that there is no document and I need to manually check whether a function exists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants