Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Approximate quantile aggregation (pulled into main) #2179

Merged
merged 39 commits into from
May 2, 2024

Conversation

jaychia
Copy link
Contributor

@jaychia jaychia commented Apr 24, 2024

Puts the finishing touches on #2076

@github-actions github-actions bot added the enhancement New feature or request label Apr 24, 2024
Copy link

codecov bot commented Apr 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@3e5da66). Click here to learn what that means.
Report is 1 commits behind head on main.

❗ Current head 77e5fd3 differs from pull request most recent head 7980e58. Consider uploading reports for the commit 7980e58 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2179   +/-   ##
=======================================
  Coverage        ?   85.33%           
=======================================
  Files           ?       69           
  Lines           ?     7458           
  Branches        ?        0           
=======================================
  Hits            ?     6364           
  Misses          ?     1094           
  Partials        ?        0           
Files Coverage Δ
daft/expressions/expressions.py 92.99% <100.00%> (ø)

Copy link
Contributor

@colin-ho colin-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lookin really good! A few high level things:

  • Add the expression to the docs
  • Do we want to expose top level approx_percentile expressions for this? (on df and grouped df) so users can do df.approx_percentiles("values") or df.group_by('group').approx_percentiles("values")

@@ -434,6 +434,21 @@ def sum(self) -> Expression:
expr = self._expr.sum()
return Expression._from_pyexpr(expr)

def approx_percentiles(self, percentiles: builtins.float | builtins.list[builtins.float]) -> Expression:
"""Calculates the approximate percentile(s) for a float column
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be cool to enable support for temporal columns in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should actually expose this functionality to support all ordered columns (numeric, temporal, strings...)

Then we can leverage this for our sorts. Currently our sorts perform its own bespoke solution for sampling for boundaries before repartitioniong, but we can leverage approx_percentiles to generate those boundaries for us!

src/daft-core/src/series/ops/agg.rs Outdated Show resolved Hide resolved
src/daft-dsl/src/expr.rs Outdated Show resolved Hide resolved
src/daft-dsl/src/expr.rs Show resolved Hide resolved
src/daft-dsl/src/expr.rs Outdated Show resolved Hide resolved
daft/expressions/expressions.py Show resolved Hide resolved
src/daft-dsl/src/python.rs Show resolved Hide resolved
daft/expressions/expressions.py Outdated Show resolved Hide resolved
daft/expressions/expressions.py Show resolved Hide resolved
@github-actions github-actions bot added the documentation Improvements or additions to documentation label May 1, 2024
@jaychia jaychia requested a review from colin-ho May 1, 2024 22:42
Copy link
Contributor

@colin-ho colin-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jaychia jaychia enabled auto-merge (squash) May 2, 2024 17:17
@jaychia
Copy link
Contributor Author

jaychia commented May 2, 2024

FYI @maxime-petitjean !

@jaychia jaychia merged commit 99a0ac0 into main May 2, 2024
27 checks passed
@jaychia jaychia deleted the jay/approx-quantile-aggregation branch May 2, 2024 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants