feat: Add Expr.skew
#4346
}

fn grouped_skew(&self, groups: &GroupIndices) -> Self::Output {
    let grouped_skew_iter = stats::grouped_stats(self, groups)?.map(|(stats, group)| {
Potentially dumb question, but why do we need two variants? It seems like the grouped_ variant can always be derived from the regular one.
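One way to picture the reviewer's suggestion is a grouped variant that simply applies the regular computation to each group's values. This is a minimal sketch, not the PR's code: `skew`, `grouped_skew`, and the `Vec<usize>` group representation are hypothetical stand-ins for the real `GroupIndices` and array types.

```rust
/// Population (biased) skewness over any cloneable iterator of f64 values.
/// Hypothetical helper for illustration only.
fn skew<I>(values: I) -> Option<f64>
where
    I: Iterator<Item = f64> + Clone,
{
    // First pass on a clone: count and sum, so `values` stays usable.
    let (n, sum) = values
        .clone()
        .fold((0.0_f64, 0.0_f64), |(n, s), v| (n + 1.0, s + v));
    if n == 0.0 {
        return None;
    }
    let mean = sum / n;

    // Second pass: sums of squared and cubed deviations from the mean.
    let (m2, m3) = values.fold((0.0_f64, 0.0_f64), |(m2, m3), v| {
        let d = v - mean;
        (m2 + d * d, m3 + d * d * d)
    });
    Some((m3 / n) / (m2 / n).powf(1.5))
}

/// Grouped variant derived from the regular one: each group is a list of
/// row indices into `data` (an assumed, simplified group representation).
fn grouped_skew(data: &[f64], groups: &[Vec<usize>]) -> Vec<Option<f64>> {
    groups
        .iter()
        .map(|indices| skew(indices.iter().map(|&i| data[i])))
        .collect()
}
```

Whether this is fast enough in practice depends on how the real arrays are stored, which may be one reason to keep a dedicated grouped implementation.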
let (m3, m2) = values.fold((0., 0.), |(m3_acc, m2_acc), v| {
    (
        m3_acc + (v - mean).powi(3),
        (v - mean).mul_add(v - mean, m2_acc),
Let me know if there's a better way to do this. I couldn't figure out a good way to reuse or copy the iterator, so I just calculated both aggregations in a single pass over the same iterator.
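The single-pass approach described in this comment can be sketched as a standalone function. This is an illustration under assumptions, not the PR's implementation: `skew_single_pass` is a hypothetical name, the input is a plain slice rather than the PR's array type, and the result is the population (biased) skewness.

```rust
/// Population (biased) skewness of a slice.
/// Hypothetical helper for illustration; not the PR's code.
fn skew_single_pass(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;

    // Accumulate the sums of cubed and squared deviations together,
    // so the values are traversed only once after the mean is known.
    let (m3, m2) = values
        .iter()
        .fold((0.0_f64, 0.0_f64), |(m3_acc, m2_acc), &v| {
            let d = v - mean;
            (m3_acc + d.powi(3), d.mul_add(d, m2_acc))
        });

    // g1 = (m3 / n) / (m2 / n)^(3/2); this is NaN when all values are equal.
    Some((m3 / n) / (m2 / n).powf(1.5))
}
```

Folding over a tuple avoids needing a second copy of the iterator, at the cost of a slightly denser closure.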
);

let result = numerator.div(denom).div(n).alias(output_name);
final_exprs.push(result);
I did it like this, using the DuckDB algorithm instead of just accumulating the values and calling skew, because I couldn't really figure out how to implement a list_skew on ListArrays. I kept running into issues.
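For context on why a staged aggregation can avoid materializing lists: skewness can be finalized from a few sums that merge across partitions. The sketch below uses raw power sums and is an assumption about the general technique, not a reproduction of the PR's `numerator` and `denom` expressions or of DuckDB's exact formula; `SkewPartial` and its methods are hypothetical names.

```rust
/// Partial aggregation state for one partition: count and raw power sums.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, Default)]
struct SkewPartial {
    n: f64,
    sum: f64,
    sum_sq: f64,
    sum_cb: f64,
}

impl SkewPartial {
    /// First stage: reduce one partition's values to a partial state.
    fn from_values(values: &[f64]) -> Self {
        values.iter().fold(Self::default(), |acc, &v| Self {
            n: acc.n + 1.0,
            sum: acc.sum + v,
            sum_sq: acc.sum_sq + v * v,
            sum_cb: acc.sum_cb + v * v * v,
        })
    }

    /// Second stage: partial states combine by plain addition.
    fn merge(self, other: Self) -> Self {
        Self {
            n: self.n + other.n,
            sum: self.sum + other.sum,
            sum_sq: self.sum_sq + other.sum_sq,
            sum_cb: self.sum_cb + other.sum_cb,
        }
    }

    /// Final stage: population skewness from the merged sums.
    fn finish(self) -> Option<f64> {
        if self.n == 0.0 {
            return None;
        }
        let mean = self.sum / self.n;
        // Central moments expressed through raw moments.
        let m2 = self.sum_sq / self.n - mean * mean;
        let m3 = self.sum_cb / self.n - 3.0 * mean * self.sum_sq / self.n
            + 2.0 * mean.powi(3);
        Some(m3 / m2.powf(1.5))
    }
}
```

Raw power sums are simple to merge but can lose precision when the mean is large relative to the spread, so a production version might prefer a more stable update scheme.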
If you happen to think that would be easier or have guidance for the next time, just let me know
@colin-ho Let me know if someone else would be a better reviewer. Just tagging you for review initially.
lgtm!
Definitely not something we need to do here, but it might be worth thinking about how we can further simplify the agg exprs like we've been doing with scalar exprs.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4346 +/- ##
==========================================
+ Coverage 78.21% 78.56% +0.34%
==========================================
Files 814 820 +6
Lines 110992 109850 -1142
==========================================
- Hits 86811 86299 -512
+ Misses 24181 23551 -630
Agree, yeah, especially making it a bit more flexible for future additions.
Changes Made
Adds Expr.skew. Implements it like Polars and Spark (Pandas and DuckDB implement a slightly different algorithm with normalization; Polars has a flag to choose between the options). Series.skew implements it normally and populate_aggregation_stages has a specialized version.
Related Issues
Closes #4332.
Checklist
docs/mkdocs.yml navigation
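For reference, the two skewness conventions mentioned under Changes Made can be written out as follows. This assumes that "normalization" refers to the usual sample-size bias correction (the adjusted Fisher-Pearson coefficient); which library uses which convention is taken from the description, not verified here.

```latex
% Central moments of a sample x_1, ..., x_n with mean \bar{x}
m_k = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k

% Biased (population) skewness
g_1 = \frac{m_3}{m_2^{3/2}}

% Bias-corrected (adjusted Fisher-Pearson) skewness, defined for n > 2
G_1 = \frac{\sqrt{n(n-1)}}{n-2} \, g_1
```

The two values converge as n grows, so the choice mainly matters for small groups.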