feat: Add `Expr.skew` #4346

srilman · 2025-05-14T01:36:37Z

Changes Made

Adds Expr.skew. Implements it like Polars and Spark (Pandas and DuckDB implement a slightly different algorithm with normalization, Polars has a flag to choose between the options). Series.skew implements it normally and populate_aggregation_stages has a specialized version.

Related Issues

Closes #4332.

Checklist

Documented in API Docs (if applicable)
Documented in User Guide (if applicable)
If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

srilman · 2025-05-14T01:37:53Z

src/daft-core/src/array/ops/skew.rs

+    }
+
+    fn grouped_skew(&self, groups: &GroupIndices) -> Self::Output {
+        let grouped_skew_iter = stats::grouped_stats(self, groups)?.map(|(stats, group)| {


Potentially dumb question, but why do we need two variants. Seems like the grouped_ variant can always be derived from the regular.

srilman · 2025-05-14T01:38:46Z

src/daft-core/src/utils/stats.rs

+        let (m3, m2) = values.fold((0., 0.), |(m3_acc, m2_acc), v| {
+            (
+                m3_acc + (v - mean).powi(3),
+                (v - mean).mul_add(v - mean, m2_acc),


Let me know if there's a better way to do this. Couldn't figure out a good way to reuse / copy the iterator so just calculated both aggregations in the same iterator

srilman · 2025-05-14T01:39:46Z

src/daft-physical-plan/src/physical_planner/translate.rs

+                );
+
+                let result = numerator.div(denom).div(n).alias(output_name);
+                final_exprs.push(result);


I did it like this using the DuckDB algorithm instead of just accumulation and calling skew because I couldn't really figure out how to implement a list_skew on ListArrays. Kept running into issues

If you happen to think that would be easier or have guidance for the next time, just let me know

srilman · 2025-05-14T01:40:56Z

@colin-ho Let me know if someone else would be better to have review. Just tagging you for review initially

universalmind303

lgtm!

Definitely not something we need to do here, but it might be worth thinking about how we can further simplify the agg exprs like we've been doing with scalar exprs.

codecov · 2025-05-14T20:25:31Z

Codecov Report

Attention: Patch coverage is 90.28571% with 17 lines in your changes missing coverage. Please review.

Project coverage is 78.56%. Comparing base (8c10e5b) to head (ac9aa3f).
Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
src/daft-core/src/datatypes/agg_ops.rs	33.33%	6 Missing ⚠️
src/daft-core/src/series/ops/agg.rs	71.42%	4 Missing ⚠️
daft/dataframe/dataframe.py	25.00%	3 Missing ⚠️
src/daft-logical-plan/src/ops/project.rs	0.00%	3 Missing ⚠️
src/daft-sql/src/modules/aggs.rs	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4346      +/-   ##
==========================================
+ Coverage   78.21%   78.56%   +0.34%     
==========================================
  Files         814      820       +6     
  Lines      110992   109850    -1142     
==========================================
- Hits        86811    86299     -512     
+ Misses      24181    23551     -630

Files with missing lines	Coverage Δ
daft/expressions/expressions.py	`95.91% <100.00%> (+0.01%)`	⬆️
src/daft-core/src/array/ops/skew.rs	`100.00% <100.00%> (ø)`
src/daft-core/src/datatypes/mod.rs	`40.00% <ø> (ø)`
src/daft-core/src/utils/stats.rs	`98.43% <100.00%> (+0.43%)`	⬆️
src/daft-dsl/src/expr/mod.rs	`79.63% <100.00%> (+0.21%)`	⬆️
src/daft-dsl/src/python.rs	`90.38% <100.00%> (-1.49%)`	⬇️
...ft-physical-plan/src/physical_planner/translate.rs	`93.93% <100.00%> (+0.39%)`	⬆️
src/daft-recordbatch/src/lib.rs	`85.81% <100.00%> (+0.05%)`	⬆️
src/daft-sql/src/modules/aggs.rs	`68.04% <0.00%> (-0.71%)`	⬇️
daft/dataframe/dataframe.py	`85.67% <25.00%> (-0.23%)`	⬇️
... and 3 more

... and 24 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

srilman · 2025-05-14T20:31:46Z

Definitely not something we need to do here, but it might be worth thinking about how we can further simplify the agg exprs like we've been doing with scalar exprs.

Agree, yeah especially making it a bit more flexible for future additions.

Working impl

ac9aa3f

github-actions bot added the feat label May 14, 2025

srilman commented May 14, 2025

View reviewed changes

srilman requested a review from colin-ho May 14, 2025 01:40

srilman requested a review from universalmind303 May 14, 2025 01:43

universalmind303 approved these changes May 14, 2025

View reviewed changes

srilman merged commit 34a7393 into main May 14, 2025
98 of 100 checks passed

srilman deleted the slade/skew branch May 14, 2025 20:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add `Expr.skew` #4346

feat: Add `Expr.skew` #4346

Uh oh!

srilman commented May 14, 2025

Uh oh!

srilman May 14, 2025

Uh oh!

srilman May 14, 2025

Uh oh!

srilman May 14, 2025

Uh oh!

srilman May 14, 2025

Uh oh!

srilman commented May 14, 2025

Uh oh!

universalmind303 left a comment

Uh oh!

codecov bot commented May 14, 2025 •

edited

Loading

Uh oh!

Uh oh!

srilman commented May 14, 2025

Uh oh!

Uh oh!

feat: Add Expr.skew #4346

feat: Add Expr.skew #4346

Uh oh!

Conversation

srilman commented May 14, 2025

Changes Made

Related Issues

Checklist

Uh oh!

srilman May 14, 2025

Choose a reason for hiding this comment

Uh oh!

srilman May 14, 2025

Choose a reason for hiding this comment

Uh oh!

srilman May 14, 2025

Choose a reason for hiding this comment

Uh oh!

srilman May 14, 2025

Choose a reason for hiding this comment

Uh oh!

srilman commented May 14, 2025

Uh oh!

universalmind303 left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

srilman commented May 14, 2025

Uh oh!

Uh oh!

feat: Add `Expr.skew` #4346

feat: Add `Expr.skew` #4346

codecov bot commented May 14, 2025 •

edited

Loading