Add Sliding MDE analysis (solved issue #229) #246

ludovicolc · 2025-10-10T11:09:42Z

Add the mde_sliding_time_line method to the NormalPowerAnalysis class, enabling the calculation of MDE over varying experiment lengths.

Key features:

Computes MDE across sliding time windows, allowing measurement of treatment impact on metrics that vary over time.
Supports custom aggregation functions (agg_func) for grouped metrics (e.g., sum, mean, median).
Allows element-wise post-processing (post_process_func) on the aggregated metric, useful for transforming counts into binary flags or applying other transformations.

This enhancement provides flexibility to analyze temporal patterns in metric response and better capture dynamics of retention, conversion, or other evolving metrics.
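The aggregate-then-post-process step described above can be sketched in plain pandas. This is a minimal illustration and not the code from this PR: the toy `orders` data, the column names, and the helper `aggregate_metric` are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical toy data: one row per (user, day) with an order count.
orders = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3, 3, 3],
        "orders": [1, 0, 0, 2, 1, 0],
    }
)


def aggregate_metric(df, group_col, target_col, agg_func, post_process_func=None):
    """Aggregate target_col per group, then optionally transform each value.

    Hypothetical helper: mirrors the agg_func / post_process_func idea only.
    """
    grouped = df.groupby(group_col, as_index=False)[target_col].agg(agg_func)
    if post_process_func is not None:
        # Element-wise transformation of the aggregated metric.
        grouped[target_col] = grouped[target_col].apply(post_process_func)
    return grouped


# Total orders per user.
totals = aggregate_metric(orders, "user_id", "orders", "sum")

# Same aggregation, then turned into a binary "ordered at least once" flag.
flags = aggregate_metric(
    orders, "user_id", "orders", "sum", post_process_func=lambda x: int(x > 0)
)
```

With a flag like this, the metric behaves like a conversion rate per user rather than a count, which is the kind of use case the description mentions.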

Issue #229

david26694

Very good stuff, I asked a couple of questions and requested a few changes. The things left would be:

Add some unit tests
Install and run pre-commit in this file.
Bump version.
(optional) Add a small notebook in docs showing this functionality

david26694 · 2025-10-10T13:51:22Z

cluster_experiments/power_analysis.py

                )
        return results

+    def mde_sliding_time_line(


wdyt about mde_rolling_time_line? or rolling_mde_time_line?

changed to mde_rolling_time_line

david26694 · 2025-10-10T13:54:25Z

cluster_experiments/power_analysis.py

+
+            print(results)
+        """
+        used_time_col = self.time_col or time_col


Why do we have this logic here and not in mde_time_line? I would add it in both or nowhere. In case we add it in both, I would put it in a separate method

I refactored the logic for handling the time column into a helper, _get_time_col().
Now, methods that need a time column (like run_average_standard_error or mde_rolling_time_line) call _get_time_col() to validate that a column was provided during class initialization
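A helper of this kind could look roughly like the sketch below. The class `PowerAnalysisSketch`, the `ValueError` type, and the error message are assumptions for illustration; only the name `_get_time_col` and the idea of validating the column set at initialization come from the reply above.

```python
from typing import Optional


class PowerAnalysisSketch:
    """Hypothetical stand-in, not the library's NormalPowerAnalysis."""

    def __init__(self, time_col: Optional[str] = None):
        self.time_col = time_col

    def _get_time_col(self) -> str:
        # Single place where the "time column must be set" rule is enforced,
        # so every method that needs it fails in the same way.
        if self.time_col is None:
            raise ValueError(
                "time_col is required for this method; "
                "pass it when initializing the class."
            )
        return self.time_col

    def needs_time(self) -> str:
        # Any time-dependent method would start with this call.
        return self._get_time_col()
```

Centralizing the check keeps mde_time_line-style methods consistent, which is what the review comment asked for.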

david26694 · 2025-10-10T13:56:19Z

cluster_experiments/power_analysis.py

+                "or pass `time_col` to this method."
+            )
+
+        if agg_func is None:


can't come up with a more elegant way, but it's a bit confusing that we allow it as optional above and here we raise when it's None. Any thoughts?

This list:

'sum', 'mean', 'median', 'min', 'max', 'count', 'std', 'var', 'nunique', 'first', 'last'

I would define it as a constant (or a class attribute) and use it here and in the type hint.

I refactored it so that the valid aggregation functions are now defined as a class constant, VALID_AGG_FUNCS.
agg_func is now a required argument, and the class constant is used in validation.
As for the type hint, I didn't use the class constant because in older Python versions we cannot unpack a list inside Literal
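One possible way to keep the type hint and the validation list in sync is to go the other way around: define the `Literal` once and derive the constant from it with `typing.get_args`. This is an alternative sketch, not what the PR does; the alias name `AggFunc` and the function `validate_agg_func` are hypothetical, and the constant is shown at module level rather than as a class attribute.

```python
from typing import Literal, get_args

# Single source of truth: the Literal type alias (hypothetical name).
AggFunc = Literal[
    "sum", "mean", "median", "min", "max", "count",
    "std", "var", "nunique", "first", "last",
]

# Derived tuple of allowed values, usable for runtime validation.
VALID_AGG_FUNCS = get_args(AggFunc)


def validate_agg_func(agg_func: str) -> str:
    """Return agg_func unchanged if allowed, otherwise raise ValueError."""
    if agg_func not in VALID_AGG_FUNCS:
        raise ValueError(
            f"agg_func must be one of {VALID_AGG_FUNCS}, got {agg_func!r}"
        )
    return agg_func
```

`typing.get_args` is available from Python 3.8, so this avoids unpacking a list inside `Literal` while still listing the values only once.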

david26694 · 2025-10-10T13:58:34Z

cluster_experiments/power_analysis.py

+            )[self.target_col].agg(agg_func)
+
+            if post_process_func is not None:
+                df_grouped[self.target_col] = df_grouped[self.target_col].apply(post_process_func)


apply is very slow, but I'm not sure if assign works better here. Any thoughts? If not, we keep it like this

yes! apply can be slower on very large datasets but the advantage of using apply here is that it makes it intuitive and flexible for the user to pass any callable function, without requiring it to be vectorized.
as for assign, I don't think it would improve performance as for element-wise operations it still relies on apply or equivalent under the hood
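The trade-off discussed here can be seen on a toy Series. In this sketch the body of `flag_positive` is an assumption (the docstring in the diff only shows its name), and the data is made up. It compares an arbitrary Python callable passed to `apply`, the same call wrapped in `assign`, and a vectorized expression that only works because this particular transformation has one.

```python
import pandas as pd

counts = pd.Series([0, 2, 1, 0, 5], name="orders")


def flag_positive(x):
    # Assumed body: 1 if the aggregated count is positive, else 0.
    return int(x > 0)


# Flexible: works with any Python callable, evaluated element by element.
via_apply = counts.apply(flag_positive)

# assign only changes how the column is attached; the callable is still
# evaluated element by element through apply.
via_assign = counts.to_frame().assign(
    orders=lambda d: d["orders"].apply(flag_positive)
)["orders"]

# Vectorized equivalent, available only when the user writes it this way.
via_vectorized = (counts > 0).astype(int)
```

All three give the same flags, so keeping `apply` costs speed on large inputs but does not restrict what users can pass as post_process_func.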

ludovicolc · 2025-10-11T17:28:20Z

Add some unit tests --> done
Install and run pre-commit in this file --> done
Bump version --> done
(optional) Add a small notebook in docs showing this functionality --> not sure if we want to create a new notebook or if we want to add this functionality to an existing one?!

codecov-commenter · 2025-10-11T17:31:28Z

Codecov Report

❌ Patch coverage is 89.28571% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.46%. Comparing base (bbc2092) to head (46801ba).

Files with missing lines	Patch %	Lines
cluster_experiments/power_analysis.py	89.28%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
- Coverage   96.57%   96.46%   -0.12%     
==========================================
  Files          17       17              
  Lines        1783     1809      +26     
==========================================
+ Hits         1722     1745      +23     
- Misses         61       64       +3

david26694

Adding a few minor comments. About

(optional) Add a small notebook in docs showing this functionality--> not sure if we want to create a new notebook or if we want to add this functionality to an existing one?!

whatever you think is best, both work for me! I dunno if we have a notebook with orders per user or something like that

david26694 · 2025-10-12T15:54:51Z

cluster_experiments/power_analysis.py

+    def mde_rolling_time_line(
+        self,
+        df: pd.DataFrame,
+        agg_func: Literal[


I know we have to put it as the second argument if we don't want to make it optional, but I still think making it optional is better than breaking the order of arguments from mde_time_line, where pre_experiment_df is the second argument. Do you have an opinion on moving this to an argument below or keeping it here?

Moved it and added * before it to make it a "keyword-only argument". This way it is still mandatory, but it stays aligned with the order of arguments from mde_time_line
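The effect of the bare `*` can be shown with a stripped-down signature. The function below is a hypothetical stand-in with a placeholder body, not the real method; only the parameter names `df`, `pre_experiment_df`, `agg_func`, and `post_process_func` are taken from this thread.

```python
def mde_rolling_time_line_sketch(
    df,
    pre_experiment_df=None,
    *,  # everything after this marker must be passed by keyword
    agg_func,  # no default: still required
    post_process_func=None,
):
    """Hypothetical stand-in that just echoes how it was called."""
    return {
        "agg_func": agg_func,
        "has_pre_experiment_df": pre_experiment_df is not None,
    }


# Valid: agg_func passed by keyword, pre_experiment_df left at its default.
ok = mde_rolling_time_line_sketch("df", agg_func="sum")

# Invalid: passing agg_func positionally raises TypeError.
try:
    mde_rolling_time_line_sketch("df", None, "sum")
    positional_allowed = True
except TypeError:
    positional_allowed = False
```

Python allows a required keyword-only parameter after optional positional ones, which is why the argument can stay mandatory without moving in front of pre_experiment_df.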

david26694 · 2025-10-12T15:55:34Z

cluster_experiments/power_analysis.py

+                            and return a scalar.
+
+        Example with post_process_func:
+            def flag_positive(x):


Wdyt about making a full reproducible example? this way we can test it in the test_docs file

david26694 · 2025-10-12T15:55:58Z

cluster_experiments/power_analysis.py

+        experiment_start = df[time_col].min()
+
+        for n_days in experiment_length:
+            df_time = df[df[time_col] <= experiment_start + pd.Timedelta(days=n_days)]


should we rename to df_time_filter or something like this? more explicit I think
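For reference, the filter in this snippet keeps every row from the start of the data up to and including `experiment_start + n_days`, so each experiment length is a cumulative window. A small sketch on made-up data (the dates, the `orders` column, and the chosen lengths are assumptions) using the proposed `df_time_filter` name:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2025-01-01", "2025-01-02", "2025-01-03",
                "2025-01-05", "2025-01-08", "2025-01-10",
            ]
        ),
        "orders": [1, 0, 2, 1, 0, 3],
    }
)
time_col = "date"
experiment_start = df[time_col].min()

rows_per_length = {}
for n_days in (1, 3, 7):
    # Cumulative window: everything from the start up to start + n_days,
    # with the upper bound included.
    df_time_filter = df[df[time_col] <= experiment_start + pd.Timedelta(days=n_days)]
    rows_per_length[n_days] = len(df_time_filter)
```

Because the comparison is `<=`, `n_days=1` covers two calendar dates here (the start date and the following one), which may be worth spelling out in the docstring.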

add sliding MDE analysis

e52c755

david26694 requested changes Oct 10, 2025

ludovicolc added 9 commits October 11, 2025 15:19

change method name to mde_rolling_time_line

a3761f2

add _get_time_col method

e7a90d6

change agg_func logic

e858bc8

change Literal list in agg_func

ea6a8fa

add relative_mde to mde_rolling_time_line output

de27913

update relative_mde

6e9e8f5

add test for mde_rolling_time_line

6bc8af5

run pre-commit

cee441e

bump version to 0.28.0

46801ba

david26694 reviewed Oct 12, 2025

ludovicolc added 3 commits October 14, 2025 21:02

changed agg func position and df_time to df_time_filter

c2665ff

update docstring

7de67ee

update docstring

dcf5376
