Skip to content

Conversation

ludovicolc
Copy link
Collaborator

@ludovicolc ludovicolc commented Oct 10, 2025

Add mde_sliding_time_line method in the NormalPowerAnalysis class, enabling the calculation of MDE over varying experiment lengths.

Key features:

  • Computes MDE across sliding time windows, allowing measurement of treatment impact on metrics that vary over time.
  • Supports custom aggregation functions (agg_func) for grouped metrics (e.g., sum, mean, median).
  • Allows element-wise post-processing (post_process_func) on the aggregated metric, useful for transforming counts into binary flags or applying other transformations.

This enhancement provides flexibility to analyze temporal patterns in metric response and better capture dynamics of retention, conversion, or other evolving metrics.

Issue #229

Copy link
Owner

@david26694 david26694 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good stuff, asked a couple questions and changes. The things left would be:

  1. Add some unit tests
  2. Install and run pre-commit in this file.
  3. Bump version.
  4. (optional) Add a small notebook in docs showing this functionality

)
return results

def mde_sliding_time_line(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about mde_rolling_time_line? or rolling_mde_time_line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to mde_rolling_time_line

print(results)
"""
used_time_col = self.time_col or time_col
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this logic here and not in mde_time_line? I would add it in both or nowhere. In case we add it in both, would put in a separate method

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the logic for handling the time column into an helper _get_time_col().
now, methods that need a time column (like run_average_standard_error or mde_rolling_time_line) call _get_time_col() to validate that a column was provided during class initialization

"or pass `time_col` to this method."
)

if agg_func is None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't come up with a more elegant way, but it's a bit confusing that we allow it as optional above and here we raise when it's None. Any thoughts?

This list:

'sum', 'mean', 'median', 'min', 'max', 'count', 'std', 'var', 'nunique', 'first', 'last'."

I would define it as a constant (or a class attribute) and use it here and the type hint.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored it so that the valid aggregation functions are now defined as a class constant VALID_AGG_FUNCS.
agg_func is now a required argument, and the class constant is now used in validation.
as for the type hint, I didn't use the class constant because in lower python versions we cannot unpack a list inside Literal

)[self.target_col].agg(agg_func)

if post_process_func is not None:
df_grouped[self.target_col] = df_grouped[self.target_col].apply(post_process_func)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply is very slow, but not sure if assign works better in here. Any thought? if not, we keep it like this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! apply can be slower on very large datasets but the advantage of using apply here is that it makes it intuitive and flexible for the user to pass any callable function, without requiring it to be vectorized.
as for assign, I don't think it would improve performance as for element-wise operations it still relies on apply or equivalent under the hood

@ludovicolc
Copy link
Collaborator Author

  1. Add some unit tests--> done
  2. Install and run pre-commit in this file--> done
  3. Bump version--> done
  4. (optional) Add a small notebook in docs showing this functionality--> not sure if we want to create a new notebook or if we want to add this functionality to an existing one?!

@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 89.28571% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.46%. Comparing base (bbc2092) to head (46801ba).

Files with missing lines Patch % Lines
cluster_experiments/power_analysis.py 89.28% 3 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
- Coverage   96.57%   96.46%   -0.12%     
==========================================
  Files          17       17              
  Lines        1783     1809      +26     
==========================================
+ Hits         1722     1745      +23     
- Misses         61       64       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Owner

@david26694 david26694 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a few minor comments. About

(optional) Add a small notebook in docs showing this functionality--> not sure if we want to create a new notebook or if we want to add this functionality to an existing one?!

whatever you think is best, both work for me! I dunno if we have a notebook with orders per user or something like that

def mde_rolling_time_line(
self,
df: pd.DataFrame,
agg_func: Literal[
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we have to put it as second argument if we don't want to make it optional, but still I think making it optional is better than breaking the order of arguments from mde_time_line where pre_exepriment_df is the second argument. Do you have an opinion on moving this to an argument below or keeping it here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it and added * before it to make it a “keyword-only argument”. this way is still mandatory but it's aligned with the order of arguments from mde_time_line

and return a scalar.
Example with post_process_func:
def flag_positive(x):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt about making a full reproducible example? this way we can test it in the test_docs file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

experiment_start = df[time_col].min()

for n_days in experiment_length:
df_time = df[df[time_col] <= experiment_start + pd.Timedelta(days=n_days)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we rename to df_time_filter or something like this? more explicit I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants