
Request for a feature. Please introduce the TrialStatus.PAUSED #862

Closed
alxfed opened this issue Mar 22, 2022 · 5 comments

Assignees: lena-kashtelyan
Labels: wishlist (Long-term wishlist feature requests)

Comments


alxfed commented Mar 22, 2022

In many cases, when trial data are collected from an online log (or a sequence of events arriving in real time) and metrics are computed over periods of different duration (like Retention D1 and Conversion D2, which need 2 and 3 days respectively to calculate), it is advantageous to collect data for the same daily cohorts only. This can be done by starting a trial for a day, pausing it until the metric with the longest duration can be calculated, and then, after the intermediate evaluation, restarting the same trial.
It could also be used for periodically collecting small samples from a large stream of data, say once a day for 5 minutes, and then assessing the resulting data together, as a single trial.

Subclassing works, of course, but it seems logical to have this in the framework itself.
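For illustration, a minimal sketch of that subclassing approach; the class name and the paused flag are hypothetical conventions, not part of Ax:

```python
from ax.core.trial import Trial


class PausableTrial(Trial):
    """Hypothetical Trial subclass approximating a PAUSED state.

    Ax has no such status; the flag lives in the trial's free-form
    _properties dict and is only honored by the user's own loop.
    """

    def pause(self) -> None:
        self._properties["paused"] = True

    def resume(self) -> None:
        self._properties["paused"] = False

    @property
    def is_paused(self) -> bool:
        return self._properties.get("paused", False)
```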

danielcohenlive commented:

Hi @alxfed, thanks for the feedback! I think it sounds reasonable to add TrialStatus.PAUSED, but just to be clear, this is only a means to the end of having a row of data per day (or some other time period) per arm, right? Depending on your infrastructure, pausing may not be necessary for this. It might be possible to accomplish it by creating a custom metric.
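A sketch of what such a custom metric could look like, assuming the Metric API of Ax releases from around this time (fetch_trial_data returning a Data object); query_daily_results is a hypothetical stand-in for your own event-log query:

```python
import pandas as pd
from ax.core.data import Data
from ax.core.metric import Metric


def query_daily_results(arm_name):
    """Hypothetical helper: returns {pd.Timestamp: (mean, sem)} per day
    for the given arm from the user's own event log."""
    raise NotImplementedError


class DailyCohortMetric(Metric):
    """Sketch of a metric emitting one row per day per arm."""

    def fetch_trial_data(self, trial, **kwargs):
        records = []
        for arm_name in trial.arms_by_name:
            for day, (mean, sem) in query_daily_results(arm_name).items():
                records.append({
                    "arm_name": arm_name,
                    "metric_name": self.name,
                    "mean": mean,
                    "sem": sem,
                    "trial_index": trial.index,
                    # Data supports optional start_time/end_time columns,
                    # giving one row per (arm, day) window.
                    "start_time": day,
                    "end_time": day,
                })
        return Data(df=pd.DataFrame.from_records(records))
```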

Or, if it's not about having multiple rows but just about being able to query specific time windows: fetch_trial_data() can accept kwargs, and if those kwargs are passed to fetch_data(), they will be plumbed down to fetch_trial_data().
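For example (the kwarg names here are made up; the point is only that fetch_data() forwards them):

```python
# Hypothetical time-window kwargs: fetch_data() passes extra kwargs through
# to each metric's fetch_trial_data(), which can use them to bound its query.
data = experiment.fetch_data(window_start="2022-03-21", window_end="2022-03-22")
```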

Can you explain your use case a little more, and why pausing is what allows you to collect data for a specific time window? This sounds like a field experiment where data is collected based on some sort of user interactions or other non-deterministic events? If so, I would think the individual results could be recorded with a timestamp and separated by time window that way.


alxfed commented Mar 22, 2022

The simplest way right now is to assign trial._properties = {'state': 'PAUSED'}, if you want my take on this subject. It saves and restores beautifully too.
...but I was just saying that you equipped the preparatory stage of the experiment with very useful states, CANDIDATE and STAGED (and multiple kinds of trial completion), but didn't do a similar job for the actual run, in anticipation that somebody would be using your framework for real-time online experiments and would need this. That's all. I'm not seeking advice. Thank you.
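For reference, a minimal sketch of that workaround, assuming Ax's JSON storage helpers; the 'state' key is a user convention that Ax itself ignores:

```python
from ax.core.experiment import Experiment
from ax.core.parameter import ParameterType, RangeParameter
from ax.core.search_space import SearchSpace
from ax.storage.json_store.load import load_experiment
from ax.storage.json_store.save import save_experiment

experiment = Experiment(
    name="pausing_demo",
    search_space=SearchSpace(parameters=[
        RangeParameter(name="x", parameter_type=ParameterType.FLOAT,
                       lower=0.0, upper=1.0),
    ]),
)
trial = experiment.new_trial()

trial._properties["state"] = "PAUSED"  # user convention; no Ax semantics
save_experiment(experiment, "experiment.json")

# Later, possibly in another process:
experiment = load_experiment("experiment.json")
if experiment.trials[0]._properties.get("state") == "PAUSED":
    pass  # skip data collection / evaluation until "resumed"
```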

Balandat (Contributor) commented:

> This can be done by starting a trial for a day, pausing it until the metric with the longest duration can be calculated, and then, after the intermediate evaluation, restarting the same trial.

Is there a reason you want to restart the same trial rather than run another trial of the same arm? In Ax, the concept of a particular configuration to evaluate is linked to an arm, rather than a trial. In particular, the same arm can be evaluated in different trials. Doing this could simplify your setup. Do you think this would work for you?
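A minimal sketch of that pattern, assuming an experiment whose first trial holds the arm of interest and which has a runner attached:

```python
# Evaluate the same arm again in a fresh trial, one per time window,
# instead of pausing and restarting the original trial.
arm = experiment.trials[0].arm        # the configuration under test
new_trial = experiment.new_trial()    # a new trial for the next window
new_trial.add_arm(arm)                # same arm, new trial
new_trial.run()                       # deploys via the experiment's runner
```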


alxfed commented Mar 23, 2022

Max, yes, there is a reason. My optimization config / scalarized objective has multiple metrics (with different durations and individual weights). I'm optimizing this aggregate as a whole, and I assess the current state of the trial by the value of this scalar (in those intermediate evaluations). It takes some time for a longer metric to become available, but if the metrics cover different numbers of cohorts, there will be a bias; on top of that, the stream of data is not stationary, so the later cohorts counted into the shorter metrics spoil the covariations that exist (and are visible) in the first cohort alone.
Yes, I understand your vision of tying everything to an arm/parameterization. What you are suggesting makes sense, but the non-stationarity (forgive my 'French') will again spoil the resulting distribution for this 'same' arm, and will surely kill the (multiple, not just pairwise) covariations.
Sorry for bothering you with this, but I'm sure other people struggling with real-time experiments will run into these problems too.

@lena-kashtelyan self-assigned this Apr 1, 2022
@lena-kashtelyan added the wishlist (Long-term wishlist feature requests) label Apr 1, 2022
lena-kashtelyan (Contributor) commented:

Hi @alxfed, thank you for the useful suggestion! We'll put it on our wishlist for now, but in the meantime it seems you have a simple workaround of writing to trial properties, so that's great.
