[Question] Number of trials and batches in offline optimization + tracking issue to learn of experiment outcomes #1562
Hi Eric, thank you for reaching out. The number of trials Ax needs to converge on an optimal parameterization can vary depending on the specifics of the experiment. In general, our default method (GPEI) can optimize up to a dozen parameters in about 30 sequential trials, but the best way to validate this on your specific problem is to check whether the optimization has plateaued and to check the cross-validation plot to make sure the model fit is good. You can read more about this here https://ax.dev/tutorials/gpei_hartmann_service.html and here https://ax.dev/tutorials/visualizations.html . If you have any more specific details (how many parameters you are optimizing over, how long an offline trial takes, how long your experiment can take, any of these or other plots, etc.), we may be able to provide more specific advice for your particular experiment. cc @Balandat
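As a minimal sketch of that model-fit check (assuming an existing Service-API `AxClient` named `ax_client` with several completed trials), the cross-validation plot can be produced roughly like this:

```python
# Sketch: leave-one-out cross validation of the current surrogate model.
# Assumes `ax_client` is an AxClient with enough completed trials to fit a model.
from ax.modelbridge.cross_validation import cross_validate
from ax.plot.diagnostic import interact_cross_validation
from ax.utils.notebook.plotting import render

model = ax_client.generation_strategy.model   # model fitted to the data so far
cv_results = cross_validate(model)            # leave-one-out predictions
render(interact_cross_validation(cv_results))  # predicted vs. observed plot
```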
Hi !
Evaluation of a paint formulation: for each objective we test the paint 3 times to get the mean and SEM. We have a relatively high SEM because we are working with timber, which is an anisotropic material.
Very expensive objective evaluation: evaluating the objectives is very expensive in terms of time (3 months to evaluate 20 trials on 3 different fire benches).
Number of parallel trials and max points beyond the initial points: it is easier for the lab to evaluate at least 3 trials at the same time. Max points beyond the initial points: fewer than 15, because evaluation is very expensive.
Hi Eric, this sounds like a very interesting application! If you have 7 objectives, you may wish to come up with some linear scalarization of these objectives, or use preference exploration to learn an objective function after you collect data from your initial batch. There is a tutorial (along with a link to the paper) for how to do this in BoTorch here: https://botorch.org/tutorials/bope. @ItsMrLin might have pointers for how to do this in Ax if you are interested (assuming the code is public...). When you say "With 3 run for each points to know the SEM", are you saying you compute the SEM from 3 observations? That would be far too few observations to compute a SEM, and you may be better off inferring the noise. IIRC @SebastianAment has been exploring what good approaches are when you only have a very limited number of replications per design. @sgbaird might have some helpful suggestions for how to most efficiently encode your design space (see e.g. #727).
Here is the first version of the code based on our data.
Yes, this is indeed a super interesting application. I second Eytan's point about 3 observations being too few to compute meaningful estimates of the standard error. So if it's the same cost to try a different composition as it is to try the same composition 3 times, then I would recommend trying more compositions and letting the model infer the noise level. Also, 7 objectives is quite a lot for hypervolume-type methods. But it could work in the setting where there are very few trials and we can spend a good amount of time on computing and optimizing the acquisition function. cc @sdaulton for thoughts / tricks on this one. I think the main challenge you'll have is interpreting the results in a 7-dim outcome space. As Eytan said, if you have some domain expertise (and access to people with domain expertise) who feel like they can compare the relative value of two compositions (each with 7 outcomes) in a reasonable fashion, this could be a prime use case for preference learning.
Hi Alex! Thank you for your reply! OK, maybe we will let the model infer the noise level. To come back to the number of parallel trials: evaluating a paint formulation takes time because the labs are located in different places in France, so it is easier for us to run as many parallel trials as possible. If the total evaluation budget beyond the initial points is 15, what is the best strategy: 3 batches of 5 parallel trials, 10 parallel + 2 parallel, or one single batch of 15 parallel trials? PS: if you want to know more about our lab: https://umet.univ-lille.fr/Polymeres/index.php?lang=en
This is hard to answer in general; it really depends on how you value wall time vs. optimization performance and how amenable the problem is to parallelization. The most efficient use of information would be to do all trials fully sequentially. At the other extreme, if you run one single large parallel batch, you basically can't learn anything from the results. In between there are tradeoffs; for some problems you might be OK going with larger batch sizes, for other problems the performance can degrade quite a bit. Since there are so few trials, I would be careful not to use too large a batch size - I would advise starting from how many batches you can afford to do (time-wise) and then running the maximum batch size you can do. I don't know how you're collecting the results, but you could also use asynchronous optimization, where you generate a new candidate whenever you get some information back - this would really only make sense if the results from the batches come back at different times (e.g. because some lab is slower than another one).
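To make the batched workflow concrete, here is a rough sketch of generating a batch of 3 candidates up front with the Service API and reporting the results later; the metric name and the `run_fire_bench` helper are placeholders, not part of the actual setup:

```python
# Sketch: request 3 candidates at once, run them in parallel in the lab,
# then report results back when they become available.
# Assumes `ax_client` is an already-configured AxClient.
pending = []
for _ in range(3):
    parameters, trial_index = ax_client.get_next_trial()
    pending.append((parameters, trial_index))

# ... after the lab evaluations finish (possibly weeks later):
for parameters, trial_index in pending:
    mean, sem = run_fire_bench(parameters)  # hypothetical lab-evaluation helper
    ax_client.complete_trial(
        trial_index=trial_index,
        raw_data={"flammability": (mean, sem)},  # placeholder metric name
    )
```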
@eytan, thanks for the cc. I'm back from a long vacation and getting back into the swing of things.
I suggest using the linear equality --> linear inequality reparameterization mentioned in
I think objective thresholds are very important here. I suggest picking outcome constraints based on domain knowledge. These can be chosen by asking the following question for each of your objectives: for objective A, if all other objectives had amazing values, what is the worst allowable/viable value for objective A from an application standpoint? Phrased conversely, what value of objective A would make the material inviable in spite of great performance for the other objectives? Then give yourself something like a 10% tolerance on this outcome. For example, if you're minimizing objective A and the maximum allowable value is 1.0, then set the outcome constraint to something like A <= 1.1. Pulling from the Ax multi-objective optimization tutorial:
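The original tutorial snippet is not reproduced here, but a rough Service-API equivalent might look like the following; the parameter and objective names and the threshold values are illustrative only:

```python
# Sketch: specifying objective thresholds when creating a multi-objective
# experiment with the Service API. All names and numbers are illustrative.
from ax.service.ax_client import AxClient
from ax.service.utils.instantiation import ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="paint_moo_experiment",
    parameters=[
        {"name": "x1", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "x2", "type": "range", "bounds": [0.0, 1.0]},
    ],
    objectives={
        # Worst allowable value for minimized objective A is 1.0;
        # a ~10% tolerance gives a threshold of 1.1.
        "objective_A": ObjectiveProperties(minimize=True, threshold=1.1),
        # A maximized objective with its own domain-derived threshold.
        "objective_B": ObjectiveProperties(minimize=False, threshold=0.5),
    },
)
```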
See also: you might consider reformulating this as a constraint satisfaction problem (constraint active search), #930 (comment), but I'll defer to the devs (cc @eytan) on whether that seems like a good fit.
Seconded, but again it depends on your setup. Do you mind providing some estimates of the total time and cost of running experiments with different batch sizes? The costs can be relative (e.g., 0.0 is low-cost, 1.0 is high-cost) and should incorporate the cost of the user's time. The total time refers to how long it takes to go from start to finish of the batch experiment. Aside: asynchronous + multi-objective is non-trivial to implement, #896. Do you have a workflow figure for the synthesis and characterization equipment?
OK, maybe 5 batches of 3 parallel tests could be a good trade-off.
Since I evaluate the flammability of the materials with 3 different fire benches, and one of them is located in another place in France, I need to send the samples to that lab. A bigger batch size can help us save a lot of time because one of the tests is much slower to evaluate. When I run my code several times on the same initial data, the algorithm does not always give the same value as the "next experiment" - is that normal? And how can this be explained?
Yes, this is to be expected - both fitting the models and optimizing the acquisition functions involve solving non-convex optimization problems, so it's possible to get stuck in local minima. We do a good amount of work under the hood with random restarts etc. to make that less likely, but it can still happen. This doesn't necessarily need to be a bad thing. The non-determinism comes from the randomized initialization that we use for multi-start gradient descent to solve these problems. It is possible to pass a random seed to make the candidate generation reproducible.
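For example (a sketch, assuming the `random_seed` argument of the Service-API `AxClient`; the seed value is arbitrary):

```python
# Sketch: fixing the random seed so repeated runs on the same data
# produce the same candidates.
from ax.service.ax_client import AxClient

ax_client = AxClient(random_seed=12345)
```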
Noise inference: after a discussion with @sgbaird, I was wondering whether it is better to let the system infer the noise or to run the code with all the data available (3 repetitions for each formulation)?
Mixture design constraint: I am facing a mixture design problem, so A + B + C + D = 1.
Next experiment: when I run the code with 5 parallel trials, almost the same experiment is suggested and the sum of the xi ≠ 1 (see #1635 regarding this last point).
Data for the next experiment suggested:
For more detail, see the Google Colab code.
Hmm I'm not sure I understand this question - You'd want to use all the data that is available, in which case the model can use the repeated observations to better infer the noise level.
Not really, this is essentially a hack around the fact that we're not exposing exact equality constraints in the Ax APIs (Feature Request is here: #510). Doing so is going to cause issues as @sgbaird observed here: #510 (comment). The proper thing to do would in fact be to hook equality constraints up to Ax, but we unfortunately haven't had the bandwidth to work on this. But if you or someone else wanted to take a stab we'd be happy to help :)
Hmm this is interesting; it may be that this is a result of the above hack, so I'd want to address that first before digging much deeper into this.
OK! Using the reparameterization as an inequality constraint (making one variable "hidden") makes more sense, then.
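A minimal sketch of that hidden-variable reparameterization for A + B + C + D = 1, with illustrative parameter and objective names, might look like:

```python
# Sketch: mixture constraint A + B + C + D = 1 handled by searching only over
# A, B, C with a sum constraint, and computing D afterwards.
from ax.service.ax_client import AxClient
from ax.service.utils.instantiation import ObjectiveProperties

ax_client = AxClient(random_seed=12345)
ax_client.create_experiment(
    name="paint_mixture",
    parameters=[
        {"name": "A", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "B", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "C", "type": "range", "bounds": [0.0, 1.0]},
        # D is intentionally left out of the search space ("hidden").
    ],
    # Keep the three explicit fractions at or below 1 in total ...
    parameter_constraints=["A + B + C <= 1.0"],
    objectives={"objective_A": ObjectiveProperties(minimize=True)},
)

parameters, trial_index = ax_client.get_next_trial()
# ... and recover the hidden fraction when mixing the paint:
D = 1.0 - (parameters["A"] + parameters["B"] + parameters["C"])
```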
For the first step I have 20 points, each with 3 runs (plot below). Does it make sense to train the model using all points (so 60 points), or to train it using just the means (20 points) and SEMs?
Either should be fine in this case. If you use the raw points, you won't have any variance estimates on the individual observations, so we'd have the model infer a noise level. The variance looks relatively consistent across your evaluations, so that should work fine. But alternatively you can also pass in the means and SEMs. Doing this would be useful if your noise were highly heteroskedastic - i.e. varied across the different points. The error bars may suggest this, but as Eytan mentioned above, the main issue here is that the SEM estimate itself is going to have very high variance with only 3 observations, so this may itself just be noise in the data.
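As a sketch of the two reporting options (assuming an ongoing Service-API loop with an existing `ax_client` and trial indices; the metric name and numbers are illustrative):

```python
# Option (a): report the mean and SEM computed from the 3 replicates.
ax_client.complete_trial(
    trial_index=trial_index,
    raw_data={"objective_A": (0.82, 0.07)},  # (mean, SEM)
)

# Option (b): report only the mean and mark the SEM as unknown (None),
# in which case the model infers the noise level from the data.
ax_client.complete_trial(
    trial_index=another_trial_index,
    raw_data={"objective_A": (0.82, None)},
)
```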
OK! Thank you for your comments. It's time for me to go to the lab and test this active learning loop on my coating. I will let you know the results.
@Eric-verret excited to hear how it goes!
Looking forward to learning how this went @Eric-verret : )
Hello! I'm still in the lab. I will have the results of the first loop at the end of August (after the university's summer break)!
cc @eytan, @Balandat, @esantorella (current M&O oncall) |
Closing as inactive, but feel free to keep commenting or reopen as needed. |
Hello Everyone,
I'm working on the development/optimization of a new fire-retardant paint using the Ax API, and I would like to know if you have any advice on the number of trials per batch. My total number of observations beyond the initial points is 9. And it is simpler for us to formulate/evaluate the performance in the lab in batches of at least 2 trials.