Reparameterizing compositional linear equality constraints as linear inequality constraints introduces Sobol sampling bias #903
xref: #786
I went back to the issue, and it seems what you're really trying to do is sample from the unit simplex here? The bias in the sampling is a known issue with the kind of approach you tried out. In BoTorch we actually implement proper sampling from the d-simplex: https://github.com/pytorch/botorch/blob/main/botorch/utils/sampling.py#L270.

If the parameters don't appear in other constraints, then you can just sample the other dimensions independently and combine the samples (note that this will destroy the quasi-random low-discrepancy structure, but that's probably fine for initialization; doing this kind of QMC sampling properly for non-box domains turns out to be hard / unsolved). If they do appear in other constraints, then you can build a box with samples of the other parameters by adding dims to the components and then do rejection sampling based on the constraints.

Hooking this up into Ax would be somewhat challenging / require a good amount of effort, since essentially we would need to either automatically parse the constraints to infer that this is what the random initial step should be doing, or introduce some new high-level abstractions for specifying these constraints. I think the easiest short-term fix would be to just manually generate the initial random points using this simplex sampler. Does this make sense?
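For concreteness, here is a minimal sketch of that approach: sample the compositional dimensions with BoTorch's simplex sampler, draw the remaining box-bounded dimensions with a separate Sobol engine, and concatenate. The dimension counts and sample size are made up for illustration.

```python
import torch
from torch.quasirandom import SobolEngine
from botorch.utils.sampling import sample_simplex

n = 128  # number of initial points (hypothetical)

# 3 compositional parameters that must sum to 1: sample the unit simplex directly.
simplex_samples = sample_simplex(d=3, n=n, qmc=True)  # shape (n, 3)

# 2 unconstrained parameters in [0, 1]: ordinary scrambled Sobol draws.
sobol = SobolEngine(dimension=2, scramble=True)
box_samples = sobol.draw(n)  # shape (n, 2)

# Stitch the two pieces together. Note: this destroys the joint low-discrepancy
# structure across all 5 dimensions, which is usually acceptable for initialization.
init_points = torch.cat([simplex_samples, box_samples.to(simplex_samples)], dim=-1)
```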
@Balandat This is really helpful, thank you! It's nice to see that BoTorch has a proper simplex sampler. There are cases in materials science where it may not necessarily be a unit simplex (e.g., applying domain knowledge that restricts the allowable composition ranges).
You can also give [DelaunayPolytopeSampler](https://github.com/pytorch/botorch/blob/21ce6c7fa9fa907674c37b849e2e5dc683ca2682/botorch/utils/sampling.py#L697-L715) a try. It uses a pretty cool algorithm to uniformly sample from a general polytope (it supports equality constraints as well) by subdividing the whole polytope into primitive shapes and then using their volumes to build a two-stage sampling process. There is some expensive upfront computation (the computation of the convex hull), but if you need lots of samples this can be worth it. Note, though, that the complexity grows quickly here, so if you are in higher dimensions or have lots of complex constraints this can quickly become intractable.
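A rough usage sketch (assuming the `(A, b)` tensor-tuple convention that BoTorch's polytope samplers use for `A @ x <= b` / `A @ x = b` constraints; the three-component composition is made up):

```python
import torch
from botorch.utils.sampling import DelaunayPolytopeSampler

# Three composition fractions in [0, 1] that must sum to 1.
A_eq = torch.tensor([[1.0, 1.0, 1.0]])   # 1 x d coefficient matrix
b_eq = torch.tensor([[1.0]])             # 1 x 1 right-hand side
bounds = torch.tensor([[0.0, 0.0, 0.0],  # lower bounds
                       [1.0, 1.0, 1.0]])  # upper bounds

sampler = DelaunayPolytopeSampler(
    equality_constraints=(A_eq, b_eq),
    bounds=bounds,
)
samples = sampler.draw(n=256)  # approximately uniform over the constrained region
```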
Closing this issue as inactive; @sgbaird, please reopen it if you do follow up!
@Balandat follow-up question: any recommendations for the case where there are categorical parameters in the search space? Draw them randomly? Convert them to some numerical representation with equal distance between choices?
Hmm, good question. Sobol (and QMC methods in general) don't really play well with non-box bounds and non-continuous parameters. I would recommend just drawing those parameters independently at random and then stitching them together with the Sobol samples of the rest of the space (or just drawing everything uniformly at random if you don't care too much about the low-discrepancy space-filling properties of Sobol). You can also map the categoricals to integers, use the approach from https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.qmc.Sobol.integers.html#scipy.stats.qmc.Sobol.integers to get qMC-ish integer draws, and then map those back to the categorical parameters.
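A small sketch of that categorical workaround (map the choices to integers, draw quasi-random integer indices with SciPy's `qmc.Sobol.integers`, and map back; the choice list here is hypothetical):

```python
from scipy.stats import qmc

choices = ["fcc", "bcc", "hcp"]  # hypothetical categorical levels

sobol = qmc.Sobol(d=1, scramble=True)
# Draw qMC-ish integer indices in [0, len(choices)); 64 is a power of 2,
# which the Sobol engine prefers.
idx = sobol.integers(l_bounds=0, u_bounds=len(choices), n=64).ravel()
categorical_draws = [choices[i] for i in idx]
```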
@Balandat thanks! I wasn't aware of the integers method from scipy.stats.qmc.Sobol.
|
So one thing to note is that if you define the parameter as an integer in Ax itself, we'll try to model it as a numerical parameter; i.e., we'll assume that distance in the parameter values is related to distance in function values. That is probably not what you want for the modeling if your parameters are truly categorical (and unordered). So you could generate your custom initial random arms externally, in a way that doesn't involve the surrogate modeling, and then go from there in Ax. Alternatively, you could just use random sampling rather than Sobol sampling if the parameter constraints would otherwise bias the sampling.
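Here is a sketch of that external-generation route using the Service API; the parameter names, objective, and `run_experiment` evaluation function are hypothetical, and `attach_trial` is used to feed in the externally generated arms before handing generation over to Ax:

```python
from ax.service.ax_client import AxClient
from ax.service.utils.instantiation import ObjectiveProperties


def run_experiment(params):
    # Hypothetical stand-in for the real (expensive) measurement.
    return {"strength": params["x_1"] + params["x_2"]}


ax_client = AxClient()
ax_client.create_experiment(
    name="composition_experiment",  # hypothetical names throughout
    parameters=[
        {"name": "x_1", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "x_2", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "crystal", "type": "choice", "values": ["fcc", "bcc", "hcp"]},
    ],
    objectives={"strength": ObjectiveProperties(minimize=False)},
)

# `initial_arms` would come from the external simplex/categorical sampling above.
initial_arms = [
    {"x_1": 0.2, "x_2": 0.8, "crystal": "fcc"},
    {"x_1": 0.55, "x_2": 0.45, "crystal": "hcp"},
]
for params in initial_arms:
    _, trial_index = ax_client.attach_trial(parameters=params)
    ax_client.complete_trial(trial_index=trial_index, raw_data=run_experiment(params))
```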
Good catch, thanks!
External generation and attaching manually seems like the way to go. Thank you! There's one thing I want to clarify: can the parameter constraints bias the Sobol sampling because Sobol points are generated within the unconstrained design space and then limited to points within the feasible design space via a rejection strategy (meaning you might get large, systematic gaps along certain "faces")?
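To make that rejection picture concrete, here is a toy illustration (not Ax's actual internals) of filtering box-domain Sobol draws against a narrow sum constraint like the one described in the issue body below; most points are rejected, and the survivors no longer form a low-discrepancy set over the feasible region:

```python
import torch
from torch.quasirandom import SobolEngine

sobol = SobolEngine(dimension=3, scramble=True)
candidates = sobol.draw(1024)

# Keep only points satisfying 0.15 <= x_1 + x_2 <= 0.25.
sums = candidates[:, 0] + candidates[:, 1]
feasible = candidates[(sums >= 0.15) & (sums <= 0.25)]

# The acceptance rate can be very low, and the accepted points inherit
# systematic gaps from the box-domain Sobol structure.
print(f"accepted {feasible.shape[0]} / {candidates.shape[0]} points")
```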
I've been struggling with this concept for a week or two and decided to surface this to the Ax devs in a fresh issue to get some suggestions and as a sanity check.
@bernardbeckerman offered a very useful comment related to removing degenerate search space dimensions, which I've been implementing.
I described the issue related to a bias in Sobol sampling in #727 (comment):
To illustrate, the bias in Sobol sampling towards the first two parameters might look something like the following, made-up data:
In #727 (comment), I mentioned that retaining the original linear equality parameterization is probably the only way to prevent a bias in the Sobol sampling (and to some extent, possibly in the Bayesian iterations); however, this requires passing things directly to BoTorch and manually implementing the underlying transforms (#769 (comment), @dme65).
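For reference, when passing things directly to BoTorch, linear equality constraints for `optimize_acqf` are encoded as `(indices, coefficients, rhs)` tuples; a minimal sketch of that encoding for a three-component composition (the acquisition-optimization call itself is omitted):

```python
import torch

# Encode x_0 + x_1 + x_2 = 1.0 in the format optimize_acqf expects for
# `equality_constraints`: sum_i coefficients[i] * X[..., indices[i]] = rhs.
equality_constraints = [
    (torch.tensor([0, 1, 2]), torch.tensor([1.0, 1.0, 1.0]), 1.0),
]

# This list would then be passed as
# optimize_acqf(..., equality_constraints=equality_constraints).
```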
For simple use cases where anything/everything is allowed to range from `[0, 1]`, it might be OK to ignore the transforms. For more complicated cases that I'm dealing with, where I have different bounds for parameters (e.g. `[0.1, 0.25]`) or the constraint isn't relative to `1.0` (e.g. `x_1 + x_2 <= 0.25 && x_1 + x_2 >= 0.15`), I wonder if this will start causing issues.
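For completeness, a sketch of how the bounds and constraints above could be expressed with the Service API's string-based `parameter_constraints` (hypothetical experiment and objective names; only the two constrained parameters are shown):

```python
from ax.service.ax_client import AxClient
from ax.service.utils.instantiation import ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="constrained_composition",  # hypothetical
    parameters=[
        {"name": "x_1", "type": "range", "bounds": [0.1, 0.25]},
        {"name": "x_2", "type": "range", "bounds": [0.1, 0.25]},
    ],
    objectives={"strength": ObjectiveProperties(minimize=False)},
    parameter_constraints=[
        "x_1 + x_2 <= 0.25",  # upper bound on the combined fraction
        "x_1 + x_2 >= 0.15",  # lower bound on the combined fraction
    ],
)
```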