Add Ensembles and integrate AutoEmulate object

In our context, an ensemble is built from a set of emulators and can itself be viewed as an emulator whose uncertainty is aggregated from its members. Both the `AutoEmulate` object and an ensemble operate over a set of emulators; they share some functionality (e.g. emulator fitting) and differ in other functionality (e.g. emulator comparison). Overall, I think `AutoEmulate` can be cast as a subclass of `Ensemble`, but I'm happy to discuss this point. See the elaboration below.

**Ensemble**
`Emulator.predict(x)` returns `(mean, covariance)`. Repeated calls on the same `x` may return different values, and `covariance` may be zero. Let `M` be the number of emulators in the ensemble and `N` be the number of samples (calls) per emulator. Then the `(mean, covariance)` of the ensemble would be

$$
\mu_{\mathrm{ens}}
= \frac{1}{M N}
  \sum_{i=1}^{M}\sum_{j=1}^{N}
    \mu_i^{(j)},
$$
$$
\Sigma_{\mathrm{ens}}
= \frac{1}{M N}
  \sum_{i=1}^{M}\sum_{j=1}^{N}
    \Sigma_i^{(j)}
+
  \frac{1}{M N - 1}
  \sum_{i=1}^{M}\sum_{j=1}^{N}
    \bigl(\mu_i^{(j)} - \mu_{\mathrm{ens}}\bigr)
    \bigl(\mu_i^{(j)} - \mu_{\mathrm{ens}}\bigr)^{T}.
$$

Here $\mu_i^{(j)}$ and $\Sigma_i^{(j)}$ denote the mean and covariance returned by the $j$-th call to emulator $i$. The first term of $\Sigma_{\mathrm{ens}}$ is the aleatoric uncertainty and the second is the epistemic uncertainty. One choice for the disagreement score for query-by-committee is $\mathrm{trace}(\Sigma_{\mathrm{ens}})$. This should hold up even if some of the emulators in the ensemble are ensembles themselves. Also, weights could be assigned to the terms of the sums above to prioritise certain emulators.
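As a rough illustration of the two formulas above, here is a minimal sketch that aggregates already-collected draws. The function name `aggregate` and the array layout are assumptions made for this example, and NumPy is used only for brevity (the same operations exist in `torch`).

```python
import numpy as np


def aggregate(means, covs):
    """Combine per-draw (mean, covariance) pairs into ensemble moments.

    means: shape (K, d), one row per draw, with K = M * N
    covs:  shape (K, d, d), the covariance returned with each draw
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    k = means.shape[0]
    if k < 2:
        raise ValueError("need at least two draws for the 1/(MN - 1) term")

    mu_ens = means.mean(axis=0)
    aleatoric = covs.mean(axis=0)        # first term: average covariance
    dev = means - mu_ens
    epistemic = dev.T @ dev / (k - 1)    # second term: spread of the means
    return mu_ens, aleatoric + epistemic


# Two emulators (M = 2), one draw each (N = 1), two outputs (d = 2).
means = [[0.0, 0.0], [2.0, 0.0]]
covs = [np.eye(2), np.zeros((2, 2))]     # the second emulator has zero covariance
mu, sigma = aggregate(means, covs)
score = np.trace(sigma)                  # disagreement score
```

The guard on `k` reflects that the second term is undefined for a single draw; how a real implementation should handle that case is an open design choice.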

**AutoEmulate/Ensemble**

Both objects share functionality, e.g. training/fitting emulators. Perhaps we should break up some of `AutoEmulate`'s methods to:
1. maximise the number of methods shared with `Ensemble` through inheritance, e.g. by pulling training/fitting out of `AutoEmulate.compare` so that `AutoEmulate.fit(X, Y)` and `Ensemble.fit(X, Y)` are shared;
2. minimise new methods specific to `AutoEmulate`.

**Pseudo-code**

```python
from dataclasses import dataclass
from typing import List

import torch


class Emulator:
  def predict(self, x):
    # returns (mean, covariance)
    raise NotImplementedError


@dataclass
class Ensemble(Emulator):
  # these emulators could be ensembles
  emulators: List[Emulator]
  # number of samples per emulator
  # maybe this should belong to the emulator?
  n_samples: List[int]

  def predict(self, x):
    # compute ensemble mean and covariance
    # from emulators as in the formula above
    # returns (mean, covariance)
    raise NotImplementedError

  def score(self, x):
    # disagreement score for query-by-committee
    # possibly should be in the active learning method instead
    _, covariance = self.predict(x)
    return torch.trace(covariance)

  def fit(self, X, Y):
    # fit all emulators in parallel
    raise NotImplementedError


class AutoEmulate(Ensemble):
  # inherits attributes/methods from Ensemble
  # NOTE: some ensemble methods might not be used

  def compare(self, X, Y):
    # use ensemble method
    self.fit(X, Y)
    # then do cross-validation, etc.
    ...

  def other_autoemulate_methods(self):
    ...
```
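To check that the interface composes when an ensemble member is itself an ensemble, here is a small runnable sketch. `ConstantEmulator` is a hypothetical stand-in, `n_samples` is simplified to a single integer shared by all members, and NumPy replaces `torch` for brevity; none of this is meant as the final design.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


class Emulator:
    def predict(self, x):
        raise NotImplementedError


@dataclass
class ConstantEmulator(Emulator):
    """Hypothetical stand-in: always returns the same mean and variance."""
    mean: float
    var: float

    def predict(self, x):
        return np.array([self.mean]), np.array([[self.var]])


@dataclass
class Ensemble(Emulator):
    emulators: List[Emulator]
    n_samples: int = 1  # simplification: same N for every member

    def predict(self, x):
        # one (mean, covariance) pair per member per sample
        draws = [e.predict(x) for e in self.emulators for _ in range(self.n_samples)]
        means = np.stack([m for m, _ in draws])
        covs = np.stack([c for _, c in draws])
        mu = means.mean(axis=0)
        dev = means - mu
        return mu, covs.mean(axis=0) + dev.T @ dev / (len(draws) - 1)

    def score(self, x):
        _, covariance = self.predict(x)
        return float(np.trace(covariance))


inner = Ensemble([ConstantEmulator(0.0, 1.0), ConstantEmulator(2.0, 1.0)])
outer = Ensemble([inner, ConstantEmulator(4.0, 0.0)])  # a member that is an ensemble
x = np.zeros(1)
inner_score = inner.score(x)
outer_score = outer.score(x)
```

One thing this makes visible: with equal weights, the outer ensemble counts `inner` as a single member, so the result generally differs from flattening all three emulators into one ensemble. Weights on the sums would be one way to reconcile the two.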

Parallelization of emulator training relates to #421.
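For concreteness, one possible shape for "fit all emulators in parallel" using only the standard library is sketched below. `MeanEmulator` and `fit_all` are hypothetical names for this example, and whether threads, processes, or something else fits best is exactly what would need discussing; this is not a summary of what #421 proposes.

```python
from concurrent.futures import ThreadPoolExecutor


class MeanEmulator:
    """Hypothetical toy emulator: fit() stores the mean of Y."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, Y):
        self.mean_ = sum(Y) / len(Y)
        return self


def fit_all(emulators, X, Y, max_workers=None):
    """Fit every emulator on the same (X, Y), one task per emulator."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(e.fit, X, Y) for e in emulators]
        # result() re-raises any exception raised inside fit()
        return [f.result() for f in futures]


fitted = fit_all([MeanEmulator(), MeanEmulator()], X=[0.0, 1.0, 2.0], Y=[1.0, 2.0, 3.0])
```

Threads only speed things up when `fit` releases the GIL (as many numerical backends do); for pure-Python training loops a process pool would be the closer equivalent.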
