Conversation

cisprague
Collaborator

@cisprague cisprague commented Jun 23, 2025

We now have a working prototype of ensembles in a tutorial notebook. Further development will move this complexity out of the tutorial notebook and into the codebase.

Features:

  • Ensemble of emulators (e.g. MLPs or other ensembles).
  • Dropout stochastic emulator.
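
To make these two features concrete, here is a minimal sketch of the aggregation step. It is illustrative only: the emulator list, call signature, and shapes are placeholders, not the PR's actual `Ensemble`/`DropoutEnsemble` API.

```python
# Illustrative sketch only; not the PR's actual Ensemble/DropoutEnsemble API.
import torch
import torch.nn as nn


def ensemble_predict(emulators: list[nn.Module], x: torch.Tensor):
    """Aggregate member predictions into a mean and an epistemic variance."""
    preds = torch.stack([e(x) for e in emulators])  # (n_members, n_points, n_outputs)
    return preds.mean(dim=0), preds.var(dim=0)
```

A dropout ensemble follows the same pattern, except the "members" are repeated stochastic forward passes through a single model with dropout left active.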

Other things:

  • When calling `ensemble.fit`, it would be nice to see some feedback on the progress of `emulator.fit` for each emulator in the ensemble. Options include defining a custom `__repr__` or casting all emulators as dataclasses; dataclasses would help with the recursion needed to traverse all sub-emulators (see the sketch after this list).
  • Need to think about whether inter-sample covariance is needed, as in GPs.
  • The MLP defined in the tutorial notebook should be replaced by the one resulting from Add MLP #559.
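
On the first point, a hypothetical sketch of fit-progress feedback; the list of emulators and the per-emulator `fit(x, y)` signature are assumptions, not the PR's actual interface.

```python
# Hypothetical sketch of fit-progress feedback; the emulators list and the
# per-emulator fit(x, y) signature are assumptions, not the PR's interface.
from tqdm import tqdm


def fit_ensemble(emulators, x, y):
    for emulator in tqdm(emulators, desc="Fitting ensemble members"):
        emulator.fit(x, y)
        tqdm.write(f"fitted: {emulator!r}")  # relies on each emulator's __repr__
```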

@cisprague cisprague linked an issue Jun 23, 2025 that may be closed by this pull request

@codecov-commenter

codecov-commenter commented Jun 23, 2025

Codecov Report

Attention: Patch coverage is 95.16129% with 9 lines in your changes missing coverage. Please review.

Project coverage is 79.03%. Comparing base (439b42f) to head (e392da8).
Report is 49 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| autoemulate/experimental/emulators/ensemble.py | 94.54% | 6 Missing ⚠️ |
| autoemulate/experimental/learners/base.py | 0.00% | 2 Missing ⚠️ |
| autoemulate/experimental/emulators/base.py | 83.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #560      +/-   ##
==========================================
+ Coverage   78.43%   79.03%   +0.59%     
==========================================
  Files         132      138       +6     
  Lines        9396     9749     +353     
==========================================
+ Hits         7370     7705     +335     
- Misses       2026     2044      +18     

☔ View full report in Codecov by Sentry.

Contributor

github-actions bot commented Jun 23, 2025

Coverage report

Files with coverage changes (see the full report for per-file numbers):

  • autoemulate/emulators/gaussian_process.py
  • autoemulate/experimental/emulators/__init__.py
  • autoemulate/experimental/emulators/base.py
  • autoemulate/experimental/learners/base.py
  • tests/test_compare.py
  • tests/experimental/test_experimental_transformed.py

This report was generated by python-coverage-comment-action

@cisprague cisprague changed the title from "Add ensembles." to "Add ensembles" Jun 25, 2025
@sgreenbury
Collaborator

* When calling `ensemble.fit`, it would be nice to see some feedback on the progress of `emulator.fit` for each of the emulators in the ensemble. Defining a custom `__repr__` or casting all emulators as dataclasses are options for this. Dataclasses are helpful for this (especially the recursion needed to traverse all sub emulators).

This sounds great, though I'm not sure we can make emulators dataclasses, given the way we subclass e.g. nn.Module, ExactGP and others?
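
One way around that constraint, purely as a sketch (class and attribute names are illustrative), is a hand-written `__repr__` that recurses over sub-emulators; this works even when the emulator subclasses `nn.Module`:

```python
# Sketch: recursive __repr__ without dataclasses; the `emulators` attribute
# name is an assumption.
import torch.nn as nn


class Ensemble(nn.Module):
    def __init__(self, emulators):
        super().__init__()
        self.emulators = nn.ModuleList(emulators)

    def __repr__(self):
        inner = ", ".join(repr(e) for e in self.emulators)  # recurses into nested ensembles
        return f"{type(self).__name__}([{inner}])"
```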

* Need to think about whether inter-sample covariance is needed, as in GPs.

It would be good to discuss whether to have this feature! If it is needed, updating to MultitaskMultivariateNormal (GaussianLike in the types) should work for this. This also relates to #471.
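
For reference, a sketch of how stacked ensemble draws could be wrapped into a MultitaskMultivariateNormal; the shapes and the jitter term are assumptions, not code from this PR.

```python
import torch
from gpytorch.distributions import MultitaskMultivariateNormal

# Stacked member predictions, shape (n_members, n_points, n_tasks).
samples = torch.randn(16, 10, 2)
mean = samples.mean(dim=0)                                 # (n_points, n_tasks)
flat = samples.reshape(samples.shape[0], -1)               # (n_members, n_points * n_tasks)
cov = torch.cov(flat.T) + 1e-4 * torch.eye(flat.shape[1])  # jitter keeps it positive definite
dist = MultitaskMultivariateNormal(mean, cov, interleaved=True)
```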

* The MLP defined in the tutorial notebook should be replaced by the one resulting from [Add MLP #559](https://github.com/alan-turing-institute/autoemulate/pull/559).

Sounds good! I've updated #559 to have dropout as an option (but only during training) - it would be good to discuss the stochastic emulator case at inference time.

@sgreenbury
Collaborator

Implementation of MC dropout

@sgreenbury
Collaborator

From discussion with @radka-j @cisprague:

  • We can aim to include MC dropout (as a new issue/PR) following Add MLP #559
  • We do not need to capture inter-sample covariance here
  • Updating GaussianLike to the torch one, with a new type alias GaussianProcessLike for MultitaskMultivariateNormal, seems a good option (we can update this here, but it might be worth a distinct PR too in case of any wider compatibility issues); a sketch of the aliases follows this list
  • We can open a new issue to look at printing an Emulator (perhaps similar to Keras print_summary())
  • We might be able to add a method on the GaussianEmulator from Add more emulator subclasses #561 that produces a distribution from samples
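
A sketch of what the alias update might look like; these are hypothetical definitions based on the bullet above, and the codebase's existing GaussianLike may differ.

```python
# Hypothetical type aliases based on the discussion above; the codebase's
# current definitions may differ.
from torch.distributions import MultivariateNormal
from gpytorch.distributions import MultitaskMultivariateNormal

GaussianLike = MultivariateNormal
GaussianProcessLike = MultitaskMultivariateNormal
```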

@sgreenbury
Collaborator

I think it might be worth using the GPyTorch MultivariateNormal here (instead of the torch one) since it subclasses the torch MultivariateNormal but also supports the linear_operator package for specifying the covariance matrix.

I think the functionality/API should be very similar, though I noticed that `.sample()` seems to work differently and it doesn't seem possible to pass a shape as an argument.
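
A small sketch of the linear_operator point; the diagonal covariance is just an example, and drawing multiple samples via rsample with a sample_shape is assumed to behave as in recent gpytorch versions.

```python
import torch
from gpytorch.distributions import MultivariateNormal
from linear_operator.operators import DiagLinearOperator

mean = torch.zeros(5)
covar = DiagLinearOperator(torch.full((5,), 0.1))  # lazy diagonal covariance via linear_operator
dist = MultivariateNormal(mean, covar)
draws = dist.rsample(torch.Size([100]))            # (100, 5)
```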

@cisprague cisprague marked this pull request as ready for review July 2, 2025 08:18
@cisprague cisprague requested review from radka-j and sgreenbury July 2, 2025 08:18
@cisprague
Collaborator Author

cisprague commented Jul 2, 2025

The following have been completed (along with corresponding tests):

  1. ensemble of emulators (separate models),
  2. ensemble with dropout (same model, different dropout masks).

Both have been demonstrated in a tutorial notebook, as well as their integration with active learning. The notebook is in draft form and will be further developed after receiving feedback.

Regarding the failed pre-commit check: it passes locally. Advice on how to resolve this would be appreciated.

cisprague and others added 2 commits July 3, 2025 09:16
Mostly related to CI.

Co-authored-by: Sam Greenbury <[email protected]>
@cisprague
Collaborator Author

In DropoutEnsemble we accept a model : PyTorchBackend, but we are implicitly assuming that it has some dropout functionality. AFAIK, passing in a model without dropout would not break anything, but then each forward pass would be identical. Any idea around this?

@sgreenbury
Collaborator

In DropoutEnsemble we accept a model : PyTorchBackend, but we are implicitly assuming that it has some dropout functionality. AFAIK, passing in a model without dropout would not break anything, but then each forward pass would be identical. Any idea around this?

An option might be to have a subclass DropoutTorchBackend? Subclassers could choose this one if the subclassed model will have dropout.

class DropoutTorchBackend(PyTorchBackend): ...
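
For the inference-time behaviour, the usual MC dropout trick is to keep dropout layers active while sampling. A generic sketch, not the PR's actual DropoutEnsemble implementation:

```python
import torch
import torch.nn as nn


def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 32):
    """Monte Carlo dropout: stochastic forward passes with dropout left on."""
    model.train()  # keeps nn.Dropout active; note this also affects layers like BatchNorm
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    model.eval()
    return preds.mean(dim=0), preds.var(dim=0)
```

As noted above, a model without dropout layers would just return identical passes, which is one argument for the marker subclass.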

Collaborator

@sgreenbury sgreenbury left a comment


This looks really great @cisprague, thanks! Just a couple of small comments as discussed above and a new issue to be opened for further revisions to the notebook but otherwise looks good to merge.

@cisprague cisprague merged commit 9ff6645 into main Jul 4, 2025
3 checks passed
@sgreenbury sgreenbury deleted the 429-add-ensembles-and-integrate-autoemulate-object branch July 4, 2025 14:29

Development

Successfully merging this pull request may close these issues.

Add Ensembles and integrate AutoEmulate object
