Conversation

cisprague
Collaborator

@cisprague cisprague commented Jun 23, 2025

We now have a working prototype of ensembles in a tutorial notebook. Further development will move this complexity out of the tutorial notebook and into the codebase.

Features:

  • Ensemble of emulators (e.g. MLPs or other ensembles).
  • Dropout stochastic emulator.
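
To make these two features concrete, here is a minimal sketch of the aggregation step. It is illustrative only: the emulator list, call signature, and shapes are placeholders, not the PR's actual `Ensemble`/`DropoutEnsemble` API.

```python
# Illustrative sketch only; not the PR's actual Ensemble/DropoutEnsemble API.
import torch
import torch.nn as nn


def ensemble_predict(emulators: list[nn.Module], x: torch.Tensor):
    """Aggregate member predictions into a mean and an epistemic variance."""
    preds = torch.stack([e(x) for e in emulators])  # (n_members, n_points, n_outputs)
    return preds.mean(dim=0), preds.var(dim=0)
```

A dropout ensemble follows the same pattern, except the "members" are repeated stochastic forward passes through a single model with dropout left active.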

Other things:

  • When calling `ensemble.fit`, it would be nice to see some feedback on the progress of `emulator.fit` for each emulator in the ensemble. Options include defining a custom `__repr__` or casting all emulators as dataclasses; dataclasses would help with the recursion needed to traverse all sub-emulators (see the sketch after this list).
  • Need to think about whether inter-sample covariance is needed, as in GPs.
  • The MLP defined in the tutorial notebook should be replaced by the one resulting from Add MLP #559.
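
On the first point, a hypothetical sketch of fit-progress feedback; the list of emulators and the per-emulator `fit(x, y)` signature are assumptions, not the PR's actual interface.

```python
# Hypothetical sketch of fit-progress feedback; the emulators list and the
# per-emulator fit(x, y) signature are assumptions, not the PR's interface.
from tqdm import tqdm


def fit_ensemble(emulators, x, y):
    for emulator in tqdm(emulators, desc="Fitting ensemble members"):
        emulator.fit(x, y)
        tqdm.write(f"fitted: {emulator!r}")  # relies on each emulator's __repr__
```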

@cisprague cisprague linked an issue Jun 23, 2025 that may be closed by this pull request

@codecov-commenter

codecov-commenter commented Jun 23, 2025

Codecov Report

Attention: Patch coverage is 95.16129% with 9 lines in your changes missing coverage. Please review.

Project coverage is 79.03%. Comparing base (439b42f) to head (e392da8).
Report is 49 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| autoemulate/experimental/emulators/ensemble.py | 94.54% | 6 Missing ⚠️ |
| autoemulate/experimental/learners/base.py | 0.00% | 2 Missing ⚠️ |
| autoemulate/experimental/emulators/base.py | 83.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #560      +/-   ##
==========================================
+ Coverage   78.43%   79.03%   +0.59%     
==========================================
  Files         132      138       +6     
  Lines        9396     9749     +353     
==========================================
+ Hits         7370     7705     +335     
- Misses       2026     2044      +18     

☔ View full report in Codecov by Sentry.

Contributor

github-actions bot commented Jun 23, 2025

Coverage report

Files with coverage changes (see the full report for per-file numbers):

  • autoemulate/emulators/gaussian_process.py
  • autoemulate/experimental/emulators/__init__.py
  • autoemulate/experimental/emulators/base.py
  • autoemulate/experimental/learners/base.py
  • tests/test_compare.py
  • tests/experimental/test_experimental_transformed.py

This report was generated by python-coverage-comment-action

@cisprague cisprague changed the title from "Add ensembles." to "Add ensembles" Jun 25, 2025
@sgreenbury
Collaborator

* When calling `ensemble.fit`, it would be nice to see some feedback on the progress of `emulator.fit` for each of the emulators in the ensemble. Defining a custom `__repr__` or casting all emulators as dataclasses are options for this. Dataclasses are helpful for this (especially the recursion needed to traverse all sub emulators).

This sounds great, though I'm not sure we can make emulators dataclasses, given the way we subclass e.g. nn.Module, ExactGP and others?
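
One way around that constraint, purely as a sketch (class and attribute names are illustrative), is a hand-written `__repr__` that recurses over sub-emulators; this works even when the emulator subclasses `nn.Module`:

```python
# Sketch: recursive __repr__ without dataclasses; the `emulators` attribute
# name is an assumption.
import torch.nn as nn


class Ensemble(nn.Module):
    def __init__(self, emulators):
        super().__init__()
        self.emulators = nn.ModuleList(emulators)

    def __repr__(self):
        inner = ", ".join(repr(e) for e in self.emulators)  # recurses into nested ensembles
        return f"{type(self).__name__}([{inner}])"
```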

* Need to think about whether inter-sample covariance is needed, as in GPs.

It would be good to discuss whether to have this feature! If it is needed, updating to MultitaskMultivariateNormal (GaussianLike in the types) should work for this. This also relates to #471.
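
For reference, a sketch of how stacked ensemble draws could be wrapped into a MultitaskMultivariateNormal; the shapes and the jitter term are assumptions, not code from this PR.

```python
import torch
from gpytorch.distributions import MultitaskMultivariateNormal

# Stacked member predictions, shape (n_members, n_points, n_tasks).
samples = torch.randn(16, 10, 2)
mean = samples.mean(dim=0)                                 # (n_points, n_tasks)
flat = samples.reshape(samples.shape[0], -1)               # (n_members, n_points * n_tasks)
cov = torch.cov(flat.T) + 1e-4 * torch.eye(flat.shape[1])  # jitter keeps it positive definite
dist = MultitaskMultivariateNormal(mean, cov, interleaved=True)
```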

* The MLP defined in the tutorial notebook should be replaced by the one resulting from [Add MLP #559](https://github.com/alan-turing-institute/autoemulate/pull/559).

Sounds good! I've updated #559 to have dropout as an option (but only during training) - it would be good to discuss the stochastic emulator case at inference time.

@sgreenbury
Collaborator

Implementation of MC dropout

@sgreenbury
Collaborator

From discussion with @radka-j @cisprague:

  • We can aim to include MC dropout (as a new issue/PR) following Add MLP #559
  • We do not need to capture inter-sample covariance here
  • Updating GaussianLike to the torch one, with a new type alias GaussianProcessLike for MultitaskMultivariateNormal, seems a good option (we can update this here, but it might be worth a distinct PR too in case of any wider compatibility issues); a sketch of the aliases follows this list
  • We can open a new issue to look at printing an Emulator (perhaps similar to Keras print_summary())
  • We might be able to add a method on the GaussianEmulator from Add more emulator subclasses #561 that produces a distribution from samples
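
A sketch of what the alias update might look like; these are hypothetical definitions based on the bullet above, and the codebase's existing GaussianLike may differ.

```python
# Hypothetical type aliases based on the discussion above; the codebase's
# current definitions may differ.
from torch.distributions import MultivariateNormal
from gpytorch.distributions import MultitaskMultivariateNormal

GaussianLike = MultivariateNormal
GaussianProcessLike = MultitaskMultivariateNormal
```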

@sgreenbury
Collaborator

I think it might be worth using the GPyTorch MultivariateNormal here (instead of the torch one) since it subclasses the torch MultivariateNormal but also supports the linear_operator package for specifying the covariance matrix.

I think the functionality/API should be very similar, though I noticed that `.sample()` seems to work differently and it doesn't seem possible to pass a shape as an argument.
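
A small sketch of the linear_operator point; the diagonal covariance is just an example, and drawing multiple samples via rsample with a sample_shape is assumed to behave as in recent gpytorch versions.

```python
import torch
from gpytorch.distributions import MultivariateNormal
from linear_operator.operators import DiagLinearOperator

mean = torch.zeros(5)
covar = DiagLinearOperator(torch.full((5,), 0.1))  # lazy diagonal covariance via linear_operator
dist = MultivariateNormal(mean, covar)
draws = dist.rsample(torch.Size([100]))            # (100, 5)
```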

@cisprague cisprague marked this pull request as ready for review July 2, 2025 08:18
@cisprague cisprague requested review from radka-j and sgreenbury July 2, 2025 08:18
@cisprague
Collaborator Author

cisprague commented Jul 2, 2025

The following have been completed (along with corresponding tests):

  1. ensemble of emulators (separate models),
  2. ensemble with dropout (same model, different dropout masks).

Both have been demonstrated in a tutorial notebook, as well as their integration with active learning. The notebook is in draft form and will be further developed after receiving feedback.

Regarding the failed pre-commit check: it passes locally. Advice on how to resolve this would be appreciated.

cisprague and others added 2 commits July 3, 2025 09:16
Mostly related to CI.

Co-authored-by: Sam Greenbury <[email protected]>
@cisprague
Collaborator Author

In DropoutEnsemble we accept a model : PyTorchBackend, but we are implicitly assuming that it has some dropout functionality. AFAIK, passing in a model without dropout would not break anything, but then each forward pass would be identical. Any idea around this?

@sgreenbury
Collaborator

In DropoutEnsemble we accept a model : PyTorchBackend, but we are implicitly assuming that it has some dropout functionality. AFAIK, passing in a model without dropout would not break anything, but then each forward pass would be identical. Any idea around this?

An option might be to have a subclass DropoutTorchBackend? Subclassers could choose this one if the subclassed model will have dropout.

class DropoutTorchBackend(PyTorchBackend): ...
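
For the inference-time behaviour, the usual MC dropout trick is to keep dropout layers active while sampling. A generic sketch, not the PR's actual DropoutEnsemble implementation:

```python
import torch
import torch.nn as nn


def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 32):
    """Monte Carlo dropout: stochastic forward passes with dropout left on."""
    model.train()  # keeps nn.Dropout active; note this also affects layers like BatchNorm
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    model.eval()
    return preds.mean(dim=0), preds.var(dim=0)
```

As noted above, a model without dropout layers would just return identical passes, which is one argument for the marker subclass.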

Collaborator

@sgreenbury sgreenbury left a comment


This looks really great @cisprague, thanks! Just a couple of small comments as discussed above and a new issue to be opened for further revisions to the notebook but otherwise looks good to merge.

@cisprague cisprague merged commit 9ff6645 into main Jul 4, 2025
3 checks passed
@sgreenbury sgreenbury deleted the 429-add-ensembles-and-integrate-autoemulate-object branch July 4, 2025 14:29

Development

Successfully merging this pull request may close these issues.

Add Ensembles and integrate AutoEmulate object
