Below is my view on what AD in Turing 1.0 ought to look like. Please feel free to comment / add your own thoughts -- I'll update this statement in light of new items.
This issue should make clear what the context / background is for the problem, detail what steps need to be taken to make progress, and make it clear what it will take to mark this issue as closed. The content of this issue at any given point in time reflects what we currently believe to be true, and is subject to change.
Note: this issue is only half done -- I still need to discuss performance.
Summary
There are two main questions to ask about a given AD on a given Turing.jl model:
does it run (correctly)?
is it performant?
In 1.0, we want to be able to make fairly confident statements about the kinds of models that AD works on -- this must be achieved through testing.
Similarly, we want to be able to make quantitative statements about the performance a user should expect from a given AD, and give them advice for debugging if it appears to be slow.
Testing: does it run?
In order to be confident that we have reasonable support in a large range of cases, we need to
define roughly what it is that we want to support,
know what we currently do / do not test + fill in the gaps, and
ensure that the test cases get run in the right places.
1. Rough Support Requirements:
This is the thing that I have the least strong opinions on. Certainly, we want to test all of the varinfos, every Distributions.Distribution that we care about in at least one model, and all of the various bits of syntax which DynamicPPL.jl exposes to the user.
2. Existing test cases for AD and where we run them:
We have some of these in DynamicPPL.TestUtils.DemoModels, and my understanding is that they're quite good at checking that you can differentiate a very simple Turing.jl model (specifically, one comprising a single distribution, but implemented in a range of ways).
AD backends are tested in DynamicPPL here.
This loops over the combination of each AD backend, each element of DemoModels, and each varinfo.
So I get the impression that, for each AD tested, we have moderate coverage of DynamicPPL features and good coverage of the various varinfos. This testing happens inside Turing.jl.
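To make the shape of that loop concrete, here is a minimal sketch using only Julia's standard library. Everything in it is a hypothetical stand-in rather than DynamicPPL API: a "model" is a log density with a known gradient, a "varinfo" is reduced to a way of building the parameter vector, and the two "backends" are finite-difference placeholders where real AD packages would go.

```julia
using Test

# Hypothetical stand-ins throughout; none of this is DynamicPPL API.
# A "model" is a log density paired with its known analytic gradient.
models = [
    (name = "std_normal", f = x -> -0.5 * sum(abs2, x), grad = x -> -x),
    (name = "shifted_normal", f = x -> -0.5 * sum(abs2, x .- 1), grad = x -> 1 .- x),
]

# A "varinfo" is reduced here to a way of constructing the parameter vector.
varinfos = [
    (name = "zeros", init = n -> zeros(n)),
    (name = "range", init = n -> collect(range(-1.0, 1.0; length = n))),
]

# A "backend" maps (f, x) to a gradient. Both are finite-difference placeholders.
function fd_central(f, x; h = 1e-6)
    return map(eachindex(x)) do i
        e = zeros(length(x))
        e[i] = h
        (f(x .+ e) - f(x .- e)) / (2h)
    end
end

function fd_forward(f, x; h = 1e-7)
    fx = f(x)
    return map(eachindex(x)) do i
        e = zeros(length(x))
        e[i] = h
        (f(x .+ e) - fx) / h
    end
end

backends = [
    (name = "fd_central", gradient = fd_central),
    (name = "fd_forward", gradient = fd_forward),
]

# The test matrix: every backend on every model with every varinfo.
results = Dict{NTuple{3,String},Bool}()
@testset "AD test matrix" begin
    for backend in backends, model in models, varinfo in varinfos
        x = varinfo.init(3)
        ok = isapprox(backend.gradient(model.f, x), model.grad(x); atol = 1e-4)
        results[(backend.name, model.name, varinfo.name)] = ok
        @test ok
    end
end
```

The point of keeping the three collections separate is that adding one model, varinfo, or backend grows the matrix without touching the loop.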
3. Ensuring Testing Happens
There are three things which should happen in order to give us a reasonable degree of confidence in AD:
we define a collection of models which we want to be able to differentiate,
we run the tests for these in one of the TuringLang repos, making sure to test the thing that users actually call, and
we derive from this collection of models a collection of (f, args...), which we can pass to AD backends and say "hey, this is our current best guess at what you need to be able to differentiate if you want to support Turing.jl. If you want to ensure support for Turing.jl, just run these as part of your integration tests in your CI, and make sure that you can differentiate them correctly and quickly".
Note that it is not sufficient to only do one of 2 or 3, as they each serve slightly different purposes.
2 is necessary because ultimately we are the ones who want to be sure that AD works for our users, and to know what currently does not work. In particular, if we change something in Turing.jl which causes AD-related problems, we want to know about them before merging the change. Knowing about them, we can either change our implementation to play nicely with the AD having problems, or open an upstream issue if an AD fails to differentiate something that we think it really ought to be able to differentiate.
3 is necessary because AD authors will often change internals in their packages. Hopefully their unit tests will catch most problems before they release changes, but there is really no substitute for having a very large array of real test cases to provide something like fuzzing / property testing for your AD. From Turing.jl's perspective, having our test cases run as part of the CI for the ADs that we care about ensures a better experience for our users.
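As a rough illustration of what handing over a collection of (f, args...) could look like, here is a sketch in plain Julia. The names `ADTestCase`, `exported_cases`, `reference_gradient`, and `check_case` are all hypothetical, as is the assumed backend signature `gradient(f, x, rest...)`. For simplicity the sketch only differentiates with respect to the first argument and treats the rest as constants.

```julia
using Test

# Hypothetical container for one exported test case; not part of any package.
struct ADTestCase{F,A<:Tuple}
    name::String
    f::F      # function to differentiate
    args::A   # arguments; this sketch differentiates w.r.t. args[1] only
end

# Toy Gaussian log density (up to a constant) with θ = [mean, log(std)].
gaussian_logpdf(θ, y) = -0.5 * sum(abs2, y .- θ[1]) / exp(2 * θ[2]) - length(y) * θ[2]

# "Turing side": derive plain (f, args...) cases that need no model machinery.
function exported_cases()
    y = [0.1, -0.4, 1.2]
    return [
        ADTestCase("quadratic", θ -> -0.5 * sum(abs2, θ), ([0.5, -1.0],)),
        ADTestCase("gaussian_mean_logscale", gaussian_logpdf, ([0.3, 0.2], y)),
    ]
end

# Reference gradient w.r.t. the first argument via central finite differences.
function reference_gradient(case::ADTestCase; h = 1e-6)
    x, rest = first(case.args), Base.tail(case.args)
    return map(eachindex(x)) do i
        e = zeros(length(x))
        e[i] = h
        (case.f(x .+ e, rest...) - case.f(x .- e, rest...)) / (2h)
    end
end

# "Backend side": what an AD package's CI would call for each case.
function check_case(gradient, case::ADTestCase; atol = 1e-5)
    x, rest = first(case.args), Base.tail(case.args)
    return isapprox(gradient(case.f, x, rest...), reference_gradient(case); atol = atol)
end

# Placeholder for a real AD backend: forward finite differences.
function placeholder_backend(f, x, rest...)
    fx = f(x, rest...)
    return map(eachindex(x)) do i
        e = zeros(length(x))
        e[i] = 1e-7
        (f(x .+ e, rest...) - fx) / 1e-7
    end
end

results = [check_case(placeholder_backend, case) for case in exported_cases()]
@test all(results)
```

Because each case is just a function and a tuple of arguments, a backend can run it without depending on Turing.jl internals, which is the property that makes running these in upstream CI practical.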
@penelopeysm has made a start on a more general package https://github.com/penelopeysm/ModelTests.jl/, which aims to systematise testing a bit more thoroughly and provide test cases for use by external packages (correct me if I'm wrong Penny). From my perspective, it goes about this in exactly the right way. In particular:
DynamicPPL.jl can just use the ad_ldp or the ad_di function to turn models into test cases, while
AD backends, such as Mooncake, can hook into make_function and make_params.
Performance
I will finish this section off another day.
Concrete Todo items:
decide where we want to keep this testing infrastructure. In particular, do we keep it in ModelTests and move that package into the Turing org, or locate the functionality from ModelTests inside DPPL.jl itself? Discussion here
extend testing functionality to permit us to manually flag test cases as "broken" on a particular backend
decide what additional test cases we want to add, and add them.
detail the "Performance" section of this issue (me)
make use of testing infrastructure in the DynamicPPL test suite (if it stays in DPPL, there may be nothing to do here)
make use of testing infrastructure in the Mooncake test suite (for me to do)
start discussions with other AD backends about incorporating our test suite in their integration tests
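For the "broken" flag item, one possible shape is a registry of known failures that is consulted when a result is recorded. The sketch below is an assumption about how this might look, not an existing API: `KNOWN_BROKEN`, `classify`, and the backend and case names are made up. The four outcomes mirror how `Test.@test_broken` treats an unexpected pass as something to report.

```julia
# Hypothetical registry of (backend name, case name) pairs known to fail.
const KNOWN_BROKEN = Set([("backend_b", "abs_at_zero")])

# Classify one result given whether the check passed and whether it is flagged.
function classify(backend::String, case::String, passed::Bool; broken = KNOWN_BROKEN)
    if (backend, case) in broken
        # A flagged case that starts passing should be noticed and un-flagged.
        return passed ? :unexpected_pass : :broken
    else
        return passed ? :pass : :fail
    end
end

# CI would stay green only for :pass and :broken.
is_acceptable(outcome::Symbol) = outcome in (:pass, :broken)

outcomes = [
    classify("backend_a", "abs_at_zero", true),
    classify("backend_b", "abs_at_zero", false),
    classify("backend_b", "abs_at_zero", true),
    classify("backend_a", "abs_at_zero", false),
]
```

Keeping the flags in data rather than scattered through the tests means a backend author can see at a glance which cases are currently expected to fail.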
decide where we want to keep this testing infrastructure. In particular, do we keep them in ModelTests and move this package into the Turing org, or locate the functionality from ModelTests inside DPPL.jl itself.
what is the answer to the first concrete todo item?
I've started a discussion on this single point here: #2412