First draft of parallel state machines #72
Conversation
Generation of parallel actions
The way I'm generating parallel actions is somewhat naive. I generate a list of normal (sequential) actions, split the tail of it into parallel suffixes, and then check that every possible interleaving of actions in those parallel suffixes preserves every precondition. However, that check can fail; if generation fails, I just try again. I also currently don't have a shrinker :(
Combination of environments
When generating all possible interleavings of executions, each branch of a parallel sequence produces its own environment of results after execution. For now I'm concatenating all the environments, because I assume the variables bound in the different branches do not clash.
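For reference, enumerating every interleaving of two parallel suffixes can be done along these lines (a generic sketch, not the code in this PR):

```haskell
-- All interleavings of two sequences, preserving the order within each one.
interleavings :: [a] -> [a] -> [[a]]
interleavings xs []         = [xs]
interleavings [] ys         = [ys]
interleavings (x:xs) (y:ys) =
     map (x :) (interleavings xs (y : ys))
  ++ map (y :) (interleavings (x : xs) ys)
```

For suffixes of lengths n and m this yields choose (n + m) n sequences, which is why checking every interleaving gets expensive quickly.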
|
@jasagredo I have started having a look and playing with your PR, which led me to investigate how this feature was implemented in q-s-m.
Perhaps a good first approach would be to try to port the q-s-m code? |
A comment on the number of parallel threads: some of the most interesting bugs I've found have required three threads--see https://dl.acm.org/doi/abs/10.1145/2034654.2034667 for example. So I think it would be a real shame to restrict parallel tests to two threads. I can imagine adding a Fork action (with a list of actions as a parameter) to express an arbitrary number of threads. If Fork returns a thread ID, one could even imagine Join ThreadId actions that wait for a forked thread to terminate. The nice thing about adding Fork and Join actions is that they could be used in DL tests too: imagine a DL test of the form anyActions; Fork anyActions; followed by the sequence under test, for example, that would test that the DL sequence worked in any reachable state, with anything going on in parallel.

There is a potential problem with a combinatorial explosion of interleavings, but that can be addressed during generation just by not generating test cases with too many interleavings--the number of interleavings is easily (and cheaply) predictable from a test case, so one can just stop generating before the number gets too big (e.g. 100 or 1000). E.g. Fork [a,b]; Fork [c,d]; e; f has (6 choose 2) * (4 choose 2) = 90 possible interleavings.

Shrinking is really important, of course, and there are useful shrinking steps for parallel tests (in addition to shrinking or just dropping actions) that reduce parallelism, making failed tests easier to debug. One: move the first action out of a Fork. Two: move a Fork later (past the next non-Fork action, to ensure we reduce the set of possible interleavings, which is necessary to avoid shrinking loops).
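Coming back to the predictability point: for branches treated as independent, the count is a multinomial coefficient, so a generator can compute a bound up front and discard overly expensive candidates. A generic sketch, not code from this PR:

```haskell
-- Upper bound on the number of interleavings of independent branches with the
-- given lengths: the multinomial coefficient (sum of lengths)! / product of
-- the individual factorials. Cheap to compute, so generation can simply
-- reject candidates whose bound is too large.
interleavingBound :: [Int] -> Integer
interleavingBound lens =
  factorial (sum lens) `div` product (map factorial lens)
  where
    factorial n = product [1 .. toInteger n]
```

For the example above, interleavingBound [2,2,2] == 90 when e; f is counted as the main branch. |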
There's a problem as is with the type of postconditions (PostconditionM). In this version, postconditions can interact with the system under test, and indeed, this is used in several places in IOG's code to read from the system state in the postcondition. But in parallel testing, postconditions cannot be evaluated while the test is run; instead one must explore many possible interleavings after the test has finished, which means each postcondition may run many times, and in the final state of the test, not in the state after the action it applies to. If model authors expect the postcondition to read from the state just after that action, then this will lead tests to fail--in ways which developers find very difficult to understand.

Parallel testing really needs pure boolean postconditions--any interaction with the system under test is likely to be broken, and even monitoring information will differ for each interleaving, making counterExample in particular almost useless (which failed interleaving should you be given counterexample information about, when they all failed?). To make this work I think one needs a separate RunModel' class whose postconditions have a simpler type, which could be used for both sequential and parallel testing, while the existing RunModel class can be used only for sequential testing. (If it's used for parallel testing, then a trap is laid for developers who talk to the SUT in postconditions, and it would be nice to exclude that possibility using types.)

Working with such a class does require modelling more of the SUT... in order to write the postconditions, one needs to model the information in the SUT that was previously read from it, so that it is available in the postcondition when it is needed. So there's a risk users might continue to use the current RunModel class, because it simplifies modelling, but that would make parallel testing impossible. It's a bit Catch-22--what makes parallel testing so nice in e.g. eqc_statem is that it is so easy to turn it on, using the same model used for sequential testing; if one has to enrich one's model and rewrite one's postconditions, then people are much less likely to use it.
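To make the RunModel' idea concrete, a rough illustration (hypothetical class and deliberately simplified signatures; the real RunModel also threads a LookUp for symbolic variables, which is omitted here):

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

import Test.QuickCheck.StateModel (StateModel (..))

class StateModel state => RunModel' state m where
  -- Run an action against the system under test.
  perform' :: state -> Action state a -> m a

  -- A purely boolean postcondition: it has no way to talk to the SUT, so it
  -- can safely be re-checked for every interleaving after a parallel test
  -- has finished running.
  postcondition' :: (state, state) -> Action state a -> a -> Bool
```

With a class along these lines, "no talking to the SUT in postconditions" becomes a property of the types rather than a convention. |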
@rjmh Do I understand correctly that in Erlang one defines postconditions in the "model" (i.e. alongside the rest of the eqc_statem callbacks)?
The issue with pure postconditions is that you either end up:
- doing a lot of annoying book-keeping in the model just to predict information that could simply be read from the system under test, or
- adding ancillary actions whose only purpose is to read the information the postconditions need.
I don't remember the specific case that forced us to make postconditions monadic, but it was the least bad solution at the time for avoiding the issues above. |
In the Java world I used to work in, there was a cottage industry of "assertion libraries" (e.g. hamcrest) which, when they came out, improved the legibility of our unit tests. Some Haskell test frameworks (e.g. hspec) have specific functions for testing expectations. I may be wrong, but it seems something like a postcondition that simply returns a pure assertion value would work, where (roughly) a small data type would define a pure language of assertions. Then the framework would be responsible for checking the postconditions against the actual results. A sketch of what I have in mind follows.
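(All the names below, Assertion, ShouldBe, ShouldHold and holds, are made up for illustration; this is not an existing API.)

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A sketch of a pure assertion language: postconditions build values of this
-- type, and the framework interprets them (and renders failure messages) later.
data Assertion
  = forall a. (Eq a, Show a) => ShouldBe a a  -- actual value vs expected value
  | ShouldHold String Bool                    -- named boolean check
  | AllOf [Assertion]                         -- every assertion must hold
  | AnyOf [Assertion]                         -- at least one must hold

-- Interpretation is pure, so it can be re-run for every interleaving.
holds :: Assertion -> Bool
holds (ShouldBe actual expected) = actual == expected
holds (ShouldHold _ b)           = b
holds (AllOf as)                 = all holds as
holds (AnyOf as)                 = any holds as
```

A failure reporter can pattern match on the same value to produce a readable message, which is the part that hamcrest- or hspec-style combinators usually bundle with the check. |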
@abailly-iohk It seems you are almost talking about https://hackage.haskell.org/package/quickcheck-state-machine-0.9.0/docs/Test-StateMachine-Logic.html, right? |
I wasn't aware of this module; it seems there are more good things to port from q-s-m than just the parallel testing stuff :) |
Responding to what Max wrote above:
I think the issue is that when you want to write postconditions that relate the result of an action to the system state, you can either make your model detailed enough to predict that state (that's Max's annoying book-keeping), or read it directly from the system state--which you can either do in a monadic postcondition, or via a special action (Max's ancillary actions), or via reading it in the same call of perform, which means returning extra information in the action result type.

And the general problem with reading from the system state, however you do it, is that those reads are hard to do atomically with the action they apply to. In parallel testing, the postconditions are checked after the entire test runs, so they simply read the wrong system state, while if the reads are done in a custom action or in perform, they will not be done atomically with the underlying action, and thus risk reading the wrong system state. One might perhaps wrap a lock around the underlying action and the associated reads inside perform to make everything atomic, but that defeats the purpose of parallel testing, which is to test that the action behaves atomically without additional synchronization. So basically, parallel testing is incompatible with reading extra information from the system state to write postconditions, however that is done. The only way to make parallel testing work is to enrich the model to predict (enough of) that information. Sometimes this will mean complicating the model considerably. And for sequential testing, it's not necessary.

There is a lot to be said for making the shift from sequential to parallel testing easy, though--partly because it makes it much more likely that people will do it than if they have to rewrite all their postconditions with a different type, and partly because it allows the model to be debugged using sequential tests, which is far simpler and quicker than debugging using parallel tests.

Here's an idea: one might allow monadic postconditions, but interpret any postcondition that actually invokes an underlying monadic operation as 'True', in a parallel test. That is, one would automatically remove those checks during parallel testing. One would want a compositional way of doing this, so that a postcondition might mix purely-functional checks (which can still fail) with monadic checks (which are just assumed to pass). It would be desirable to generate some kind of warning when this happens (e.g. X% of postconditions could not be checked). But it would allow a sequential model to be used immediately for parallel testing, which might already reveal many bugs even with the weaker postconditions, while supporting a gradual enrichment of the model (to reduce the "postcondition could not be checked" percentage). What doesn't work well is to allow monadic postconditions and just check them at the wrong time... that leads to very hard-to-debug false positives, and there's no better way to put people off parallel testing. |
@rjmh Thanks for the illuminating comment. Reading your suggestion to interpret any postcondition that actually invokes a monadic operation as 'True' in parallel tests, it seems to me you are advocating for a dedicated language to express conditions, where one of the constructors would allow monadic actions to occur, e.g. much like the Logic language from q-s-m mentioned above, but extended with a monadic case.
|
Not exactly. I think there's a lot of value in allowing arbitrary boolean expressions in postconditions... otherwise there will be a constant pressure to make the postcondition language richer. But I can imagine a Post type in which the monadic checks are kept separate from the pure checks and the monitoring. A rough sketch:
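(Illustrative only; the representation and the names postMonitor, postPure, postMonadic, inSequentialTests, evalSequential and evalParallel are all made up.)

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Data.Monoid (All (..))
import Test.QuickCheck (Property)

-- The monadic checks are kept apart from the pure verdict and the monitoring,
-- so they can be run last in sequential tests and skipped in parallel ones.
data Post m a = Post
  { postMonitor :: Property -> Property -- accumulated monitoring
  , postPure    :: a                    -- pure part, e.g. a Bool verdict
  , postMonadic :: m All                -- checks that talk to the SUT
  } deriving Functor

instance Applicative m => Applicative (Post m) where
  pure x = Post id x (pure mempty)
  -- Monitors compose, pure parts combine, and the monadic checks are merely
  -- accumulated: they only run (or get skipped) once everything else is done.
  Post g f mf <*> Post h x mx = Post (g . h) (f x) ((<>) <$> mf <*> mx)

-- The off-putting name for the monadic escape hatch.
inSequentialTests :: Applicative m => m Bool -> Post m ()
inSequentialTests check = Post id () (All <$> check)

-- Sequential tests combine the pure verdict with the monadic checks.
evalSequential :: Functor m => Post m Bool -> m Bool
evalSequential (Post _ ok checks) = (ok &&) . getAll <$> checks

-- Parallel tests cannot run the monadic checks at the right moment, so only
-- the pure verdict counts (monadic checks are assumed to pass).
evalParallel :: Post m Bool -> Bool
evalParallel (Post _ ok _) = ok
```

The monadic checks are accumulated rather than run: a sequential runner executes them after the pure checks, while a parallel runner skips them entirely, which is exactly the "assume monadic checks pass" interpretation described above.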
The Applicative instance would delay PostMonadic until after all the boolean checks and monitors, and one could invoke it via some off-putting name, such as 'inSequentialTests', thus making it more obvious to the developer that there is a cost in using such things.

It's worth thinking about how to do the monitoring too. In failing parallel tests, after shrinking, the parallel parts are usually quite small and so there may be rather few interleavings. It may also be that the interleavings that fail last are the only interesting ones--ones that fail earlier can be considered as failing "because the interleaver guessed wrong". So potentially one might try to report each maximal failing interleaving separately, with counterexample information for each one.

Collecting statistics is a bit more difficult, since there may be many successful interleavings and we have no reason to prefer statistics from one over those from another. It feels wrong to collect statistics from all successful interleavings, because that would give some tests exponentially more weight than others in the overall statistics. Also it would be expensive... when many interleavings succeed, there may be a big saving in only checking the first one. |
I do agree there's value in allowing arbitrary boolean expressions in postconditions, but I also think there's value in providing some kind of DSL for the common cases, as this alleviates the need for the user to write both the assertion/predicate and the accompanying error message. |
Right, like (===) in the Property type. But that doesn't require a special constructor in the Postcondition type, just a library of functions for expressing checks-with-error-messages. For example, there is no special representation for (===) properties, but you get the behaviour you want anyway. |
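For concreteness, a check-with-error-message as an ordinary function, building on the Post sketch above (shouldBe is a made-up name):

```haskell
import Test.QuickCheck (counterexample)

-- No special constructor needed: the counterexample text travels along as
-- monitoring, and the verdict stays a pure Bool.
shouldBe :: (Eq a, Show a, Applicative m) => a -> a -> Post m Bool
actual `shouldBe` expected =
  Post (counterexample (show actual ++ " /= " ++ show expected))
       (actual == expected)
       (pure mempty)
```

As with (===), the message only shows up when the property fails.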
I agree this could be done as functions. With a dedicated language, you get some more possibilities like more compact representations for combinations. That's perhaps overkill 🤷 |
To summarise the above, there seem to be a couple of issues with the current definition of postcondition in RunModel:
- postconditions can interact with the system under test, which cannot work when interleavings are re-checked after a parallel test has finished, and
- monitoring done inside postconditions differs per interleaving, making counterexample information close to useless for parallel tests.
Having types along the lines sketched above would make it much easier to define the infrastructure for parallel testing, plus unify the interfaces. Did I get the resulting implications of the discussion above right? I would also be interested in seeing cases where a monadic postcondition is genuinely needed, i.e. where the model cannot reasonably predict the information involved.
|
In the Peras codebase we wrote some postconditions that read information from the running node (for example, the previously seen chains) rather than from the model. This could possibly be expressed differently, for example by adding some "observation" action, perhaps.
In Hydra we don't use monadic postconditions much. There's been a discussion about one such check and how it could have been written differently. |
I think for the first one, the "previously seen chains" should be part of the model. Morally I just think that the information the postconditions need should already be in the model.
And we are doing this all the time just by having an Env and a LookUp. |
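For concreteness, the machinery being discussed boils down to something like this (simplified stand-ins, not the actual quickcheck-dynamic definitions):

```haskell
{-# LANGUAGE GADTs #-}

import Data.Maybe (listToMaybe)
import Data.Typeable (Typeable, cast)

-- A symbolic reference, created at generation time, to a value that will only
-- exist once the corresponding action has been performed.
newtype Var a = Var Int
  deriving (Eq, Show)

-- The environment built up while the test runs: each entry records the
-- concrete value an action produced for a given symbolic variable.
data EnvEntry where
  (:==) :: Typeable a => Var a -> a -> EnvEntry

type Env = [EnvEntry]

-- The LookUp used at runtime: resolve a symbolic variable to its concrete value.
lookUpVar :: Typeable a => Env -> Var a -> Maybe a
lookUpVar env (Var i) =
  listToMaybe [ v | Var j :== x <- env, j == i, Just v <- [cast x] ]
```

Generation only ever manipulates Var values; the Env is filled in as the test runs, and lookUpVar resolves variables at that point, which is the phase separation discussed below.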
@jasagredo exactly how do you propose maintaining the phase separation between generation time and runtime if you can't have symbolic variables that are resolved at runtime? If you don't propose getting rid of the phase distinction, then there is fundamentally no difference between having an Env and a LookUp and keeping that information in the model. If you do propose getting rid of the phase distinction, that's a non-starter. |
The only difference between eqc and qcd is that in Erlang you can easily traverse the actions generically and translate the variables under the hood. In Haskell, implementing that is almost impossible to do nicely, so you have to make the user do it themselves - hence the LookUp. |
I am just making explicit that having a LookUp and an Env is probably fine. I'm saying that "morally" that could be part of the model, in the sense that having an Env around amounts to keeping that information in the model state.
I've always liked that |
Implements runParallelActions, which will spawn multiple threads to test parallel execution in a style similar to eqc_statem.