Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve assertion failure wording #3216

Open
Timmmm opened this issue Dec 18, 2022 · 5 comments · May be fixed by #3324
Open

Improve assertion failure wording #3216

Timmmm opened this issue Dec 18, 2022 · 5 comments · May be fixed by #3324
Labels
kind: bug Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label

Comments

@Timmmm
Copy link

Timmmm commented Dec 18, 2022

Dafny version

3.10.0

Code to produce this issue

assert 1 == 2;

Command to run and resulting output

Open in VSCode.

What happened?

I've done some formal verification of SystemVerilog using SVA and Synopsys's VC_Static tool. With that tool you write properties, click "Prove" and then wait. The possible states of a property are:

  • Proven
  • Vacuously proven (the whole property is always true but only because the antecedent of an implication can never happen)
  • Falsified
  • Inconclusive (not enough time was allowed to prove or falsify it).

A counter-example is only available in the Falsified case.

This all makes intuitive sense to me.

Dafny doesn't seem to do it in the same way. Proven is the same, but all other three states result in the error assertion might not hold which by any reasonable interpretation means it has been falsified.

Even worse, you can press F7 to generate a counter-example even for the inconclusive case where the example actually satisfies the assertion!

Could Dafny not distinguish Falsified and Inconclusive? If not I think the error should at least be reworded to could not prove assertion which I think is what it is actually trying to say. (Similarly for postconditions etc.)

For the counter-examples it should be trivial to determine if the generated counter-example passes or fails the assertion and give a message like this example disproves the assertion or this example passes the assertion but Dafny cannot reason about it (or whatever a passing counter-example is supposed to mean).

Hope this isn't too negative. My overall experience of Dafny so far has been very positive, mainly because of the excellent VSCode extension and amazing documentation.

What type of operating system are you experiencing the problem on?

Windows

@Timmmm Timmmm added the kind: bug Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label label Dec 18, 2022
@MikaelMayer
Copy link
Member

Thanks for using Dafny and for this positive feedback !

Could Dafny not distinguish Falsified and Inconclusive?

The challenge of modeling programs using SMT solvers is that, when using quantifiers, z3 comes up with a counter-example it cannot disprove is not correct. That's where it says "unknown" and returns a counter-example, and Dafny says "assertion might not hold". Everything that is not proven in Dafny is indeed reported indistinguishably as an error.
A few remarks.

  • For real timeouts, you'll get the message that the verifier times out
  • You don't always want a warning for vacuously proven assertions, .e.g. if you are doing a proof by contradiction, you arrive at assert false, and then you can assert the hypothesis, it's natural.
  • You can get a sense of what assertions are conditionally proven by looking at failed assertions and removing them. Their green check mark will turn red.

For the counter-examples it should be trivial to determine if the generated counter-example passes or fails the assertion and give a message like this example disproves the assertion or this example passes the assertion but Dafny cannot reason about it (or whatever a passing counter-example is supposed to mean).

It's not possible in general to "run" counter-examples, because it can involve ghost code that is either not compiled, or cannot be compiled at all (e.g. when using frames to model the heap, we can't take a screenshot of the heap). Also, Dafny models functions with two unrollings maximum when proving them, which means that the third recursive call is uninterpreted, so it's not always helpful to have a counter-example that says that a function's value on the third recursive call is not what would be computed in practice. But otherwise, I like the idea of being able to test if counter-examples make sense, either by running them or verifying them independently if possible.

at least be reworded to could not prove assertion which I think is what it is actually trying to say. (Similarly for postconditions etc.)

I agree that "could not prove assertion" seems better, like postconditions, I had been thinking about it also. Previously, the message stated "assertion violation" which had proven to be awfully confusing for some users.

@Timmmm
Copy link
Author

Timmmm commented Dec 21, 2022

Ok I don't follow fully (total noob here) but if I'm understanding you directly the issue is when you have something like:

lemma Foo(input: int)
   ensures forall i :: 0 <= i < 10000000000000000 ==> some_property(input, i)
{}

Then it can fail to prove it, and give a counter-example of say input = 1000, but it can't in general evaluate the assertions because the forall makes things exponential or possibly even unbounded.

I had a look at the code, and it seems like this bit is where the prover results are translated to Dafny's messages:

      ProverInterface.Outcome.Valid => GutterVerificationStatus.Verified,
      ProverInterface.Outcome.Invalid => GutterVerificationStatus.Error,
      ProverInterface.Outcome.Undetermined => GutterVerificationStatus.Inconclusive,
      ProverInterface.Outcome.TimeOut => GutterVerificationStatus.Error,
      ProverInterface.Outcome.OutOfMemory => GutterVerificationStatus.Error,
      ProverInterface.Outcome.OutOfResource => GutterVerificationStatus.Error,
      ProverInterface.Outcome.Bounded => GutterVerificationStatus.Error,

And then here it converts it to a message:

          case GutterVerificationStatus.Error:
            var failureDescription = description?.FailureDescription ?? "_no message_";
            if (currentlyHoveringPostcondition &&
                  (failureDescription == new PostconditionDescription().FailureDescription ||
                   failureDescription == new EnsuresDescription().FailureDescription)) {
              failureDescription = "This postcondition might not hold on a return path.";
            }
            return $"{obsolescence}[**Error:**](https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-verification-debugging) " +
                   failureDescription;

Where FailureDescription is e.g.:

  public override string FailureDescription =>
    customErrMsg ?? "assertion might not hold";

ProverInterface.Outcome is from Boogie.

Are you saying that none of the ProverInterface.Outcome values actually mean "this assertion is definitely false"? Even ProverInterface.Outcome.Invalid?

In which case what is the practical difference between Invalid and Undetermined?

@MikaelMayer
Copy link
Member

Are you saying that none of the ProverInterface.Outcome values actually mean "this assertion is definitely false"? Even ProverInterface.Outcome.Invalid?

When Boogie says invalid, it means that it has a counter-example for the model Dafny sent it. Dafny provides the guarantee that if Boogie says valid, then the Dafny code is valid. I'm not an expert in this domain, @RustanLeino could answer better,
But the way Dafny encodes verifications conditions is weaker in Boogie than in Dafny. For example Boogie's counter example includes models for functions that might have a definite implementation in Dafny, it's just that this model does not match the function itself - which was opaque after 2 levels of unrolling.

Once we will get a better debugging experience in VSCode, users would be able to browse the actual counter-example, see where there is a mismatch with Dafny, and potentially give more fuel or write things differently
But for now, Invalid and Undetermined are indeed very similar for Dafny, as the counter-examples are not necessarily precise.

@Timmmm
Copy link
Author

Timmmm commented Dec 21, 2022

Ah ok I think I understand. In that case can I suggest:

  1. Remove GutterVerificationStatus.Inconclusive
  2. Reword all the GutterVerificationStatus.Error messages to "could not prove ..."
  3. Change some references to "counter-example" to "possible counter-example", e.g. in the command palette.

Would you accept a PR for that?

@MikaelMayer
Copy link
Member

  1. I still think that down the road we could make a better use of GutterVerificationStatus.Inconclusive for rendering. Right now, inclusive is rendered as nothing, so we don't see assertions like when they are conditionally verified in the gutter, but this ought to change to my opinion.
  2. I like "could not prove" better indeed. That's what your issue is about.
  3. Already done recently: We just changed in the next version of VSCode the wording "counter-example" to "counter-example (experimental)" to indicate a hint that it's not fully functional yet.

MikaelMayer added a commit that referenced this issue Jan 4, 2023
This PR fixes #3216
It completely replaces the wording ".... might not hold" by "could not prove..."
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: bug Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants