-
Notifications
You must be signed in to change notification settings - Fork 236
Robust, replayable proofs using unsat cores, (aka, hints, or how to replay verification in milliseconds instead of minutes)
In order to speed up verification replay times, F* is capable of recording Z3 unsat-cores. Unsat-cores contain a minimal set of facts that allow Z3 to prove unsat. By pruning the context and just keeping the unsat-core, verification can be made much faster, because Z3 doesn't see irrelevant facts.
However, although Z3 guarantees that an unsat core that it reports is logically unsatisfiable, it does not guarantee that it will be able to prove that an unsat core is indeed unsatisfiable, since such a proof may depend on heuristics triggered by facts that are not strictly part of the unsat core.
The mechanism works as follows:
- if the file type-checks, then
fstar --record_hints foo.fst
writes intofoo.fst.hints
an unsat-core for every proof obligation that was successfully verified; -
fstar --use_hints foo.fst
first tries to use the hints (if any); if the hint does not result in a successful verification, then the verification condition is fed again to Z3, without any hints.
The two flags can be combined. A third flag, --hint_info
, will tell you if the hints were successful or not. If the hints file is too stale, then it is time to regenerate it.
A fourth related flag --detail_hint_replay
can help you diagnose unsat cores that Z3 fails to prove unsatisfiable.
I use, in my ~/.emacs
:
(setq fstar-subp-prover-args (lambda () `(
"--use_hints" "--record_hints"
)))
The other technique is to have your Makefile generate the F* flags for you and use that to fill fstar-subp-prover-args
, see https://github.com/mitls/mitls-fstar/#configuring-emacs-mode for examples.
However, as of 20170718, it looks like F* can only read hints in interactive mode, not write them. So, remember to refresh hints from the command-line once in a while.
Usually, make OTHERFLAGS=--record_hints
is a good way to regenerate hints in the current directory. To regenerate all hints (it's good practice): cd src && make utest OTHERFLAGS=--record_hints
Having successful hint replay is a prerequisite to ensure that your file will continue to verify in the face of changes to F* and Z3. To check the health of a given file, make sure you record hints, then run F* again, passing it --use_hints --hint_info
. If you get a message along the lines of:
(C:/Build/Agent1/_work/3/s/code/experimental/aesgcm/Crypto.Symmetric.GF128.fst(170,0-183,5)) Query-stats (Crypto.Symmetric.GF128.gf128_mul_loop, 3) failed {reason-unknown=unknown because (incomplete quantifiers)} (with hint) in 171 milliseconds with fuel 1 and ifuel 0 and rlimit 21786240
(Hint-replay failed): C:/Build/Agent1/_work/3/s/code/experimental/aesgcm/Crypto.Symmetric.GF128.fst(168,43-168,63): (Info) could not prove post-condition
then your proof is not robust and may start failing as F* and Z3 evolve.
The first line of the message tells you that for the top-level term at Crypto.Symmetric.GF128.fst(170,0-183,5)
, specifically Crypto.Symmetric.GF128.gf128_mul_loop
, the third proof obligation (that's the 3
), F* tried to replay a previously discovered unsat core (that's the with hint
) but failed because the SMT solver said unknown
giving the reason incomplete quantifiers
. This is a very common reason: for any problem with quantifiers (i.e., all F* problems) the SMT solver is inherently incomplete. Occasionally, the SMT solver may also say things like incomplete (theory arithmetic)
, if the proof requires non-linear arithmetic etc.
The next line of the message is not guaranteed to be present, but it should usually be there. Among all the properties that needed to be proved about Crypto.Symmetric.GF128.gf128_mul_loop
, in this case only one property caused the unsat core to be unreplayable. In particular, the proof of the post-condition specified at Crypto.Symmetric.GF128.fst(168,43-168,63)
is the culprit that's making your entire proof brittle.
At this point, you can provide a more detailed proof in F* for just those brittle parts, helping the SMT solver find a better unsat core from which it can easily rediscover a proof of unsatisfiability. Or, if you're an expert and comfortable working at the level of the smt2 files emitted by F* to Z3, you can try to understand what it is about those particular goals that are causing Z3 to find a brittle, unreplayable proof and fix the root cause of brittleness once and for all. This is not always easy. It requires a manual analysis of the libraries and their lemmas that your code is relying on to figure out which lemmas are stated in a brittle manner. An example of this is at the end of this page.
Occasionally, F* will not be able to report the the second Hint-replay failed
diagnostic. This may happen when the SMT solver simply times out without providing F* with enough diagnostics to reconstruct an error message.
When this happens (hopefully not too often), there are two ways to proceed:
- Run F* again, but this time with the option
--detail_hint_replay
Here is a sample run of this feature.
$ fstar testhintreplay.fst --use_hints --detail_hint_replay
Detailed hint replay report follows for .\testhintreplay.fst(2,0-5,41)
Taking 5 seconds per proof obligation (3 proofs in total)
3, 2, 1,
OK: proof obligation at .\testhintreplay.fst(3,9-3,21) was proven
.\testhintreplay.fst(4,2-4,8): (Warning) Hint failed to replay this sub-proof: assertion failed (see also .\testhintreplay.fst(4,9-4,37))
.\testhintreplay.fst(5,2-5,8): (Warning) Hint failed to replay this sub-proof: assertion failed (see also .\testhintreplay.fst(5,9-5,41))
Verified module: TestHintReplay (569 milliseconds)
All verification conditions discharged successfully
This tells you that there were 3 sub-proofs involved in the top-level term at .\testhintreplay.fst(2,0-5,41)
.
The first sub-proof was easily rediscovered from the unsat core, but the two others failed.
- If even
--detail_hint_replay
doesn't give you the information you need, then you can proceed manually
The --detail_hint_replay
feature is relatively new (July 19th, 2017).
Prior to this feature, finding the sub-proof that was failing was very manual. The description of this manual process does have many useful insights, in particular, what one can try to do after having isolated the sub-proof that is brittle.
The steps are, roughly:
- do a binary search on the post-condition to figure out which sub-term of the post-condition cannot be replayed, e.g. if you find that turning
(ensures (a /\ b))
into(ensures a)
makes the hint replayable, then you need to provide a better proof forb
- once
b
has been identified, call extra lemmas to provide intermediary proof steps.
Generally speaking, a hint that fails to replay indicates some poorly-understood triggering behavior. At this stage, Profiling Z3 queries is what you need.
Analyzing one file manually is fine, but analyzing the build logs for an entire project is better done with a tool. One can use query-stats.py
, located in the .scripts
directory of F*. Some sample invocations:
# List all hints that failed to replay
$FSTAR_HOME/.scripts/query-stats.py -f my_log_file \
-n all --filter=fstar_usedhints=+ \
--filter=fstar_tag=-
# List the 10 queries that took the longest time to verify along with some statistics
$FSTAR_HOME/.scripts/query-stats.py -f my_log_file \
-c -g -n 10
https://github.com/FStarLang/FStar/issues/900 may contain some useful discussion
An interesting source of brittleness was discovered in an old version of FStar.HyperStack.ST
.
It causes even simple proofs like this to produce an unreplayable unsat core. This meant that any program making non-trivial use of FStar.HyperStack.ST
's StackInline
effect would have a brittle proof.
open FStar.HyperStack
open FStar.HyperStack.ST
let test (h0:mem) (h1:mem) (h2:mem) =
assume (inline_stack_inv h0 h1);
assume (inline_stack_inv h1 h2);
assert (inline_stack_inv h0 h2)
If you're writing libraries and making use of quantifiers then it's really worth your time to understand the causes of this kind of brittleness.
Here's the reason why the transitivity proof fails to replay.
This is the definition on inline_stack_inv
:
(* The frame invariant is enforced *)
h.tip = h'.tip
(* The heap structure is unchanged *)
/\ Map.domain h.h == Map.domain h'.h
(* Any region that is not the tip has no seen any allocation *)
/\ (forall (r:HH.rid). {:pattern (Map.contains h.h r)} (r <> h.tip /\ Map.contains h.h r)
==> Heap.equal_dom (Map.sel h.h r) (Map.sel h'.h r))
The important part is the third clause. If the goal is inline_stack_inv h0 h2
then the goal is skolemized and we have an occurrence of Map.contains h0.h fresh_r
This triggers the quantifier in the context from inline_stack_inv h0 h1
producing an equal_dom (Map.sel h0.h fresh_r) (Map.sel h1.h fresh_r)
But, notice that there's no term Map.contains h1.h fresh_r
term that's produced.
So, now we're stuck: there's no way to trigger the quantifier in inline_stack_inv h1 h2
Why the proof succeeds at all the first time is a mystery. Somehow, there must be some other assertion in the context contributing a Map.contains h1.h fresh_r
probably resulting from Map.sel h1.h fresh_r
.
Whatever that assertion is, it only contributes to triggering and so is excluded from the unsat core by Z3. Lacking it, the unsat core is unreplayable.
If we redefine inline_stack_inv
like so:
(* The frame invariant is enforced *)
h.tip = h'.tip
(* The heap structure is unchanged *)
/\ Map.domain h.h == Map.domain h'.h
(* Any region that is not the tip has no seen any allocation *)
/\ (forall (r:HH.rid). {:pattern (Map.contains h.h r)} (r <> h.tip /\ Map.contains h.h r)
==> Heap.equal_dom (Map.sel h.h r) (Map.sel h'.h r) /\
Map.contains h'.h r)
then all is well (notice the Map.contains h'.h r
on the last line.
It's easy to prove this logically equivalent to the existing inline_stack_inv
. But this one has more robust triggering behavior.
This is an example taken from SHA2_512.fst
. The following function verifies, but its hints do not replay.
[@"substitute"]
private val update_core:
hash_w :uint64_p {length hash_w = v size_hash_w} ->
data :uint8_p {length data = v size_block /\ disjoint data hash_w} ->
data_w :uint64_p {length data_w = v size_block_w /\ disjoint data_w hash_w} ->
ws_w :uint64_p {length ws_w = v size_ws_w /\ disjoint ws_w hash_w} ->
k_w :uint64_p {length k_w = v size_k_w /\ disjoint k_w hash_w} ->
Stack unit
(requires (fun h0 -> live h0 hash_w /\ live h0 data /\ live h0 data_w /\ live h0 ws_w /\ live h0 k_w
/\ reveal_h64s (as_seq h0 k_w) == Spec.k
/\ (reveal_h64s (as_seq h0 data_w) = Spec.Lib.uint64s_from_be (v size_block_w) (reveal_sbytes (as_seq h0 data)))
/\ (let w = reveal_h64s (as_seq h0 ws_w) in
let b = reveal_h64s (as_seq h0 data_w) in
(forall (i:nat). {:pattern (Seq.index w i)} i < 80 ==> Seq.index w i == Spec.ws b i))))
(ensures (fun h0 r h1 -> live h0 hash_w /\ live h0 data /\ live h0 data_w /\ live h1 hash_w /\ modifies_1 hash_w h0 h1
/\ (let seq_hash_0 = reveal_h64s (as_seq h0 hash_w) in
let seq_hash_1 = reveal_h64s (as_seq h1 hash_w) in
let seq_block = reveal_sbytes (as_seq h0 data) in
let res = Spec.update seq_hash_0 seq_block in
seq_hash_1 == res)))
#reset-options "--z3refresh --max_fuel 0 --z3rlimit 400"
[@"substitute"]
let update_core hash_w data data_w ws_w k_w =
(**) assert_norm(pow2 32 = 0x100000000);
(**) assert_norm(pow2 64 = 0x10000000000000000);
(**) assert_norm(pow2 125 = 42535295865117307932921825928971026432);
(**) let h0 = ST.get() in
(* Push a new frame *)
(**) push_frame();
(**) let h1 = ST.get() in
(**) assert(let b = Spec.words_from_be Spec.size_block_w (reveal_sbytes (as_seq h0 data)) in
reveal_h64s (as_seq h0 data_w) == b);
(* Allocate space for converting the data block *)
let hash_0 = Buffer.create (u64_to_h64 0uL) size_hash_w in
(**) let h2 = ST.get() in
(**) no_upd_lemma_0 h1 h2 data;
(**) no_upd_lemma_0 h1 h2 data_w;
(**) no_upd_lemma_0 h1 h2 ws_w;
(**) no_upd_lemma_0 h1 h2 k_w;
(**) no_upd_lemma_0 h1 h2 hash_w;
(* Keep track of the the current working hash from the state *)
copy_hash hash_0 hash_w;
(**) let h3 = ST.get() in
(**) no_upd_lemma_1 h2 h3 hash_0 data;
(**) no_upd_lemma_1 h2 h3 hash_0 data_w;
(**) no_upd_lemma_1 h2 h3 hash_0 ws_w;
(**) no_upd_lemma_1 h2 h3 hash_0 k_w;
(**) no_upd_lemma_1 h2 h3 hash_0 hash_w;
(* Step 2 : Initialize the eight working variables *)
(* Step 3 : Perform logical operations on the working variables *)
(* Step 4 : Compute the ith intermediate hash value *)
shuffle hash_0 data_w ws_w k_w;
(**) let h4 = ST.get() in
(**) assert(let b = Spec.words_from_be Spec.size_block_w (reveal_sbytes (as_seq h0 data)) in
let ha = Spec.shuffle (reveal_h64s (as_seq h0 hash_w)) b in
as_seq h4 hash_w == as_seq h0 hash_w /\
reveal_h64s (as_seq h4 hash_0) == ha);
(**) no_upd_lemma_1 h3 h4 hash_0 data;
(**) no_upd_lemma_1 h3 h4 hash_0 data_w;
(**) no_upd_lemma_1 h3 h4 hash_0 ws_w;
(**) no_upd_lemma_1 h3 h4 hash_0 k_w;
(**) no_upd_lemma_1 h3 h4 hash_0 hash_w;
(* Use the previous one to update it inplace *)
sum_hash hash_w hash_0;
(**) let h5 = ST.get() in
(**) assert(let x = reveal_h64s (as_seq h4 hash_w) in
let y = reveal_h64s (as_seq h4 hash_0) in
x == reveal_h64s (as_seq h0 hash_w) /\
y == Spec.shuffle (reveal_h64s (as_seq h0 hash_w)) (Spec.words_from_be Spec.size_block_w (reveal_sbytes (as_seq h0 data))));
(**) assert(let x = reveal_h64s (as_seq h0 hash_w) in
let y = Spec.shuffle (reveal_h64s (as_seq h0 hash_w)) (Spec.words_from_be Spec.size_block_w (reveal_sbytes (as_seq h0 data))) in
let z = reveal_h64s (as_seq h5 hash_w) in
let z' = Spec.Loops.seq_map2 (fun x y -> FStar.UInt64.(x +%^ y)) x y in
z == z');
(**) no_upd_lemma_1 h4 h5 hash_w data;
(**) no_upd_lemma_1 h4 h5 hash_w data_w;
(**) no_upd_lemma_1 h4 h5 hash_w ws_w;
(**) no_upd_lemma_1 h4 h5 hash_w k_w;
(* Pop the frame *)
(**) pop_frame()
To figure out that this function's hints do not replay, I ran the following twice (once to record, once to replay):
jonathan@chartreuse:~/Code/hacl-star/code/hash (migrating_sts_bis) $ make Hacl.Hash.SHA2_512.fst-verify OTHERFLAGS="--hint_info --record_hints" 2>&1 | tee log
The second time, I got:
(./Hacl.Hash.SHA2_512.fst(587,0-650,18)) Query-stats (Hacl.Hash.SHA2_512.update_core, 1) failed (with hint) in 216 milliseconds with fuel 0 and ifuel 1 and rlimit 217862400
Failed hint:
...
There is a lot of reasoning in this file, to show that the stateful implementation matches the pure specification. These proofs are typically expensive, requiring a lot of Z3 reasoning. Making sure these proofs are replayable is thus important.
The next step is figuring out which part of the post-condition is responsible for non-replayability. I manually edited the file as follows, making everything lax-checked except for the body of the function I was interested in.
#set-options "--lax"
... many definitions ...
val update_core: ...
#reset-options
let update_core = ...
#set-options "--lax"
Running my F* invocation twice again confirmed that the hint was still failing to replay: the culprit was not in the val
but, really, in the let
.
I started bisecting: I commented out various bits of the post-condition, running my F* invocation twice every time, until I found that the modifies_1
clause was the culprit: just leaving this in the post-condition generates a non-replayable hint.
There are a lot of modification-related lemmas that are being called, but none of them seem to help with the replayability: it is thus time to do a manual proof.
I commented out every single assert in the function, to ensure that Z3 is not working to prove unrelated things, and set the post-condition to True
. I switched to interactive mode: the function was now going through in a fraction of a second.
The various liveness buffers are in FStar/ulib/FStar.Buffer.fst
. I briefly skimmed the file to refresh my memory, then wrote out intermediary asserts.
(**) let h0 = ST.get() in
(* Push a new frame *)
(**) push_frame();
(**) let h1 = ST.get() in
(* Allocate space for converting the data block *)
let hash_0 = Buffer.create (u64_to_h64 0uL) size_hash_w in
(**) let h2 = ST.get() in
assert (modifies_0 h1 h2);
(* Keep track of the the current working hash from the state *)
copy_hash hash_0 hash_w;
(**) let h3 = ST.get() in
assert (modifies_1 hash_0 h2 h3);
shuffle hash_0 data_w ws_w k_w;
(**) let h4 = ST.get() in
assert (modifies_1 hash_0 h3 h4);
(* Use the previous one to update it inplace *)
sum_hash hash_w hash_0;
(**) let h5 = ST.get() in
assert (modifies_1 hash_w h4 h5);
(* Pop the frame *)
(**) pop_frame();
let h6 = ST.get () in
()
These asserts are easy: I found them by looking at the post-conditions of copy_hash
, shuffle
and sum_hash
. I now have intermediary asserts that relate h1
to h2
; h2
to h3
; h3
to h4
; h4
to h5
. Time to use some transitivity lemmas.
- Going from
h2
toh3
modifieshash_0
; going fromh3
toh4
modifieshash_0
. I can use transitivity of themodifies_1
clause: callinglemma_modifies_1_trans hash_0 h2 h3 h4
allowsassert (modifies_1 hash_0 h2 h4)
to succeed and now relatesh2
toh4
. - Relating
h2
toh5
requires a differentmodifies
clause, because going fromh2
toh4
modifieshash_0
while going fromh4
toh5
modifieshash_w
. I can combine twomodifies_1
into amodifies_2
:
lemma_modifies_1_1 hash_0 hash_w h2 h4 h5;
assert (modifies_2 hash_0 hash_w h2 h5);
We have now related h2
to h5
.
Interestingly, h1
and h5
mark the beginning and the end of the function's frame. Note that hash_0
was allocated in that temporary frame, and dies after the pop_frame
. Thus, going from h0
to h6
, among all addresses in h0
, hash_w
is the only buffer that's modified. In other words: modifies_1 hash_w h0 h6
.
The lemma that we need, in order to work with the popped frame and conclude modifies_1 hash_w h0 h6
is:
let modifies_popped_1 (#t:Type) (a:buffer t) h0 h1 h2 h3 : Lemma
(requires (live h0 a /\ fresh_frame h0 h1 /\ popped h2 h3 /\ modifies_2_1 a h1 h2))
(ensures (modifies_1 a h0 h3))
This lemma uses a new modifies
clause: while modifies_1 b
means that, among all buffers in all frames, only b
was modified, modifies_2_1 b
means that, in b
's frame, b
is the only modified buffer; all other frames remain unchanged, except for possibly the tip
.
This modifies_2_1
clause is just what we need: if the tip
was modified in an unspecified manner, and if, among all other frames, b
is the only modified buffer, then once we pop
the tip
, modifies_1 b
holds.
lemma_modifies_0_2' hash_w hash_0 h1 h2 h5;
assert (modifies_2_1 hash_w h1 h5);
We can kill two birds with one stone and stitch h1
and h5
together, obtaining a modifies_2_1
clause.
The final lemma call is:
modifies_popped_1 hash_w h0 h1 h5 h6
At this stage, I have a proof of modifies_1
that goes through, in the interactive mode, in a fraction of a second.
(**) let h0 = ST.get() in
(* Push a new frame *)
(**) push_frame();
(**) let h1 = ST.get() in
(* Allocate space for converting the data block *)
let hash_0 = Buffer.create (u64_to_h64 0uL) size_hash_w in
(**) let h2 = ST.get() in
assert (modifies_0 h1 h2);
(* Keep track of the the current working hash from the state *)
copy_hash hash_0 hash_w;
(**) let h3 = ST.get() in
assert (modifies_1 hash_0 h2 h3);
shuffle hash_0 data_w ws_w k_w;
(**) let h4 = ST.get() in
assert (modifies_1 hash_0 h3 h4);
lemma_modifies_1_trans hash_0 h2 h3 h4;
assert (modifies_1 hash_0 h2 h4);
(* Use the previous one to update it inplace *)
sum_hash hash_w hash_0;
(**) let h5 = ST.get() in
assert (modifies_1 hash_w h4 h5);
lemma_modifies_1_1 hash_0 hash_w h2 h4 h5;
assert (modifies_2 hash_0 hash_w h2 h5);
lemma_modifies_0_2' hash_w hash_0 h1 h2 h5;
assert (modifies_2_1 hash_w h1 h5);
(* Pop the frame *)
(**) pop_frame();
let h6 = ST.get () in
modifies_popped_1 hash_w h0 h1 h5 h6
Since modifies
implies liveness, none of the original no_upd_*
lemmas are necessary.
I un-comment the modifies_1
post-condition; re-running my F* invocation twice, I can see that the hints are now replayable. Un-commenting the rest of the post-condition and restoring the original asserts and lemmas for the functional correctness of the function, the hints are still replayable.