strlen #468
Ah, this is an interesting one. The code for However, we haven't correctly piped the relevant assumptions into the path condition, so the simulator believes that this load must succeed regardless and emits an unprovable proof obligation. As a workaround, you can insert a concrete 0 at the end of the array instead of assuming the final byte is 0; I believe that should work. In the meantime, I'll see about getting this bug fixed.
Here is a

Changing the SAW script to this still gives the same error.

@robdockins ...something sinister is happening here...

Prepending zeros: still broken.
All zeros: works!
Yeah, after a little more thinking about it, I know why that happens; it's quite annoying. Basically, the information about the final array value being concrete doesn't get piped down to the symbolic simulator. Only fully concrete values get piped into the simulator; everything else is abstracted. It's something that's difficult to improve, but something I've wanted for some time.
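The all-or-nothing behaviour described above also accounts for the "prepending zeros" and "all zeros" results. The following is a hypothetical model, not SAW's actual code: `lower_array` hands an array to the simulator concretely only when every byte is concrete and otherwise replaces every byte with an unknown (`None`), while `naive_strlen` scans until it sees a *concrete* zero, as the pre-fix built-in did. Both function names are made up for illustration.

```python
from typing import List, Optional

SYMBOLIC = None  # hypothetical marker for a byte the simulator knows nothing about


def lower_array(spec_bytes: List[Optional[int]]) -> List[Optional[int]]:
    """Hypothetical model: an array reaches the simulator as concrete only if
    every element is concrete; otherwise the whole array is abstracted."""
    if all(b is not SYMBOLIC for b in spec_bytes):
        return list(spec_bytes)
    return [SYMBOLIC] * len(spec_bytes)


def naive_strlen(mem: List[Optional[int]]) -> str:
    """Scan until a *concrete* zero and report whether the scan stays in bounds."""
    i = 0
    while True:
        if i >= len(mem):
            return "out-of-bounds load"
        if mem[i] == 0:  # None == 0 is False, so unknown bytes never stop the scan
            return f"length {i}"
        i += 1


print(naive_strlen(lower_array([None, None, None, 0])))  # out-of-bounds load (concrete 0 at the end)
print(naive_strlen(lower_array([0, None, None, None])))  # out-of-bounds load (prepended zero)
print(naive_strlen(lower_array([0, 0, 0, 0])))           # length 0 (all zeros)
```

In this model a single unknown byte is enough to discard every concrete byte in the array, which is why only the fully concrete buffer behaves as expected.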
Previously, `strlen` would recurse down a string until it ran into a concrete zero. This is fine, except that it would assert that each load along the way would succeed, without taking into account the previous zero tests that had to fail. To correct this we need to emulate a path condition such that to load at position `n` each of the `n-1` previous locations must have been non-zero, and the loads are only required to succeed under that condition. This allows one, for example, to assume that some value in a string is 0 (but not specify which one). Previously, `strlen` would index off the end of the allocation of such a string and fail. Now it will succeed because the path condition required to index out-of-bounds is inconsistent. Fixes #468
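To see why the guarded obligations succeed where the old ones failed, here is a toy brute-force check; it is a sketch, not the Crucible implementation. It enumerates every 4-byte buffer over the bytes {0, 1} that satisfies the precondition "some byte is 0" and evaluates one obligation per load index 0..4, where index 4 is one past the end. Unguarded, the load at index 4 is an unconditional and unprovable obligation; guarded by "every earlier byte was non-zero", its path condition contradicts the precondition, so it holds vacuously. The names `load_obligations` and `all_valid` are hypothetical.

```python
from itertools import product

# Toy model (hypothetical, not Crucible's implementation): a fully symbolic
# buffer of n_alloc bytes, over which the old strlen visits load indices
# 0..n_alloc, the last one being one past the end of the allocation.


def load_obligations(n_alloc, guarded):
    """One predicate per load index; each maps a concrete byte tuple to
    True when that load's proof obligation holds for those bytes."""
    obligations = []
    for i in range(n_alloc + 1):
        in_bounds = i < n_alloc
        if guarded:
            # Path condition for reaching load i: bytes 0..i-1 are all non-zero.
            # Obligation "path condition implies in-bounds" is equivalent to
            # "in-bounds OR some earlier byte is zero".
            obligations.append(
                lambda bs, i=i, ok=in_bounds: ok or any(b == 0 for b in bs[:i]))
        else:
            # Old behaviour: every visited load must succeed unconditionally.
            obligations.append(lambda bs, ok=in_bounds: ok)
    return obligations


def all_valid(n_alloc, guarded, alphabet=(0, 1)):
    """Check every obligation on every buffer satisfying the precondition
    'at least one byte is zero' (enumerated over a tiny byte alphabet)."""
    obligations = load_obligations(n_alloc, guarded)
    for bs in product(alphabet, repeat=n_alloc):
        if 0 in bs and not all(ob(bs) for ob in obligations):
            return False
    return True


print(all_valid(4, guarded=False))  # False: the load at index 4 is out of bounds
print(all_valid(4, guarded=True))   # True: reaching index 4 contradicts the precondition
```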
I figured that, at the very least, using 'sat branches' is supposed to bridge this gap: that is, the SAT solver is supposed to kick in during symbolic execution at conditional branches, taking preconditions into account to alleviate exactly these kinds of false-positive error conditions and symbolic termination conditions. Maybe the Crucible `strlen` built-in isn't talking to the SAT solver?
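What branch-feasibility checking is meant to buy here can be sketched with a brute-force stand-in for the SAT solver. This is a hypothetical model, not SAW's or Crucible's code: `feasible` enumerates all 4-byte buffers over {0, 1} satisfying the precondition "some byte is 0", and `explore_strlen` forks at each `byte[i] == 0` test, following a branch only if it is feasible under the precondition plus the path so far. With pruning, the all-non-zero path is cut off before any out-of-bounds load; without it, exploration runs off the end.

```python
from itertools import product

N = 4              # allocation size in bytes (illustrative)
ALPHABET = (0, 1)  # tiny byte domain so the brute force stays cheap


def feasible(constraints):
    """Stand-in for a SAT query: does some buffer satisfy the precondition
    (at least one zero byte) together with every path constraint?"""
    return any(
        0 in bs and all(c(bs) for c in constraints)
        for bs in product(ALPHABET, repeat=N)
    )


def explore_strlen(prune=True, i=0, path=()):
    """Path-sensitive exploration of strlen over a fully symbolic buffer.
    Returns the possible lengths, plus "out-of-bounds" if some explored
    path loads past the end of the allocation."""
    if i >= N:
        return ["out-of-bounds"]
    is_zero = lambda bs, i=i: bs[i] == 0
    non_zero = lambda bs, i=i: bs[i] != 0
    results = []
    if not prune or feasible(path + (is_zero,)):
        results.append(i)
    if not prune or feasible(path + (non_zero,)):
        results.extend(explore_strlen(prune, i + 1, path + (non_zero,)))
    return results


print(explore_strlen(prune=True))   # [0, 1, 2, 3]
print(explore_strlen(prune=False))  # [0, 1, 2, 3, 'out-of-bounds']
```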
Yes, the built-in
Something happened that caused
versus
The Crucible unit test for `strlen` is still working. However, I bisected SAW, and it looks like the first commit with the regression is d679eabe89939d5f61e555ec3129d06181c15081. This is a submodule update commit, so I'll have to dig in and see what happened.
A small example is attached. Works with 5/13, doesn't with 5/19. |
It looks like 5/13 realizes when the string is accessed out of bounds and stops the symbolic execution. 5/19 is oblivious. 5/13 is also way faster at symbolic execution (but maybe that's another ticket). |
Further bisection reveals that the following What4 patch is the proximate cause of this regression: cce137e0247ef9ee970ae13d280dea1ec9449413. That patch changes the way the bitvector abstract domains are implemented. The fact that this causes a regression in this strlen case is deeply mysterious to me, especially given that the corresponding I conclude that there is some nontrivial interaction between the SAW glue code, the What4 bitvector abstract domains, and the LLVM memory model that gives rise to this behavior.
The minimized
This issue turns out to be a really interesting combination of effects. First, the only reason this program terminated quickly before was that we were able to learn abstract domain interval information about the computed value of

Now, with the change to the bitvector domain implementation, I made another arbitrary choice that I didn't think would matter: I started singleton values in the "bitwise" abstract domain mode. In retrospect, this is probably the wrong choice. When you union together a contiguous sequence of singleton values in the arithmetic mode, you get a nice dense interval; however, if you union together such a sequence in the bitwise mode, you get a much less accurate approximation, especially if your interval crosses 0 (then it goes all the way to top). Thus, instead of a nice, small interval, we forget all abstract information about the result of strlen. As a result, we cannot abort or exit the loop early.

So, there are several things to fix here, any of which would resolve this issue. First, the sentinel value for
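The precision gap between the two modes can be reproduced with a toy comparison over 8-bit values. Both domains below are simplified models assumed for illustration, not What4's actual implementation: the arithmetic mode is modelled as the smallest wrap-around interval `(lo, size)` covering the values, and the bitwise mode as a known-bits pair (bits set in every value, bits set in some value). Joining the contiguous singletons 0..5 covers 6 values as an interval but 8 in the bitwise domain; joining -1 (0xFF) and 0, an interval that crosses zero, stays tiny as an interval but sends the bitwise domain all the way to top.

```python
W = 8                 # bit width used for the illustration
MASK = (1 << W) - 1


def interval_join(vals):
    """Smallest wrap-around interval (lo, size) modulo 2**W covering vals,
    found by brute force over every possible start point."""
    best = None
    for lo in range(1 << W):
        size = max((v - lo) & MASK for v in vals) + 1
        if best is None or size < best[1]:
            best = (lo, size)
    return best


def bitwise_join(vals):
    """Known-bits join: bits set in every value are known 1, bits clear in
    every value are known 0, all other bits are unknown."""
    must_one, may_one = MASK, 0
    for v in vals:
        must_one &= v
        may_one |= v
    return must_one, may_one


def bitwise_size(dom):
    """Number of concrete values the known-bits pair admits."""
    must_one, may_one = dom
    unknown = may_one & ~must_one
    return 1 << bin(unknown).count("1")


print(interval_join(range(6)), bitwise_size(bitwise_join(range(6))))  # (0, 6) 8
print(interval_join([0xFF, 0x00]), bitwise_size(bitwise_join([0xFF, 0x00])))  # (255, 2) 256
```

Starting singletons in the arithmetic representation keeps joins like these dense, which is the kind of tight bound the loop needs in order to exit early.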
This should be fixed via #485.
I'm really not sure what is wrong here. Is something wrong with `saw`'s notion of `strlen`? Or am I doing something wrong? Any help would be appreciated.
strlen_test.c
strlen_test.saw