-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow using bitfields in SAW specifications #1461
Comments
Ultimately, this is for the benefit of the SAW proofs. Due to a limitation in how SAW currently works, bitfields must be accessed by index rather than by name, and due to how often new fields are added to `s2n_connection`, the only way to do this in way that's maintainable is to have all the bitfield fields be up front. That way, the index to access the bitfield will always be zero, which significantly decreases the likelihood that the SAW proofs will need to be updated with each new field added to `s2n_connection`. This is all rather unfortunate. See GaloisInc/saw-script#1461 for a plan to make handling bitfields more maintainable in SAW.
A related issue is that SAW is overzealous about initializing bitfields in #include <stdint.h>
struct s {
int32_t w;
uint8_t x:1;
uint8_t y:1;
int32_t z;
};
uint8_t f(struct s *ss) {
return ss->x + ss->y;
} Then this specification for
This is because the preconditions initialize We should definitely add a validity check to ensure that fields in bitfield aren't used in |
Ultimately, this is for the benefit of the SAW proofs. Due to a limitation in how SAW currently works, bitfields must be accessed by index rather than by name, and due to how often new fields are added to `s2n_connection`, the only way to do this in way that's maintainable is to have all the bitfield fields be up front. That way, the index to access the bitfield will always be zero, which significantly decreases the likelihood that the SAW proofs will need to be updated with each new field added to `s2n_connection`. This is all rather unfortunate. See GaloisInc/saw-script#1461 for a plan to make handling bitfields more maintainable in SAW.
Ultimately, this is for the benefit of the SAW proofs. Due to a limitation in how SAW currently works, bitfields must be accessed by index rather than by name, and due to how often new fields are added to `s2n_connection`, the only way to do this in way that's maintainable is to have all the bitfield fields be up front. That way, the index to access the bitfield will always be zero, which significantly decreases the likelihood that the SAW proofs will need to be updated with each new field added to `s2n_connection`. This is all rather unfortunate. See GaloisInc/saw-script#1461 for a plan to make handling bitfields more maintainable in SAW.
I think the easiest and most robust way to handle this is the same as I have been contemplating for crux generally. The idea is, to make allocations (by default, at least) be populated by fresh uninterpreted bytes at allocation time. Once that happens, I think most of these tricky questions about bitfields get resolved. |
I partially retract my claim that we should make the program in #1461 (comment) throw an error. The only way that SAW can know if a struct has bitfields or not is by inspecting LLVM debug information, and since it's not guaranteed that every program will have debug information, we couldn't reliably reject programs like that one. At best, we could make it a warning that only fires if debug information is present. |
That's option (1) from GaloisInc/crucible#844 (comment), right?
Which tricky issues in particular are you referring to? I still think we'd need to track bitfield-specific information in SAW in order to do this correctly, but perhaps I'm missing something. |
LLVM bitcode doesn't directly record information about bitfields, but its debug information _does_ record this information. Knowing about bitfields is important for certain applications—see, for example, GaloisInc/saw-script#1461. This changes `Text.LLVM.DebugUtils` such that if any of the fields in a struct have bitfields, it will record this information in the new `BitfieldInfo` data type. This requires a backwards-incompatible change to the type of the `Structure` data constructor. In case we need to add additional fields to `Structure` in the future, I converted `Structure`'s fields into a record data type, which makes it slightly easier to extend. I also did the same thing to `Union` for consistency (although this is not strictly necessary).
I've opened a PR with the necessary |
LLVM bitcode doesn't directly record information about bitfields, but its debug information _does_ record this information. Knowing about bitfields is important for certain applications—see, for example, GaloisInc/saw-script#1461. This changes `Text.LLVM.DebugUtils` such that if any of the fields in a struct have bitfields, it will record this information in the new `BitfieldInfo` data type. This requires a backwards-incompatible change to the type of the `Structure` data constructor. In case we need to add additional fields to `Structure` in the future, I converted `Structure`'s fields into a record data type, which makes it slightly easier to extend. I also did the same thing to `Union` for consistency (although this is not strictly necessary).
This adds support for writing specifications that talk about bitfields in LLVM code by way of the new `llvm_points_to_bitfield` command. Broadly speaking, `llvm_points_to_bitfield ptr fieldName rhs` is like `llvm_points_to (llvm_field ptr fieldName) rhs`, except that `fieldName` is required to be the name of a field within a bitfield. The salient details are: * LLVM bitcode itself does not a built-in concept of bitfields, but LLVM's debug metadata does. Support for retrieving bitfield-related metadata was added to `llvm-pretty` in GaloisInc/llvm-pretty#90, so this patch bumps the `llvm-pretty` submodule to incorporate it. This patch also updates the `crucible` submodule to incorporate corresponding changes in GaloisInc/crucible#936. * The `LLVMPointsTo` data type now has a new `LLVMPointsToBitfield` data constructor that stores all of the necessary information related to the `llvm_points_to_bitfield` command. As a result, the changes in this patch are fairly insulated from the rest of SAW, as most of the new code involves adding additional cases to handle `LLVMPointsToBitfield`. * Two of the key new functions are `storePointsToBitfieldValue` and `matchPointsToBitfieldValue`, which implement the behavior of `llvm_points_to_bitfield` in pre- and post-conditions. These functions implement the necessary bit-twiddling to store values in and retrieve values out of bitfield. I have left extensive comments in each function describing how all of this works. * Accompanying `llvm_points_to_bitfield` is a new set of `{enable,disable}_lax_loads_and_stores` command, which toggles the Crucible-side option of the same name. When `enable_lax_loads_and_stores` is on, reading from uninitialized memory will return a symbolic value rather than failing outright. This is essential to be able to deal with LLVM bitcode involving bitfields, as reading a field from a bitfield involves reading the entire bitfield at once, which may include parts of the struct that have not been initialized yet. * There are various `test_bitfield_*` test cases under `intTests` to test examples of bitfield-related specifications that should and should not verify. * I have also updated `saw-remote-api` and `saw-client` to handle bitfields as well, along with a Python-specific test case. Fixes #1461.
This adds support for writing specifications that talk about bitfields in LLVM code by way of the new `llvm_points_to_bitfield` command. Broadly speaking, `llvm_points_to_bitfield ptr fieldName rhs` is like `llvm_points_to (llvm_field ptr fieldName) rhs`, except that `fieldName` is required to be the name of a field within a bitfield. The salient details are: * LLVM bitcode itself does not a built-in concept of bitfields, but LLVM's debug metadata does. Support for retrieving bitfield-related metadata was added to `llvm-pretty` in GaloisInc/llvm-pretty#90, so this patch bumps the `llvm-pretty` submodule to incorporate it. This patch also updates the `crucible` submodule to incorporate corresponding changes in GaloisInc/crucible#936. * The `LLVMPointsTo` data type now has a new `LLVMPointsToBitfield` data constructor that stores all of the necessary information related to the `llvm_points_to_bitfield` command. As a result, the changes in this patch are fairly insulated from the rest of SAW, as most of the new code involves adding additional cases to handle `LLVMPointsToBitfield`. * Two of the key new functions are `storePointsToBitfieldValue` and `matchPointsToBitfieldValue`, which implement the behavior of `llvm_points_to_bitfield` in pre- and post-conditions. These functions implement the necessary bit-twiddling to store values in and retrieve values out of bitfield. I have left extensive comments in each function describing how all of this works. * Accompanying `llvm_points_to_bitfield` is a new set of `{enable,disable}_lax_loads_and_stores` command, which toggles the Crucible-side option of the same name. When `enable_lax_loads_and_stores` is on, reading from uninitialized memory will return a symbolic value rather than failing outright. This is essential to be able to deal with LLVM bitcode involving bitfields, as reading a field from a bitfield involves reading the entire bitfield at once, which may include parts of the struct that have not been initialized yet. * There are various `test_bitfield_*` test cases under `intTests` to test examples of bitfield-related specifications that should and should not verify. * I have also updated `saw-remote-api` and `saw-client` to handle bitfields as well, along with a Python-specific test case. Fixes #1461.
This adds support for writing specifications that talk about bitfields in LLVM code by way of the new `llvm_points_to_bitfield` command. Broadly speaking, `llvm_points_to_bitfield ptr fieldName rhs` is like `llvm_points_to (llvm_field ptr fieldName) rhs`, except that `fieldName` is required to be the name of a field within a bitfield. The salient details are: * LLVM bitcode itself does not a built-in concept of bitfields, but LLVM's debug metadata does. Support for retrieving bitfield-related metadata was added to `llvm-pretty` in GaloisInc/llvm-pretty#90, so this patch bumps the `llvm-pretty` submodule to incorporate it. This patch also updates the `crucible` submodule to incorporate corresponding changes in GaloisInc/crucible#936. * The `LLVMPointsTo` data type now has a new `LLVMPointsToBitfield` data constructor that stores all of the necessary information related to the `llvm_points_to_bitfield` command. As a result, the changes in this patch are fairly insulated from the rest of SAW, as most of the new code involves adding additional cases to handle `LLVMPointsToBitfield`. * Two of the key new functions are `storePointsToBitfieldValue` and `matchPointsToBitfieldValue`, which implement the behavior of `llvm_points_to_bitfield` in pre- and post-conditions. These functions implement the necessary bit-twiddling to store values in and retrieve values out of bitfield. I have left extensive comments in each function describing how all of this works. * Accompanying `llvm_points_to_bitfield` is a new set of `{enable,disable}_lax_loads_and_stores` command, which toggles the Crucible-side option of the same name. When `enable_lax_loads_and_stores` is on, reading from uninitialized memory will return a symbolic value rather than failing outright. This is essential to be able to deal with LLVM bitcode involving bitfields, as reading a field from a bitfield involves reading the entire bitfield at once, which may include parts of the struct that have not been initialized yet. * There are various `test_bitfield_*` test cases under `intTests` to test examples of bitfield-related specifications that should and should not verify. * I have also updated `saw-remote-api` and `saw-client` to handle bitfields as well, along with a Python-specific test case. Fixes #1461.
This adds support for writing specifications that talk about bitfields in LLVM code by way of the new `llvm_points_to_bitfield` command. Broadly speaking, `llvm_points_to_bitfield ptr fieldName rhs` is like `llvm_points_to (llvm_field ptr fieldName) rhs`, except that `fieldName` is required to be the name of a field within a bitfield. The salient details are: * LLVM bitcode itself does not a built-in concept of bitfields, but LLVM's debug metadata does. Support for retrieving bitfield-related metadata was added to `llvm-pretty` in GaloisInc/llvm-pretty#90, so this patch bumps the `llvm-pretty` submodule to incorporate it. This patch also updates the `crucible` submodule to incorporate corresponding changes in GaloisInc/crucible#936. * The `LLVMPointsTo` data type now has a new `LLVMPointsToBitfield` data constructor that stores all of the necessary information related to the `llvm_points_to_bitfield` command. As a result, the changes in this patch are fairly insulated from the rest of SAW, as most of the new code involves adding additional cases to handle `LLVMPointsToBitfield`. * Two of the key new functions are `storePointsToBitfieldValue` and `matchPointsToBitfieldValue`, which implement the behavior of `llvm_points_to_bitfield` in pre- and post-conditions. These functions implement the necessary bit-twiddling to store values in and retrieve values out of bitfield. I have left extensive comments in each function describing how all of this works. * Accompanying `llvm_points_to_bitfield` is a new set of `{enable,disable}_lax_loads_and_stores` command, which toggles the Crucible-side option of the same name. When `enable_lax_loads_and_stores` is on, reading from uninitialized memory will return a symbolic value rather than failing outright. This is essential to be able to deal with LLVM bitcode involving bitfields, as reading a field from a bitfield involves reading the entire bitfield at once, which may include parts of the struct that have not been initialized yet. * There are various `test_bitfield_*` test cases under `intTests` to test examples of bitfield-related specifications that should and should not verify. * I have also updated `saw-remote-api` and `saw-client` to handle bitfields as well, along with a Python-specific test case. Fixes #1461.
This adds support for writing specifications that talk about bitfields in LLVM code by way of the new `llvm_points_to_bitfield` command. Broadly speaking, `llvm_points_to_bitfield ptr fieldName rhs` is like `llvm_points_to (llvm_field ptr fieldName) rhs`, except that `fieldName` is required to be the name of a field within a bitfield. The salient details are: * LLVM bitcode itself does not a built-in concept of bitfields, but LLVM's debug metadata does. Support for retrieving bitfield-related metadata was added to `llvm-pretty` in GaloisInc/llvm-pretty#90, so this patch bumps the `llvm-pretty` submodule to incorporate it. This patch also updates the `crucible` submodule to incorporate corresponding changes in GaloisInc/crucible#936. * The `LLVMPointsTo` data type now has a new `LLVMPointsToBitfield` data constructor that stores all of the necessary information related to the `llvm_points_to_bitfield` command. As a result, the changes in this patch are fairly insulated from the rest of SAW, as most of the new code involves adding additional cases to handle `LLVMPointsToBitfield`. * Two of the key new functions are `storePointsToBitfieldValue` and `matchPointsToBitfieldValue`, which implement the behavior of `llvm_points_to_bitfield` in pre- and post-conditions. These functions implement the necessary bit-twiddling to store values in and retrieve values out of bitfield. I have left extensive comments in each function describing how all of this works. * Accompanying `llvm_points_to_bitfield` is a new set of `{enable,disable}_lax_loads_and_stores` command, which toggles the Crucible-side option of the same name. When `enable_lax_loads_and_stores` is on, reading from uninitialized memory will return a symbolic value rather than failing outright. This is essential to be able to deal with LLVM bitcode involving bitfields, as reading a field from a bitfield involves reading the entire bitfield at once, which may include parts of the struct that have not been initialized yet. * There are various `test_bitfield_*` test cases under `intTests` to test examples of bitfield-related specifications that should and should not verify. * I have also updated `saw-remote-api` and `saw-client` to handle bitfields as well, along with a Python-specific test case. Fixes #1461.
We'd like to be able to write SAW specifications involving structs that use bitfields. Here is a simple example:
It's tempting to try and specify
enable_x
like so:This will not work, however:
The issue lies in the way that bitfields are compiled to LLVM. If you look at the bitcode for
test.c
, you will see:Note that the memory representation that LLVM chooses for
struct s
istype { i32, i8, i32 }
(rather than, say,type { i32, i1, i1, i32 }
). Moreover, when LLVM accesses thex
bitfield inenable_x
, it does so by firstload
ing an entire byte, applying a bitmask (-2
, or0b11111110
),or
-ing it with1
, and then storing the byte back into the struct. This poses two issues for SAW:load
has the potential to read uninitialized memory, which is something that Crucible is very picky about when simulating LLVM's memory model.{{ 1 : [1] }}
directly to memory due to the byte-oriented way in which LLVM represents bitfields. This is the immediate reason for thetypes not memory-compatible
error message that SAW throws.What can we do about each of these issues?
Fixing (1) properly is the topic of GaloisInc/crucible#366. Implementing the fix suggested in GaloisInc/crucible#366 would be a large undertaking, as it would require overhauling Crucible's implementation of the LLVM memory model to track
undef
andpoison
at a bit-level granularity.We don't necessarily need to do all of this just to support bitfields, however. An alternative would be to piggyback on top of the recently added
lax-loads-and-stores
option. Withlax-loads-and-stores
, reading from uninitialized memory is not an error, but will instead return an arbitrary, symbolic value. Enablinglax-loads-and-stores
would be sufficient to prevent the sorts ofload
s like the one found above from erroring out in Crucible.There is a question of whether we want all SAW users who wish to interact with bitfields to use something like
enable_lax_loads_and_stores
or if we want a more specialized option for this purpose.Fixing (2) can be divided up into two sub-parts:
a. Teach
llvm-pretty
to recognize structs with bitfields.b. Add a command to SAW (which I'm tentatively titling "
llvm_bitfield_points_to
") to allow writing specifications involving bitfield values.Let's start with part (a). As illustrated by the LLVM bitcode from earlier, LLVM's memory representation has no knowledge about what fields in a struct contain bitfields. The only way to discern this information is to dig into the debug metadata (enabled with
-g
) and check with fields have theDIFlagBitField
flag enabled.llvm-pretty
'sText.LLVM.DebugUtils
module already has a number of facilities for searching through debug metadata, so that seems like a reasonable place to implement this functionality. Something like this might suffice:Part (b) is where things get interesting. It may be tempting to extend the existing
llvm_field
command to support bitfields, but this is much easier said than done. Bitfields do not fit into the existingSetupValue
infrastructure very neatly. Something likellvm_field ss "bitfield"
is assumed to be a pointer to a struct field that you can directly write to, but bitfields require more care in order to write to them correctly. As a result, I propose instead having a separatellvm_bitfield_points_to
function that you would use like this:This has two advantages:
We do not need to encode bitfields as
SetupValue
s, which avoids that aforementioned difficulties.If we have consecutive
llvm_bitfield_points_to
calls that write to the same contiguous region of memory, e.g.,Then SAW can detect this and represent this in Crucible with a single
or 2
, where2
/0b11
is the bitmask obtained by overlayingx
's1
value (the least significant bit) andy
's1
value (the most significant bit) in memory. Alternatively, one couldor
each value into memory individually, but this is less efficient.Because
llvm_bitfield_points_to
would not need to store bitfields asSeutpValue
s, we are free to choose how to represent them in a SAWStateSpec
. One way that might be sensible is to represent them as aSetupCondition
that is used for LLVM (but not JVM) specifications. I haven't implemented this idea to 100% completion, but it seems like it would pan out.The downside to having a separate
llvm_bitfield_points_to
command is that users would need to remember to use a special function whenever they are dealing with bitfields. This is unfortunate, but the downright quirkiness of bitfields may prevent us from doing better. If we really wanted to, we could imagine having a very special case inllvm_points_to
that checks if the first argument is syntactically equal to something likellvm_field ss "b"
(whereb
is the name of a bitfield), and if so, treat it as though we had writtenllvm_bitfield_points_to ss "b"
. It remains to be seen if something like this actually makes SAW users' lives that much easier.The text was updated successfully, but these errors were encountered: