Allow using bitfields in SAW specifications #1461

Closed
RyanGlScott opened this issue Sep 21, 2021 · 5 comments · Fixed by #1539
Labels
needs design (Technical design work is needed for issue to progress)
subsystem: crucible-llvm (Issues related to LLVM bitcode verification with crucible-llvm)
topics: bitfields (Issues related to SAW's support for bitfields)
topics: memory model (Issues that relate to the LLVM and/or Crucible model of pointers and memory blocks)
type: feature request (Issues requesting a new feature or capability)

Comments

@RyanGlScott
Contributor

We'd like to be able to write SAW specifications involving structs that use bitfields. Here is a simple example:

#include <stdint.h>

struct s {
  int32_t w;
  uint8_t x:1;
  uint8_t y:1;
  int32_t z;
};

void enable_x(struct s *ss) {
  ss->x = 1;
}

It's tempting to try and specify enable_x like so:

let enable_x_spec = do {
  ss <- llvm_alloc (llvm_alias "struct.s");
  llvm_execute_func [ss];
  llvm_points_to (llvm_field ss "x") (llvm_term {{ 1 : [1] }});
};

m <- llvm_load_module "test.bc";
llvm_verify m "enable_x" [] false enable_x_spec z3;

This will not work, however:

$ clang -c -g -emit-llvm test.c
$ ~/Software/saw-0.8/bin/saw test.saw


[17:55:01.057] Loading file "/home/rscott/Documents/Hacking/C/test.saw"
[17:55:01.078] Stack trace:
"llvm_verify" (/home/rscott/Documents/Hacking/C/test.saw:8:1-8:12):
"enable_x_spec" (/home/rscott/Documents/Hacking/C/test.saw:8:35-8:48):
"llvm_points_to" (/home/rscott/Documents/Hacking/C/test.saw:4:3-4:17):
types not memory-compatible:
i8
i1

The issue lies in the way that bitfields are compiled to LLVM. If you look at the bitcode for test.c, you will see:

$ cat test.ll
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

%struct.s = type { i32, i8, i32 }

; Function Attrs: noinline nounwind optnone uwtable
define dso_local void @enable_x(%struct.s* %0) #0 !dbg !7 {
  %2 = alloca %struct.s*, align 8
  store %struct.s* %0, %struct.s** %2, align 8
  call void @llvm.dbg.declare(metadata %struct.s** %2, metadata !26, metadata !DIExpression()), !dbg !27
  %3 = load %struct.s*, %struct.s** %2, align 8, !dbg !28
  %4 = getelementptr inbounds %struct.s, %struct.s* %3, i32 0, i32 1, !dbg !29
  %5 = load i8, i8* %4, align 4, !dbg !30
  %6 = and i8 %5, -2, !dbg !30
  %7 = or i8 %6, 1, !dbg !30
  store i8 %7, i8* %4, align 4, !dbg !30
  ret void, !dbg !31
}

; Function Attrs: nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind readnone speculatable willreturn }

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4, !5}
!llvm.ident = !{!6}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 10.0.0-4ubuntu1 ", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, splitDebugInlining: false, nameTableKind: None)
!1 = !DIFile(filename: "test.c", directory: "/home/rscott/Documents/Hacking/C")
!2 = !{}
!3 = !{i32 7, !"Dwarf Version", i32 4}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!5 = !{i32 1, !"wchar_size", i32 4}
!6 = !{!"clang version 10.0.0-4ubuntu1 "}
!7 = distinct !DISubprogram(name: "enable_x", scope: !1, file: !1, line: 10, type: !8, scopeLine: 10, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
!8 = !DISubroutineType(types: !9)
!9 = !{null, !10}
!10 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "s", file: !1, line: 3, size: 96, elements: !12)
!12 = !{!13, !19, !24, !25}
!13 = !DIDerivedType(tag: DW_TAG_member, name: "w", scope: !11, file: !1, line: 4, baseType: !14, size: 32)
!14 = !DIDerivedType(tag: DW_TAG_typedef, name: "int32_t", file: !15, line: 26, baseType: !16)
!15 = !DIFile(filename: "/usr/include/x86_64-linux-gnu/bits/stdint-intn.h", directory: "")
!16 = !DIDerivedType(tag: DW_TAG_typedef, name: "__int32_t", file: !17, line: 41, baseType: !18)
!17 = !DIFile(filename: "/usr/include/x86_64-linux-gnu/bits/types.h", directory: "")
!18 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!19 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !11, file: !1, line: 5, baseType: !20, size: 1, offset: 32, flags: DIFlagBitField, extraData: i64 32)
!20 = !DIDerivedType(tag: DW_TAG_typedef, name: "uint8_t", file: !21, line: 24, baseType: !22)
!21 = !DIFile(filename: "/usr/include/x86_64-linux-gnu/bits/stdint-uintn.h", directory: "")
!22 = !DIDerivedType(tag: DW_TAG_typedef, name: "__uint8_t", file: !17, line: 38, baseType: !23)
!23 = !DIBasicType(name: "unsigned char", size: 8, encoding: DW_ATE_unsigned_char)
!24 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !11, file: !1, line: 6, baseType: !20, size: 1, offset: 33, flags: DIFlagBitField, extraData: i64 32)
!25 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !11, file: !1, line: 7, baseType: !14, size: 32, offset: 64)
!26 = !DILocalVariable(name: "ss", arg: 1, scope: !7, file: !1, line: 10, type: !10)
!27 = !DILocation(line: 10, column: 25, scope: !7)
!28 = !DILocation(line: 11, column: 3, scope: !7)
!29 = !DILocation(line: 11, column: 7, scope: !7)
!30 = !DILocation(line: 11, column: 9, scope: !7)
!31 = !DILocation(line: 12, column: 1, scope: !7)

Note that the memory representation that LLVM chooses for struct s is type { i32, i8, i32 } (rather than, say, type { i32, i1, i1, i32 }). Moreover, when LLVM accesses the x bitfield in enable_x, it does so by first loading an entire byte, applying a bitmask (-2, or 0b11111110), or-ing it with 1, and then storing the byte back into the struct. This poses two issues for SAW:

  1. The load has the potential to read uninitialized memory, which is something that Crucible is very picky about when simulating LLVM's memory model.
  2. We can't simply write {{ 1 : [1] }} directly to memory due to the byte-oriented way in which LLVM represents bitfields. This is the immediate reason for the types not memory-compatible error message that SAW throws.
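As a rough illustration of the byte-level access described above, the following C sketch mirrors the load/and/or/store sequence from the bitcode on a single storage byte. The function name is hypothetical, and the sketch assumes (as the debug metadata for this x86_64 target suggests) that x occupies bit 0 of the byte and y occupies bit 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the IR for enable_x. `storage` stands for the i8
   loaded from the struct; the return value is the i8 that is stored back. */
uint8_t set_x_in_storage_byte(uint8_t storage) {
  uint8_t cleared = (uint8_t)(storage & 0xFE); /* and i8 %5, -2: clear bit 0 */
  uint8_t updated = (uint8_t)(cleared | 0x01); /* or i8 %6, 1: set bit 0 */
  /* Bits 1..7 (including y) are carried over from the loaded byte. */
  assert((updated & 0xFE) == (storage & 0xFE));
  return updated;
}
```

For instance, `set_x_in_storage_byte(0x02)` gives `0x03`: y's bit is preserved because it was read first, which is exactly why the load matters.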

What can we do about each of these issues?


Fixing (1) properly is the topic of GaloisInc/crucible#366. Implementing the fix suggested in GaloisInc/crucible#366 would be a large undertaking, as it would require overhauling Crucible's implementation of the LLVM memory model to track undef and poison at a bit-level granularity.

We don't necessarily need to do all of this just to support bitfields, however. An alternative would be to piggyback on top of the recently added lax-loads-and-stores option. With lax-loads-and-stores, reading from uninitialized memory is not an error, but will instead return an arbitrary, symbolic value. Enabling lax-loads-and-stores would be sufficient to prevent the sorts of loads like the one found above from erroring out in Crucible.
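To see why an arbitrary value is good enough here, the sketch below (plain C, not SAW or Crucible code; the function names are made up for illustration) runs the and/or sequence over every possible value of the uninitialized byte and checks that x always ends up set while the other bits are left alone. It assumes x is bit 0 of the storage byte.

```c
#include <assert.h>
#include <stdint.h>

/* The and/or pair from the bitcode, applied to one storage byte. */
uint8_t apply_enable_x(uint8_t storage) {
  return (uint8_t)((storage & 0xFE) | 0x01);
}

/* Counts the initial byte values for which x (bit 0) ends up set. */
unsigned count_initial_bytes_where_x_is_set(void) {
  unsigned count = 0;
  for (unsigned v = 0; v < 256; v++) {
    uint8_t after = apply_enable_x((uint8_t)v);
    assert((after & 0xFE) == (v & 0xFE)); /* bits 1..7 are preserved */
    if (after & 0x01) count++;
  }
  return count;
}
```

The count is 256, i.e. the postcondition on x holds no matter what the load returned. A symbolic byte plays the same role in the simulator that the loop variable plays here.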

There is a question of whether we want all SAW users who wish to interact with bitfields to use something like enable_lax_loads_and_stores or if we want a more specialized option for this purpose.


Fixing (2) can be divided up into two sub-parts:

a. Teach llvm-pretty to recognize structs with bitfields.
b. Add a command to SAW (which I'm tentatively titling "llvm_bitfield_points_to") to allow writing specifications involving bitfield values.

Let's start with part (a). As illustrated by the LLVM bitcode from earlier, LLVM's memory representation has no knowledge about which fields in a struct are bitfields. The only way to discern this information is to dig into the debug metadata (enabled with -g) and check which fields have the DIFlagBitField flag enabled. llvm-pretty's Text.LLVM.DebugUtils module already has a number of facilities for searching through debug metadata, so that seems like a reasonable place to implement this functionality. Something like this might suffice:

diff --git a/src/Text/LLVM/DebugUtils.hs b/src/Text/LLVM/DebugUtils.hs
index 2a22f67..56bc0c7 100644
--- a/src/Text/LLVM/DebugUtils.hs
+++ b/src/Text/LLVM/DebugUtils.hs
@@ -10,7 +10,8 @@ Point-of-contact : emertens
 -}
 module Text.LLVM.DebugUtils
   ( -- * Definition type analyzer
-    Info(..), computeFunctionTypes, valMdToInfo
+    Info(..), StructFieldInfo(..), BitfieldInfo(..), UnionFieldInfo(..)
+  , computeFunctionTypes, valMdToInfo
   , localVariableNameDeclarations

   -- * Metadata lookup
@@ -27,6 +28,7 @@ module Text.LLVM.DebugUtils

 import           Control.Applicative    ((<|>))
 import           Control.Monad          ((<=<))
+import           Data.Bits              (Bits(..))
 import           Data.IntMap            (IntMap)
 import qualified Data.IntMap as IntMap
 import           Data.List              (elemIndex, tails, stripPrefix)
@@ -56,13 +58,34 @@ type MdMap = IntMap ValMd

 data Info
   = Pointer Info
-  | Structure [(String,Word64,Info)] -- ^ Fields: name, bit-offset, info
-  | Union     [(String,Info)]
+  | Structure [StructFieldInfo]
+  | Union     [UnionFieldInfo]
   | ArrInfo Info
   | BaseType String
   | Unknown
   deriving Show

+-- | TODO RGS: Docs
+data StructFieldInfo = StructFieldInfo
+  { sfiName     :: String
+  , sfiOffset   :: Word64
+  , sfiBitfield :: Maybe BitfieldInfo
+  , sfiInfo     :: Info
+  } deriving Show
+
+-- | TODO RGS
+data BitfieldInfo = BitfieldInfo
+  { biBaseOffset :: Word64
+  , biBitSize    :: Word64
+  } deriving Show
+
+-- | TODO RGS: Docs
+-- TODO RGS: Do we need ufiBitfield?
+data UnionFieldInfo = UnionFieldInfo
+  { ufiName :: String
+  , ufiInfo :: Info
+  } deriving Show
+
 {-
 import Text.Show.Pretty
 import Data.Foldable
@@ -89,6 +112,10 @@ getDebugInfo mdMap (ValMdRef i)    = getDebugInfo mdMap =<< IntMap.lookup i mdMa
 getDebugInfo _ (ValMdDebugInfo di) = Just di
 getDebugInfo _ _                   = Nothing

+getInteger :: MdMap -> ValMd -> Maybe Integer
+getInteger mdMap (ValMdRef i)                          = getInteger mdMap =<< IntMap.lookup i mdMap
+getInteger _     (ValMdValue (Typed _ (ValInteger i))) = Just i
+getInteger _     _                                     = Nothing

 getList :: MdMap -> ValMd -> Maybe [Maybe ValMd]
 getList mdMap (ValMdRef i) = getList mdMap =<< IntMap.lookup i mdMap
@@ -125,25 +152,40 @@ getFieldDIs mdMap =
   traverse (getDebugInfo mdMap) <=< sequence <=< getList mdMap <=< dictElements


-getStructFields :: MdMap -> DICompositeType -> Maybe [(String, Word64, Info)]
+getStructFields :: MdMap -> DICompositeType -> Maybe [StructFieldInfo]
 getStructFields mdMap = traverse (debugInfoToStructField mdMap) <=< getFieldDIs mdMap

-debugInfoToStructField :: MdMap -> DebugInfo -> Maybe (String, Word64, Info)
+debugInfoToStructField :: MdMap -> DebugInfo -> Maybe StructFieldInfo
 debugInfoToStructField mdMap di =
   do DebugInfoDerivedType dt <- Just di
      fieldName               <- didtName dt
-     Just (fieldName, didtOffset dt, valMdToInfo' mdMap (didtBaseType dt))
-
-
-getUnionFields :: MdMap -> DICompositeType -> Maybe [(String, Info)]
+     -- TODO RGS: Cite https://github.com/llvm/llvm-project/blob/1bebc31c617d1a0773f1d561f02dd17c5e83b23b/llvm/include/llvm/IR/DebugInfoFlags.def#L51 somehow
+     let bitfield | testBit (didtFlags dt) 19
+                  , Just extraData  <- didtExtraData dt
+                  , Just baseOffset <- getInteger mdMap extraData
+                  = Just $ BitfieldInfo { biBaseOffset = fromInteger baseOffset
+                                        , biBitSize    = didtSize dt
+                                        }
+                  | otherwise
+                  = Nothing
+     Just (StructFieldInfo { sfiName     = fieldName
+                           , sfiOffset   = didtOffset dt
+                           , sfiBitfield = bitfield
+                           , sfiInfo     = valMdToInfo' mdMap (didtBaseType dt)
+                           })
+
+
+getUnionFields :: MdMap -> DICompositeType -> Maybe [UnionFieldInfo]
 getUnionFields mdMap = traverse (debugInfoToUnionField mdMap) <=< getFieldDIs mdMap


-debugInfoToUnionField :: MdMap -> DebugInfo -> Maybe (String, Info)
+debugInfoToUnionField :: MdMap -> DebugInfo -> Maybe UnionFieldInfo
 debugInfoToUnionField mdMap di =
   do DebugInfoDerivedType dt <- Just di
      fieldName               <- didtName dt
-     Just (fieldName, valMdToInfo' mdMap (didtBaseType dt))
+     Just (UnionFieldInfo { ufiName = fieldName
+                          , ufiInfo = valMdToInfo' mdMap (didtBaseType dt)
+                          })



@@ -219,8 +261,8 @@ fieldIndexByPosition ::
   Info {- ^ type information for specified field -}
 fieldIndexByPosition i info =
   case info of
-    Structure xs -> go [ x | (_,_,x) <- xs ]
-    Union     xs -> go [ x | (_,x)   <- xs ]
+    Structure xs -> go [ x | StructFieldInfo{sfiInfo = x} <- xs ]
+    Union     xs -> go [ x | UnionFieldInfo{ufiInfo = x}  <- xs ]
     _            -> Unknown
   where
     go xs = case drop i xs of
@@ -235,8 +277,8 @@ fieldIndexByName ::
   Maybe Int {- ^ zero-based index of field matching the name -}
 fieldIndexByName n info =
   case info of
-    Structure xs -> go [ x | (x,_,_) <- xs ]
-    Union     xs -> go [ x | (x,_)   <- xs ]
+    Structure xs -> go [ x | StructFieldInfo{sfiName = x} <- xs ]
+    Union     xs -> go [ x | UnionFieldInfo{ufiName = x}  <- xs ]
     _            -> Nothing
   where
     go = elemIndex n
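The metadata decoding in the patch above can be sketched independently of llvm-pretty. The C below is only an illustration: the struct and function names are hypothetical, bit 19 is the flag bit the patch tests, and it assumes that a bitfield member's extraData holds the bit offset of the storage unit it lives in (which matches the values 32/32 for x and 33/32 for y in the metadata dump).

```c
#include <assert.h>
#include <stdint.h>

#define DI_FLAG_BITFIELD (UINT32_C(1) << 19) /* bit tested in the patch */

/* Hypothetical flattened view of one DW_TAG_member entry. */
struct member_md {
  uint32_t flags;
  uint64_t offset_bits; /* "offset:" in the metadata */
  uint64_t size_bits;   /* "size:" in the metadata */
  uint64_t extra_data;  /* "extraData:" (assumed: storage unit offset) */
};

int is_bitfield(const struct member_md *m) {
  return (m->flags & DI_FLAG_BITFIELD) != 0;
}

/* Position of the field's lowest bit within its storage unit. */
uint64_t bit_index_in_storage(const struct member_md *m) {
  assert(is_bitfield(m));
  assert(m->offset_bits >= m->extra_data);
  return m->offset_bits - m->extra_data;
}
```

With the values from the dump, x lands at bit index 0 and y at bit index 1 of the i8 at bit offset 32, while w and z are not flagged as bitfields at all.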

Part (b) is where things get interesting. It may be tempting to extend the existing llvm_field command to support bitfields, but this is much easier said than done. Bitfields do not fit into the existing SetupValue infrastructure very neatly. Something like llvm_field ss "bitfield" is assumed to be a pointer to a struct field that you can directly write to, but bitfields require more care in order to write to them correctly. As a result, I propose instead having a separate llvm_bitfield_points_to function that you would use like this:

let enable_x_spec = do {
  ss <- llvm_alloc (llvm_alias "struct.s");
  llvm_execute_func [ss];
  llvm_bitfield_points_to ss "x" (llvm_term {{ 1 : [1] }});
};

This has two advantages:

  1. We do not need to encode bitfields as SetupValues, which avoids the aforementioned difficulties.

  2. If we have consecutive llvm_bitfield_points_to calls that write to the same contiguous region of memory, e.g.,

    llvm_bitfield_points_to ss "x" (llvm_term {{ 1 : [1] }});
    llvm_bitfield_points_to ss "y" (llvm_term {{ 1 : [1] }});
    

    then SAW can detect this and represent it in Crucible with a single or of 0b11 (that is, 3), the bitmask obtained by overlaying x's 1 value (the least significant bit) and y's 1 value (the next bit up) in memory. Alternatively, one could or each value into memory individually, but this is less efficient.
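The overlay step can be sketched in C as follows. This is not SAW's implementation; the helper names are hypothetical, and the layout (x at bit 0, y at bit 1 of one storage byte) is an assumption taken from the debug metadata shown earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Shift a field value into position within an 8-bit storage unit. */
uint8_t place_field(uint8_t value, unsigned bit_index, unsigned width) {
  assert(width >= 1 && bit_index + width <= 8);
  uint8_t field_mask = (uint8_t)((1u << width) - 1u);
  assert((value & field_mask) == value); /* value must fit in the field */
  return (uint8_t)((unsigned)value << bit_index);
}

/* Combine the values of x and y into the single byte to be or'd in. */
uint8_t overlay_x_and_y(uint8_t x, uint8_t y) {
  return (uint8_t)(place_field(x, 0, 1) | place_field(y, 1, 1));
}
```

`overlay_x_and_y(1, 1)` is `0x03`, i.e. 0b11, so two points-to statements collapse into one write.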

Because llvm_bitfield_points_to would not need to store bitfields as SetupValues, we are free to choose how to represent them in a SAW StateSpec. One way that might be sensible is to represent them as a SetupCondition that is used for LLVM (but not JVM) specifications. I haven't implemented this idea to 100% completion, but it seems like it would pan out.

The downside to having a separate llvm_bitfield_points_to command is that users would need to remember to use a special function whenever they are dealing with bitfields. This is unfortunate, but the downright quirkiness of bitfields may prevent us from doing better. If we really wanted to, we could imagine having a very special case in llvm_points_to that checks if the first argument is syntactically equal to something like llvm_field ss "b" (where b is the name of a bitfield), and if so, treat it as though we had written llvm_bitfield_points_to ss "b". It remains to be seen if something like this actually makes SAW users' lives that much easier.

@RyanGlScott RyanGlScott added subsystem: crucible-llvm Issues related to LLVM bitcode verification with crucible-llvm needs design Technical design work is needed for issue to progress type: feature request Issues requesting a new feature or capability topics: memory model Issues that relate to the LLVM and/or Crucible model of pointers and memory blocks labels Sep 21, 2021
RyanGlScott added a commit to RyanGlScott/s2n-tls that referenced this issue Oct 2, 2021
Ultimately, this is for the benefit of the SAW proofs. Due to a limitation
in how SAW currently works, bitfields must be accessed by index rather than
by name, and due to how often new fields are added to `s2n_connection`, the
only way to do this in a way that's maintainable is to have all the bitfield
fields be up front. That way, the index to access the bitfield will always be
zero, which significantly decreases the likelihood that the SAW proofs will
need to be updated with each new field added to `s2n_connection`.

This is all rather unfortunate. See
GaloisInc/saw-script#1461 for a plan to
make handling bitfields more maintainable in SAW.
@RyanGlScott
Contributor Author

A related issue is that SAW is overzealous about initializing bitfields in llvm_points_to statements. If you have this code:

#include <stdint.h>

struct s {
  int32_t w;
  uint8_t x:1;
  uint8_t y:1;
  int32_t z;
};

uint8_t f(struct s *ss) {
  return ss->x + ss->y;
}

Then this specification for f should fail:

let f_spec = do {
  ss <- llvm_alloc (llvm_alias "struct.s");
  llvm_points_to (llvm_field ss "x") (llvm_term {{ 1 : [8] }});

  llvm_execute_func [ss];

  llvm_return (llvm_term {{ 1 : [8] }});
};

This is because the preconditions initialize ss->x but not ss->y. Despite this, SAW happily accepts this specification. This is because llvm_points_to (llvm_field ss "x") (llvm_term {{ 1 : [8] }}) doesn't do what you might think it does. Rather than initialize x, it initializes the entirety of the bitfield in which x resides. This means that the bit corresponding to x will be initialized to 1, and the remaining seven bits—one of which corresponds to y—will be initialized to zero. Eek!
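A small C model makes the effect concrete. It is only a sketch under the assumption that x is bit 0 and y is bit 1 of the shared storage byte; `extract_bit` and `f_model` are hypothetical names standing in for what the compiled f does with that byte.

```c
#include <assert.h>
#include <stdint.h>

/* Read one bit of the storage byte that holds the bitfields. */
uint8_t extract_bit(uint8_t storage, unsigned bit_index) {
  assert(bit_index < 8);
  return (uint8_t)((storage >> bit_index) & 1u);
}

/* Model of f: ss->x + ss->y, computed from the shared storage byte. */
uint8_t f_model(uint8_t storage) {
  return (uint8_t)(extract_bit(storage, 0) + extract_bit(storage, 1));
}
```

Writing `{{ 1 : [8] }}` through the pointer to x amounts to setting the whole byte to `0x01`. Then `extract_bit(0x01, 1)` is 0 and `f_model(0x01)` is 1, which is why the `llvm_return` of 1 is provable even though y was never mentioned.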

We should definitely add a validity check to ensure that fields in a bitfield aren't used in llvm_points_to statements like the one above. That alone won't be enough to ensure the issue doesn't happen, however. We will likely implement llvm_bitfield_points_to statements under the hood by writing some or'd value to the entire bitfield, so we need to track which specific bits in the bitfield we have initialized. If simulation tries to use bits that have not been explicitly initialized, we must throw an error of some kind.
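One way to picture the per-bit tracking suggested here is a shadow mask kept next to the storage byte. The C below is a hypothetical sketch of that idea only; none of these names come from SAW or Crucible, and it says nothing about how (or whether) SAW ended up implementing it.

```c
#include <assert.h>
#include <stdint.h>

struct tracked_byte {
  uint8_t value;     /* current contents of the storage byte */
  uint8_t init_mask; /* bit i is 1 iff bit i was explicitly initialized */
};

uint8_t bits_mask(unsigned bit_index, unsigned width) {
  assert(width >= 1 && bit_index + width <= 8);
  return (uint8_t)(((1u << width) - 1u) << bit_index);
}

/* Write a field and remember which bits are now initialized. */
void tracked_store(struct tracked_byte *b, unsigned bit_index,
                   unsigned width, uint8_t value) {
  uint8_t mask = bits_mask(bit_index, width);
  b->value = (uint8_t)((b->value & (uint8_t)~mask) |
                       (((unsigned)value << bit_index) & mask));
  b->init_mask |= mask;
}

/* Returns 1 and fills *out on success, 0 if any requested bit is
   uninitialized (the "throw an error" case). */
int tracked_load(const struct tracked_byte *b, unsigned bit_index,
                 unsigned width, uint8_t *out) {
  uint8_t mask = bits_mask(bit_index, width);
  if ((b->init_mask & mask) != mask) return 0;
  *out = (uint8_t)((b->value & mask) >> bit_index);
  return 1;
}
```

After a store to x alone, a load of x succeeds while a load of y is rejected, which is the behaviour the f example would need in order to fail.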

RyanGlScott added a commit to RyanGlScott/s2n-tls that referenced this issue Oct 5, 2021
RyanGlScott added a commit to RyanGlScott/s2n-tls that referenced this issue Oct 7, 2021
@robdockins
Contributor

I think the easiest and most robust way to handle this is the same as I have been contemplating for crux generally. The idea is to make allocations (by default, at least) be populated by fresh uninterpreted bytes at allocation time. Once that happens, I think most of these tricky questions about bitfields get resolved.

@RyanGlScott
Contributor Author

I partially retract my claim that we should make the program in #1461 (comment) throw an error. The only way that SAW can know if a struct has bitfields or not is by inspecting LLVM debug information, and since it's not guaranteed that every program will have debug information, we couldn't reliably reject programs like that one. At best, we could make it a warning that only fires if debug information is present.

@RyanGlScott
Contributor Author

> I think the easiest and most robust way to handle this is the same as I have been contemplating for crux generally. The idea is to make allocations (by default, at least) be populated by fresh uninterpreted bytes at allocation time.

That's option (1) from GaloisInc/crucible#844 (comment), right?

> Once that happens, I think most of these tricky questions about bitfields get resolved.

Which tricky issues in particular are you referring to? I still think we'd need to track bitfield-specific information in SAW in order to do this correctly, but perhaps I'm missing something.

RyanGlScott added a commit to RyanGlScott/llvm-pretty that referenced this issue Dec 3, 2021
LLVM bitcode doesn't directly record information about bitfields, but its debug
information _does_ record this information. Knowing about bitfields is
important for certain applications—see, for example, GaloisInc/saw-script#1461.
This changes `Text.LLVM.DebugUtils` such that if any of the fields in a struct
have bitfields, it will record this information in the new `BitfieldInfo`
data type.

This requires a backwards-incompatible change to the type of the `Structure`
data constructor. In case we need to add additional fields to `Structure` in
the future, I converted `Structure`'s fields into a record data type, which
makes it slightly easier to extend. I also did the same thing to `Union` for
consistency (although this is not strictly necessary).
@RyanGlScott
Contributor Author

I've opened a PR with the necessary llvm-pretty changes in GaloisInc/llvm-pretty#90.

@RyanGlScott RyanGlScott added the topics: bitfields Issues related to SAW's support for bitfields label Dec 5, 2021
elliottt pushed a commit to GaloisInc/llvm-pretty that referenced this issue Dec 6, 2021
RyanGlScott added a commit that referenced this issue Dec 8, 2021
This adds support for writing specifications that talk about bitfields in LLVM
code by way of the new `llvm_points_to_bitfield` command. Broadly speaking,
`llvm_points_to_bitfield ptr fieldName rhs` is like
`llvm_points_to (llvm_field ptr fieldName) rhs`, except that `fieldName` is
required to be the name of a field within a bitfield. The salient details are:

* LLVM bitcode itself does not have a built-in concept of bitfields, but LLVM's
  debug metadata does. Support for retrieving bitfield-related metadata was
  added to `llvm-pretty` in GaloisInc/llvm-pretty#90, so this patch bumps the
  `llvm-pretty` submodule to incorporate it. This patch also updates the
  `crucible` submodule to incorporate corresponding changes in
  GaloisInc/crucible#936.

* The `LLVMPointsTo` data type now has a new `LLVMPointsToBitfield` data
  constructor that stores all of the necessary information related to the
  `llvm_points_to_bitfield` command. As a result, the changes in this patch
  are fairly insulated from the rest of SAW, as most of the new code involves
  adding additional cases to handle `LLVMPointsToBitfield`.

* Two of the key new functions are `storePointsToBitfieldValue` and
  `matchPointsToBitfieldValue`, which implement the behavior of
  `llvm_points_to_bitfield` in pre- and post-conditions. These functions
  implement the necessary bit-twiddling to store values in and retrieve values
  out of a bitfield. I have left extensive comments in each function describing
  how all of this works.

* Accompanying `llvm_points_to_bitfield` is a new set of
  `{enable,disable}_lax_loads_and_stores` commands, which toggle the
  Crucible-side option of the same name. When `enable_lax_loads_and_stores` is
  on, reading from uninitialized memory will return a symbolic value rather
  than failing outright. This is essential to be able to deal with LLVM bitcode
  involving bitfields, as reading a field from a bitfield involves reading the
  entire bitfield at once, which may include parts of the struct that have not
  been initialized yet.

* There are various `test_bitfield_*` test cases under `intTests` to test
  examples of bitfield-related specifications that should and should not
  verify.

* I have also updated `saw-remote-api` and `saw-client` to handle bitfields as
  well, along with a Python-specific test case.

Fixes #1461.
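For readers unfamiliar with the kind of bit-twiddling the commit message refers to, here is a minimal C sketch of storing a value into, and matching a value against, a field of arbitrary width inside one storage byte. The function names are invented for illustration and are not the Haskell functions named above; the real code works on symbolic Crucible values rather than concrete bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering `width` bits starting at `bit_index` in one storage byte. */
uint8_t field_mask(unsigned bit_index, unsigned width) {
  assert(width >= 1 && bit_index + width <= 8);
  return (uint8_t)(((1u << width) - 1u) << bit_index);
}

/* Store direction: write `value` into the field, keep all other bits. */
uint8_t store_field(uint8_t storage, unsigned bit_index, unsigned width,
                    uint8_t value) {
  uint8_t mask = field_mask(bit_index, width);
  return (uint8_t)((storage & (uint8_t)~mask) |
                   (((unsigned)value << bit_index) & mask));
}

/* Match direction: extract the field and compare it with `expected`. */
int match_field(uint8_t storage, unsigned bit_index, unsigned width,
                uint8_t expected) {
  uint8_t mask = field_mask(bit_index, width);
  return (uint8_t)((storage & mask) >> bit_index) == expected;
}
```

For example, storing the 3-bit value 5 at bit index 2 of `0xFF` yields `0xF7`, and matching that field of `0xF7` against 5 succeeds while the neighbouring bits are untouched.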
RyanGlScott added a commit that referenced this issue Dec 10, 2021
RyanGlScott added a commit that referenced this issue Dec 10, 2021
RyanGlScott added a commit that referenced this issue Dec 11, 2021
RyanGlScott added a commit that referenced this issue Dec 21, 2021
@mergify mergify bot closed this as completed in #1539 Dec 22, 2021