`macaw-x86-symbolic` simulates `idiv` instructions incorrectly #393

RyanGlScott · 2024-07-12T18:27:56Z

Consider this C program, appropriately decorated for use in macaw's test suite:

// bug.c
#include <stdlib.h>

int __attribute__((noinline)) mod2(int x) {
  return x % 2;
}

int __attribute__((noinline)) test_mod2(void) {
  return mod2(1);
}

void _start(void) {
  test_mod2();
};

And consider this test harness (which was extracted from macaw-x86-symbolic's test suite):

-- Bug.hs
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ImplicitParams #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}
module Main (main) where

import           Control.Lens ((^.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.ElfEdit as Elf
import qualified Data.Foldable as F
import qualified Data.Map as Map
import           Data.Maybe ( mapMaybe )
import qualified Data.Parameterized.Classes as PC
import qualified Data.Parameterized.Nonce as PN
import           Data.Parameterized.Some ( Some(..) )
import           Data.Proxy ( Proxy(..) )

import qualified What4.Config as WC
import qualified What4.Expr as WE
import qualified What4.Expr.Builder as WEB
import qualified What4.FloatMode as WF
import qualified What4.Interface as WI
import qualified What4.ProblemFeatures as WPF
import qualified What4.Solver as WS

import qualified Lang.Crucible.Backend as CB
import qualified Lang.Crucible.Backend.Online as CBO
import qualified Lang.Crucible.Simulator as CS
import qualified Lang.Crucible.LLVM.MemModel as LLVM

import qualified Data.Macaw.Architecture.Info as MAI
import qualified Data.Macaw.Discovery as M
import qualified Data.Macaw.Symbolic as MS
import qualified Data.Macaw.Symbolic.Testing as MST
import qualified Data.Macaw.X86 as MX
import           Data.Macaw.X86.Symbolic ()

main :: IO ()
main = do
  -- These are pass/fail in that the assertions in the "pass" set are true (and
  -- the solver returns Unsat), while the assertions in the "fail" set are false
  -- (and the solver returns Sat).
  let passRes = MST.SimulationResult MST.Unsat
  mkSymExTest passRes MST.LazyMemModel "bug.exe"

hasTestPrefix :: Some (M.DiscoveryFunInfo arch) -> Maybe (BS8.ByteString, Some (M.DiscoveryFunInfo arch))
hasTestPrefix (Some dfi) = do
  bsname <- M.discoveredFunSymbol dfi
  if BS8.pack "test_" `BS8.isPrefixOf` bsname
    then return (bsname, Some dfi)
    else Nothing

mkSymExTest :: MST.SimulationResult -> MST.MemModelPreset -> FilePath -> IO ()
mkSymExTest expected mmPreset exePath = do
  bytes <- BS.readFile exePath
  case Elf.decodeElfHeaderInfo bytes of
    Left (_, msg) -> fail ("Error parsing ELF header from file '" ++ show exePath ++ "': " ++ msg)
    Right (Elf.SomeElf ehi) -> do
      case Elf.headerClass (Elf.header ehi) of
        Elf.ELFCLASS32 -> fail "32 bit x86 binaries are not supported"
        Elf.ELFCLASS64 -> do
          binariesInfo <- MST.runDiscovery ehi exePath MST.toAddrSymMap MX.x86_64_linux_info MX.x86_64PLTStubInfo
          let funInfos = Map.elems (MST.binaryDiscState (MST.mainBinaryInfo binariesInfo) ^. M.funInfo)
          let testEntryPoints = mapMaybe hasTestPrefix funInfos
          F.forM_ testEntryPoints $ \(name, Some dfi) -> do
            Some (gen :: PN.NonceGenerator IO t) <- PN.newIONonceGenerator
            sym <- WEB.newExprBuilder WF.FloatRealRepr WE.EmptyExprBuilderState gen
            CBO.withYicesOnlineBackend sym CBO.NoUnsatFeatures WPF.noFeatures $ \bak -> do
              -- We are using the z3 backend to discharge proof obligations, so
              -- we need to add its options to the backend configuration
              let solver = WS.z3Adapter
              let backendConf = WI.getConfiguration sym
              WC.extendConfig (WS.solver_adapter_config_options solver) backendConf

              execFeatures <- MST.defaultExecFeatures (MST.SomeOnlineBackend bak)
              archVals <- case MS.archVals (Proxy @MX.X86_64) Nothing of
                            Just archVals -> pure archVals
                            Nothing -> error "mkSymExTest: impossible"
              let extract = x86ResultExtractor archVals
              let ?memOpts = LLVM.defaultMemOptions
              simRes <- MST.simulateAndVerify solver WS.defaultLogData bak execFeatures MX.x86_64_linux_info archVals binariesInfo mmPreset extract dfi
              if expected == simRes
                then putStrLn "Test passed"
                else fail $ "Expected " ++ show expected ++ ", but got " ++ show simRes

-- | X86_64 functions with a single scalar return value return it in %rax
--
-- Since all test functions must return a value to assert as true, this is
-- straightforward to extract
x86ResultExtractor :: (CB.IsSymInterface sym) => MS.ArchVals MX.X86_64 -> MST.ResultExtractor sym MX.X86_64
x86ResultExtractor archVals = MST.ResultExtractor $ \regs _sp _mem k -> do
  let re = MS.lookupReg archVals regs MX.RAX
  k PC.knownRepr (CS.regValue re)

This test harness is set up in such a way that it will succeed if test_mod2 returns a non-zero value. Since 1 % 2 should evaluate to 1, this ought to be the case. But despite this, this test fails:

$ clang -fno-stack-protector -nostdlib bug.c -o bug.exe
$ cabal build --write-ghc-environment-files=always macaw-x86-symbolic
$ runghc Bug.hs
Bug.hs: user error (Expected SimulationResult Unsat, but got SimulationResult Sat)

The strangeness doesn't end here. Let's make a small change to this program, where instead of computing x % 2, we compute x / 2:

int __attribute__((noinline)) mod2(int x) {
  // return x % 2;
  return x / 2;
}

This time, test_mod2 should return zero, since 1 / 2 should evaluate to 0. And yet, this test unexpectedly succeeds:

$ clang -fno-stack-protector -nostdlib bug.c -o bug.exe
$ runghc Bug.hs
Test passed

Both of these issues are related, since both the % and / C operators compile down to idiv instructions. For instance, here is the x86-64 code for the x % 2 version of the program:

$ objdump -d bug.exe

bug.exe:     file format elf64-x86-64


Disassembly of section .text:

0000000000001000 <mod2>:
    1000:	55                   	push   %rbp
    1001:	48 89 e5             	mov    %rsp,%rbp
    1004:	89 7d fc             	mov    %edi,-0x4(%rbp)
    1007:	8b 45 fc             	mov    -0x4(%rbp),%eax
    100a:	b9 02 00 00 00       	mov    $0x2,%ecx
    100f:	99                   	cltd   
    1010:	f7 f9                	idiv   %ecx
    1012:	89 d0                	mov    %edx,%eax
    1014:	5d                   	pop    %rbp
    1015:	c3                   	ret    
    1016:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
    101d:	00 00 00 

0000000000001020 <test_mod2>:
    1020:	55                   	push   %rbp
    1021:	48 89 e5             	mov    %rsp,%rbp
    1024:	bf 01 00 00 00       	mov    $0x1,%edi
    1029:	e8 d2 ff ff ff       	call   1000 <mod2>
    102e:	5d                   	pop    %rbp
    102f:	c3                   	ret    

0000000000001030 <_start>:
    1030:	55                   	push   %rbp
    1031:	48 89 e5             	mov    %rsp,%rbp
    1034:	e8 e7 ff ff ff       	call   1020 <test_mod2>
    1039:	5d                   	pop    %rbp
    103a:	c3                   	ret

The text was updated successfully, but these errors were encountered:

RyanGlScott · 2024-07-12T19:04:12Z

The same bug also affects unsigned division (using the div instruction):

// bug.c
#include <stdlib.h>

int __attribute__((noinline)) mod3(unsigned int x) {
  return x % 3;
}

int __attribute__((noinline)) test_mod3(void) {
  return mod3(1);
}

void _start(void) {
  test_mod3();
};

$ objdump -d bug.exe 

bug.exe:     file format elf64-x86-64


Disassembly of section .text:

0000000000001000 <mod3>:
    1000:	55                   	push   %rbp
    1001:	48 89 e5             	mov    %rsp,%rbp
    1004:	89 7d fc             	mov    %edi,-0x4(%rbp)
    1007:	8b 45 fc             	mov    -0x4(%rbp),%eax
    100a:	b9 03 00 00 00       	mov    $0x3,%ecx
    100f:	31 d2                	xor    %edx,%edx
    1011:	f7 f1                	div    %ecx
    1013:	89 d0                	mov    %edx,%eax
    1015:	5d                   	pop    %rbp
    1016:	c3                   	ret    
    1017:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
    101e:	00 00 

0000000000001020 <test_mod3>:
    1020:	55                   	push   %rbp
    1021:	48 89 e5             	mov    %rsp,%rbp
    1024:	bf 01 00 00 00       	mov    $0x1,%edi
    1029:	e8 d2 ff ff ff       	call   1000 <mod3>
    102e:	5d                   	pop    %rbp
    102f:	c3                   	ret    

0000000000001030 <_start>:
    1030:	55                   	push   %rbp
    1031:	48 89 e5             	mov    %rsp,%rbp
    1034:	e8 e7 ff ff ff       	call   1020 <test_mod3>
    1039:	5d                   	pop    %rbp
    103a:	c3                   	ret

$ runghc Bug.hs
Bug.hs: user error (Expected SimulationResult Unsat, but got SimulationResult Sat)

When converting a Macaw value with the Macaw type `TupleType [x_1, ..., x_n]` to Crucible, the resulting Crucible value will have the Crucible type `StructType (EmptyCtx ::> ToCrucibleType x_n ::> ... ::> ToCrucibleType x_1)`. (See `macawListToCrucible(M)` in `Data.Macaw.Symbolic.PersistentState` for where this is implemented.) Note that the order of the tuple's fields is reversed in the process of converting it to a Crucible struct. This is a convention that one must keep in mind when dealing with Macaw tuples at the Crucible level. As it turns out, the part of `macaw-x86-symbolic` reponsible for interpreting the semantics of the `idiv` instruction (for signed quotient/remainder) and the `div` instruction (for unsigned quotient/remainder) were _not_ respecting this convention. This is because the `macaw-x86-symbolic` semantics were returning a Crucible struct consisting of `Empty :> quotient :> remainder)`, but at the Macaw level, this was interpreted as the tuple `(remainder, quotient)`, which is the opposite of the intended order. This led to subtle bugs such as those observed in #393. The solution is straightforward: have the `macaw-x86-symbolic` semantics compute `Empty :> remainder :> quotient` instead. Somewhat counterintuitive, but it does work. Fixes #393.

RyanGlScott added bug symbolic-execution Issues relating to macaw-symbolic and symbolic execution arch:x86 x86 issues labels Jul 12, 2024

RyanGlScott mentioned this issue Jul 12, 2024

macaw-x86-symbolic: Fix idiv/div semantics #394

Merged

RyanGlScott closed this as completed in #394 Jul 12, 2024

RyanGlScott closed this as completed in a6ff58f Jul 12, 2024

RyanGlScott mentioned this issue Jul 15, 2024

Unify the representations of TupleTypeRepr and StructRepr #396

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`macaw-x86-symbolic` simulates `idiv` instructions incorrectly #393

`macaw-x86-symbolic` simulates `idiv` instructions incorrectly #393

RyanGlScott commented Jul 12, 2024

RyanGlScott commented Jul 12, 2024

macaw-x86-symbolic simulates idiv instructions incorrectly #393

macaw-x86-symbolic simulates idiv instructions incorrectly #393

Comments

RyanGlScott commented Jul 12, 2024

RyanGlScott commented Jul 12, 2024

`macaw-x86-symbolic` simulates `idiv` instructions incorrectly #393

`macaw-x86-symbolic` simulates `idiv` instructions incorrectly #393