Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macaw-x86-symbolic simulates idiv instructions incorrectly #393

Closed
RyanGlScott opened this issue Jul 12, 2024 · 1 comment · Fixed by #394
Closed

macaw-x86-symbolic simulates idiv instructions incorrectly #393

RyanGlScott opened this issue Jul 12, 2024 · 1 comment · Fixed by #394
Labels
arch:x86 x86 issues bug symbolic-execution Issues relating to macaw-symbolic and symbolic execution

Comments

@RyanGlScott
Copy link
Contributor

Consider this C program, appropriately decorated for use in macaw's test suite:

// bug.c
#include <stdlib.h>

int __attribute__((noinline)) mod2(int x) {
  return x % 2;
}

int __attribute__((noinline)) test_mod2(void) {
  return mod2(1);
}

void _start(void) {
  test_mod2();
};

And consider this test harness (which was extracted from macaw-x86-symbolic's test suite):

-- Bug.hs
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ImplicitParams #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}
module Main (main) where

import           Control.Lens ((^.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.ElfEdit as Elf
import qualified Data.Foldable as F
import qualified Data.Map as Map
import           Data.Maybe ( mapMaybe )
import qualified Data.Parameterized.Classes as PC
import qualified Data.Parameterized.Nonce as PN
import           Data.Parameterized.Some ( Some(..) )
import           Data.Proxy ( Proxy(..) )

import qualified What4.Config as WC
import qualified What4.Expr as WE
import qualified What4.Expr.Builder as WEB
import qualified What4.FloatMode as WF
import qualified What4.Interface as WI
import qualified What4.ProblemFeatures as WPF
import qualified What4.Solver as WS

import qualified Lang.Crucible.Backend as CB
import qualified Lang.Crucible.Backend.Online as CBO
import qualified Lang.Crucible.Simulator as CS
import qualified Lang.Crucible.LLVM.MemModel as LLVM

import qualified Data.Macaw.Architecture.Info as MAI
import qualified Data.Macaw.Discovery as M
import qualified Data.Macaw.Symbolic as MS
import qualified Data.Macaw.Symbolic.Testing as MST
import qualified Data.Macaw.X86 as MX
import           Data.Macaw.X86.Symbolic ()

main :: IO ()
main = do
  -- These are pass/fail in that the assertions in the "pass" set are true (and
  -- the solver returns Unsat), while the assertions in the "fail" set are false
  -- (and the solver returns Sat).
  let passRes = MST.SimulationResult MST.Unsat
  mkSymExTest passRes MST.LazyMemModel "bug.exe"

hasTestPrefix :: Some (M.DiscoveryFunInfo arch) -> Maybe (BS8.ByteString, Some (M.DiscoveryFunInfo arch))
hasTestPrefix (Some dfi) = do
  bsname <- M.discoveredFunSymbol dfi
  if BS8.pack "test_" `BS8.isPrefixOf` bsname
    then return (bsname, Some dfi)
    else Nothing

mkSymExTest :: MST.SimulationResult -> MST.MemModelPreset -> FilePath -> IO ()
mkSymExTest expected mmPreset exePath = do
  bytes <- BS.readFile exePath
  case Elf.decodeElfHeaderInfo bytes of
    Left (_, msg) -> fail ("Error parsing ELF header from file '" ++ show exePath ++ "': " ++ msg)
    Right (Elf.SomeElf ehi) -> do
      case Elf.headerClass (Elf.header ehi) of
        Elf.ELFCLASS32 -> fail "32 bit x86 binaries are not supported"
        Elf.ELFCLASS64 -> do
          binariesInfo <- MST.runDiscovery ehi exePath MST.toAddrSymMap MX.x86_64_linux_info MX.x86_64PLTStubInfo
          let funInfos = Map.elems (MST.binaryDiscState (MST.mainBinaryInfo binariesInfo) ^. M.funInfo)
          let testEntryPoints = mapMaybe hasTestPrefix funInfos
          F.forM_ testEntryPoints $ \(name, Some dfi) -> do
            Some (gen :: PN.NonceGenerator IO t) <- PN.newIONonceGenerator
            sym <- WEB.newExprBuilder WF.FloatRealRepr WE.EmptyExprBuilderState gen
            CBO.withYicesOnlineBackend sym CBO.NoUnsatFeatures WPF.noFeatures $ \bak -> do
              -- We are using the z3 backend to discharge proof obligations, so
              -- we need to add its options to the backend configuration
              let solver = WS.z3Adapter
              let backendConf = WI.getConfiguration sym
              WC.extendConfig (WS.solver_adapter_config_options solver) backendConf

              execFeatures <- MST.defaultExecFeatures (MST.SomeOnlineBackend bak)
              archVals <- case MS.archVals (Proxy @MX.X86_64) Nothing of
                            Just archVals -> pure archVals
                            Nothing -> error "mkSymExTest: impossible"
              let extract = x86ResultExtractor archVals
              let ?memOpts = LLVM.defaultMemOptions
              simRes <- MST.simulateAndVerify solver WS.defaultLogData bak execFeatures MX.x86_64_linux_info archVals binariesInfo mmPreset extract dfi
              if expected == simRes
                then putStrLn "Test passed"
                else fail $ "Expected " ++ show expected ++ ", but got " ++ show simRes

-- | X86_64 functions with a single scalar return value return it in %rax
--
-- Since all test functions must return a value to assert as true, this is
-- straightforward to extract
x86ResultExtractor :: (CB.IsSymInterface sym) => MS.ArchVals MX.X86_64 -> MST.ResultExtractor sym MX.X86_64
x86ResultExtractor archVals = MST.ResultExtractor $ \regs _sp _mem k -> do
  let re = MS.lookupReg archVals regs MX.RAX
  k PC.knownRepr (CS.regValue re)

This test harness is set up in such a way that it will succeed if test_mod2 returns a non-zero value. Since 1 % 2 should evaluate to 1, this ought to be the case. But despite this, this test fails:

$ clang -fno-stack-protector -nostdlib bug.c -o bug.exe
$ cabal build --write-ghc-environment-files=always macaw-x86-symbolic
$ runghc Bug.hs
Bug.hs: user error (Expected SimulationResult Unsat, but got SimulationResult Sat)

The strangeness doesn't end here. Let's make a small change to this program, where instead of computing x % 2, we compute x / 2:

int __attribute__((noinline)) mod2(int x) {
  // return x % 2;
  return x / 2;
}

This time, test_mod2 should return zero, since 1 / 2 should evaluate to 0. And yet, this test unexpectedly succeeds:

$ clang -fno-stack-protector -nostdlib bug.c -o bug.exe
$ runghc Bug.hs
Test passed

Both of these issues are related, since both the % and / C operators compile down to idiv instructions. For instance, here is the x86-64 code for the x % 2 version of the program:

$ objdump -d bug.exe

bug.exe:     file format elf64-x86-64


Disassembly of section .text:

0000000000001000 <mod2>:
    1000:	55                   	push   %rbp
    1001:	48 89 e5             	mov    %rsp,%rbp
    1004:	89 7d fc             	mov    %edi,-0x4(%rbp)
    1007:	8b 45 fc             	mov    -0x4(%rbp),%eax
    100a:	b9 02 00 00 00       	mov    $0x2,%ecx
    100f:	99                   	cltd   
    1010:	f7 f9                	idiv   %ecx
    1012:	89 d0                	mov    %edx,%eax
    1014:	5d                   	pop    %rbp
    1015:	c3                   	ret    
    1016:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
    101d:	00 00 00 

0000000000001020 <test_mod2>:
    1020:	55                   	push   %rbp
    1021:	48 89 e5             	mov    %rsp,%rbp
    1024:	bf 01 00 00 00       	mov    $0x1,%edi
    1029:	e8 d2 ff ff ff       	call   1000 <mod2>
    102e:	5d                   	pop    %rbp
    102f:	c3                   	ret    

0000000000001030 <_start>:
    1030:	55                   	push   %rbp
    1031:	48 89 e5             	mov    %rsp,%rbp
    1034:	e8 e7 ff ff ff       	call   1020 <test_mod2>
    1039:	5d                   	pop    %rbp
    103a:	c3                   	ret
@RyanGlScott RyanGlScott added bug symbolic-execution Issues relating to macaw-symbolic and symbolic execution arch:x86 x86 issues labels Jul 12, 2024
@RyanGlScott
Copy link
Contributor Author

The same bug also affects unsigned division (using the div instruction):

// bug.c
#include <stdlib.h>

int __attribute__((noinline)) mod3(unsigned int x) {
  return x % 3;
}

int __attribute__((noinline)) test_mod3(void) {
  return mod3(1);
}

void _start(void) {
  test_mod3();
};
$ objdump -d bug.exe 

bug.exe:     file format elf64-x86-64


Disassembly of section .text:

0000000000001000 <mod3>:
    1000:	55                   	push   %rbp
    1001:	48 89 e5             	mov    %rsp,%rbp
    1004:	89 7d fc             	mov    %edi,-0x4(%rbp)
    1007:	8b 45 fc             	mov    -0x4(%rbp),%eax
    100a:	b9 03 00 00 00       	mov    $0x3,%ecx
    100f:	31 d2                	xor    %edx,%edx
    1011:	f7 f1                	div    %ecx
    1013:	89 d0                	mov    %edx,%eax
    1015:	5d                   	pop    %rbp
    1016:	c3                   	ret    
    1017:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
    101e:	00 00 

0000000000001020 <test_mod3>:
    1020:	55                   	push   %rbp
    1021:	48 89 e5             	mov    %rsp,%rbp
    1024:	bf 01 00 00 00       	mov    $0x1,%edi
    1029:	e8 d2 ff ff ff       	call   1000 <mod3>
    102e:	5d                   	pop    %rbp
    102f:	c3                   	ret    

0000000000001030 <_start>:
    1030:	55                   	push   %rbp
    1031:	48 89 e5             	mov    %rsp,%rbp
    1034:	e8 e7 ff ff ff       	call   1020 <test_mod3>
    1039:	5d                   	pop    %rbp
    103a:	c3                   	ret

$ runghc Bug.hs
Bug.hs: user error (Expected SimulationResult Unsat, but got SimulationResult Sat)

RyanGlScott added a commit that referenced this issue Jul 12, 2024
When converting a Macaw value with the Macaw type `TupleType [x_1, ..., x_n]`
to Crucible, the resulting Crucible value will have the Crucible type
`StructType (EmptyCtx ::> ToCrucibleType x_n ::> ... ::> ToCrucibleType x_1)`.
(See `macawListToCrucible(M)` in `Data.Macaw.Symbolic.PersistentState` for
where this is implemented.) Note that the order of the tuple's fields is
reversed in the process of converting it to a Crucible struct. This is a
convention that one must keep in mind when dealing with Macaw tuples at the
Crucible level.

As it turns out, the part of `macaw-x86-symbolic` reponsible for interpreting
the semantics of the `idiv` instruction (for signed quotient/remainder) and the
`div` instruction (for unsigned quotient/remainder) were _not_ respecting this
convention. This is because the `macaw-x86-symbolic` semantics were returning a
Crucible struct consisting of `Empty :> quotient :> remainder)`, but at the
Macaw level, this was interpreted as the tuple `(remainder, quotient)`, which
is the opposite of the intended order. This led to subtle bugs such as those
observed in #393.

The solution is straightforward: have the `macaw-x86-symbolic` semantics
compute `Empty :> remainder :> quotient` instead. Somewhat counterintuitive,
but it does work.

Fixes #393.
RyanGlScott added a commit that referenced this issue Jul 12, 2024
When converting a Macaw value with the Macaw type `TupleType [x_1, ..., x_n]`
to Crucible, the resulting Crucible value will have the Crucible type
`StructType (EmptyCtx ::> ToCrucibleType x_n ::> ... ::> ToCrucibleType x_1)`.
(See `macawListToCrucible(M)` in `Data.Macaw.Symbolic.PersistentState` for
where this is implemented.) Note that the order of the tuple's fields is
reversed in the process of converting it to a Crucible struct. This is a
convention that one must keep in mind when dealing with Macaw tuples at the
Crucible level.

As it turns out, the part of `macaw-x86-symbolic` reponsible for interpreting
the semantics of the `idiv` instruction (for signed quotient/remainder) and the
`div` instruction (for unsigned quotient/remainder) were _not_ respecting this
convention. This is because the `macaw-x86-symbolic` semantics were returning a
Crucible struct consisting of `Empty :> quotient :> remainder)`, but at the
Macaw level, this was interpreted as the tuple `(remainder, quotient)`, which
is the opposite of the intended order. This led to subtle bugs such as those
observed in #393.

The solution is straightforward: have the `macaw-x86-symbolic` semantics
compute `Empty :> remainder :> quotient` instead. Somewhat counterintuitive,
but it does work.

Fixes #393.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch:x86 x86 issues bug symbolic-execution Issues relating to macaw-symbolic and symbolic execution
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant