[Capstone2llvmir] Update to Capstone V5 #1059

Merged
merged 83 commits into from
Dec 5, 2022


owlxiao
Contributor

@owlxiao owlxiao commented Jan 26, 2022

[deps][capstone][CMakeLists]

Add CAPSTONE_INSTALL option

Generate install target

[Capstone2LlvmIr][PowerPC]

Remove ppc_reg_cr_flags enum

In Capstone V5, each bit of a CR field already has native support

Fix translateClrlwi method

In capstone V5, rlwinm is equivalent to clrlwi
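
The equivalence follows from the Power ISA, where `clrlwi rA, rS, n` is an extended mnemonic for `rlwinm rA, rS, 0, n, 31`. A minimal sketch of the arithmetic, assuming 32-bit registers; the helper names (`ppcMask`, `rotl32`, `rlwinm`, `clrlwi`) are hypothetical and are not RetDec or Capstone functions:

```cpp
#include <cassert>
#include <cstdint>

// Mask with ones from bit mb through bit me (inclusive).
// PowerPC numbers bits from the most significant end: bit 0 is the MSB.
// Hypothetical helper, for illustration only.
constexpr std::uint32_t ppcMask(unsigned mb, unsigned me)
{
    return (mb <= me)
        ? ((0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me)))
        // When mb > me the mask wraps around the word.
        : ((0xFFFFFFFFu >> mb) | (0xFFFFFFFFu << (31u - me)));
}

// Rotate left by sh bits (sh in 0..31).
constexpr std::uint32_t rotl32(std::uint32_t v, unsigned sh)
{
    return sh == 0 ? v : ((v << sh) | (v >> (32u - sh)));
}

// rlwinm: rotate left, then AND with the mask mb..me.
constexpr std::uint32_t rlwinm(std::uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh & 31u) & ppcMask(mb & 31u, me & 31u);
}

// clrlwi: clear the n most significant bits, expressed through rlwinm.
constexpr std::uint32_t clrlwi(std::uint32_t rs, unsigned n)
{
    return rlwinm(rs, 0, n, 31);
}

static_assert(clrlwi(0x12345678u, 16) == 0x00005678u, "upper 16 bits cleared");
```

Because both forms reduce to the same rotate-and-mask, a translator that handles `rlwinm` generically also covers `clrlwi`.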

Fix TODO: PPC_INS_BDZLA

Add branch mnemonics incorporating conditions

    BLT*        BLE*        BEQ*        BGE*
    BGT*        BNL*        BNE*        BNG*
    BSO*        BNS*

BUN* and BNU* are equivalent to BSO*/BNS*,
because they read the same bit of the CR field
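
The mnemonics above can be viewed as pairs of (CR bit, expected value). The sketch below uses the bit positions the Power ISA defines within one CR field (LT, GT, EQ, SO); the table and the helper names (`kBranchConds`, `findBranchCond`, `sameCondition`) are assumptions for illustration, not the data structures used in this PR:

```cpp
#include <cassert>
#include <cstring>

// Bit positions inside one 4-bit CR field, as defined by the Power ISA.
enum CrBit { CR_LT = 0, CR_GT = 1, CR_EQ = 2, CR_SO = 3 };

// A branch condition: which CR bit is tested and the value it must have.
struct BranchCond {
    const char* mnemonic;
    CrBit bit;
    bool expected;
};

// Hypothetical lookup table (base mnemonics only, without l/a/lr/ctr suffixes).
static const BranchCond kBranchConds[] = {
    {"blt", CR_LT, true},  {"bge", CR_LT, false}, {"bnl", CR_LT, false},
    {"bgt", CR_GT, true},  {"ble", CR_GT, false}, {"bng", CR_GT, false},
    {"beq", CR_EQ, true},  {"bne", CR_EQ, false},
    {"bso", CR_SO, true},  {"bun", CR_SO, true},
    {"bns", CR_SO, false}, {"bnu", CR_SO, false},
};

// Returns the table entry for a mnemonic, or nullptr if it is unknown.
const BranchCond* findBranchCond(const char* mnemonic)
{
    for (const BranchCond& c : kBranchConds) {
        if (std::strcmp(c.mnemonic, mnemonic) == 0) {
            return &c;
        }
    }
    return nullptr;
}

// Two mnemonics are interchangeable when they test the same bit for the same value.
bool sameCondition(const char* a, const char* b)
{
    const BranchCond* ca = findBranchCond(a);
    const BranchCond* cb = findBranchCond(b);
    return ca && cb && ca->bit == cb->bit && ca->expected == cb->expected;
}
```

With this view, `sameCondition("bun", "bso")` holds, which is why a separate BUN*/BNU* translation is unnecessary.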

Add PPC_REG_ZERO

The representation of r0 when treated as the constant 0.

Add PPC_INS_CMP and PPC_INS_CMPL

Add PPC_INS_MFAMR

Fix translateCrSetClr method

In Capstone V5, the crset/crclr operand is reported as a detailed field of CR,
so we should check for a CR field instead of CR

Add isCrBitRegister()

Checks whether a register is a field of CR
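
A rough sketch of what such a predicate can look like. The register numbering below (`DemoPpcReg`) is a stand-in invented for this example and does not reflect Capstone's real `ppc_reg` values; it assumes, purely for illustration, that the 32 individual CR bits are numbered contiguously, field by field. This is not the implementation from the PR:

```cpp
#include <cassert>

// Stand-in register numbering, NOT Capstone's ppc_reg values.
enum DemoPpcReg : unsigned {
    DEMO_REG_INVALID = 0,
    DEMO_REG_CR0 = 1,    // whole 4-bit CR fields CR0..CR7
    DEMO_REG_CR7 = 8,
    DEMO_REG_CR0LT = 9,  // first individual CR bit
    DEMO_REG_CR7SO = 40, // last individual CR bit (9 + 8 * 4 - 1)
    DEMO_REG_R0 = 41,
};

// True for a whole CR field (CR0..CR7).
constexpr bool isCrRegister(unsigned r)
{
    return r >= DEMO_REG_CR0 && r <= DEMO_REG_CR7;
}

// True for a single bit of a CR field.
constexpr bool isCrBitRegister(unsigned r)
{
    return r >= DEMO_REG_CR0LT && r <= DEMO_REG_CR7SO;
}

// Index of the bit within the 32-bit CR (0..31); valid only for CR bit registers.
constexpr unsigned crBitIndex(unsigned r)
{
    return r - DEMO_REG_CR0LT;
}
```

Under these assumptions, `crBitIndex(r) / 4` gives the CR field and `crBitIndex(r) % 4` the bit within it.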

Fix Crand Unit Test

[Capstone2LlvmIr][Arm]

Add ARM_INS_MOVS

Fix ARM_INS_MOVW Unit test
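
The commit message for this fix notes that `movw r0, #0xabcd` reads no register and yields 0xabcd rather than 0x1234abcd. That matches the architectural behaviour: MOVW writes a zero-extended 16-bit immediate, while MOVT is the instruction that keeps the lower half. A small model, with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// MOVW: the destination becomes the zero-extended immediate.
// The previous register value is ignored (the parameter exists only to show that).
constexpr std::uint32_t movw(std::uint32_t /*oldRd*/, std::uint16_t imm16)
{
    return imm16;
}

// MOVT: only the upper half is replaced; the lower half of the old value is kept.
constexpr std::uint32_t movt(std::uint32_t oldRd, std::uint16_t imm16)
{
    return (static_cast<std::uint32_t>(imm16) << 16) | (oldRd & 0xFFFFu);
}

static_assert(movw(0x12340000u, 0xabcd) == 0x0000abcdu, "upper half is cleared");
static_assert(movt(0x0000abcdu, 0x1234) == 0x1234abcdu, "lower half is kept");
```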

Fix ARM_INS_NOP Unit test

In capstone V5, nop is a hint instruction.
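
According to the commit message, the failure mode was that `nop` arrives with the HINT instruction ID, so looking up a translation method yields nullptr. The sketch below shows one defensive way to structure such a dispatch table; the IDs, class, and method names are invented for the example, and this is not necessarily how the PR resolves the issue (the PR adjusts the unit test):

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in instruction IDs, not Capstone's arm_insn values.
enum DemoInsnId { DEMO_INS_MOV = 1, DEMO_INS_NOP = 2, DEMO_INS_HINT = 3 };

class DemoTranslator {
public:
    using Handler = std::string (DemoTranslator::*)();

    DemoTranslator()
    {
        handlers_[DEMO_INS_MOV] = &DemoTranslator::translateMov;
        handlers_[DEMO_INS_NOP] = &DemoTranslator::translateNop;
        // Register the ID that the disassembler actually reports for "nop".
        handlers_[DEMO_INS_HINT] = &DemoTranslator::translateNop;
    }

    // Looks up the handler and guards against missing or null entries
    // instead of calling through a null member pointer.
    std::string translate(int id)
    {
        const auto it = handlers_.find(id);
        if (it == handlers_.end() || it->second == nullptr) {
            return "unhandled";
        }
        return (this->*(it->second))();
    }

private:
    std::string translateMov() { return "mov"; }
    std::string translateNop() { return "nop"; }

    std::map<int, Handler> handlers_;
};
```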

[Capstone2LlvmIr][Arm64]

Remove include AArch64BaseInfo.h

Fix Register name

Add following instructions support:

set flags:

ARM64_INS_ADDS, ARM64_INS_ANDS,
ARM64_INS_BICS, ARM64_INS_SBCS,
ARM64_INS_SUBS

Remove op.VESS

it overlaps with op.VAS

[Capstone2LlvmIr][X86]

Add X86_INS_FADDP support

Fix X86_INS_FCOMIP/FUCOMIP to X86_INS_FCOMPI/FUCOMPI

Delete X86_INS_PCOMMIT

The PCOMMIT instruction has been deprecated.

Fix X86_INS_UD2B to X86_INS_UD1

Remove following instructions

X86_INS_VCVTPD2DQX, X86_INS_VCVTPD2PSX, X86_INS_VCVTTPD2DQX

each pseudo instruction of:

    X86_INS_CMPSS
    X86_INS_CMPSD
    X86_INS_CMPPS
    X86_INS_CMPPD
    X86_INS_VCMPSS
    X86_INS_VCMPSD
    X86_INS_VCMPPS
    X86_INS_VCMPPD

Fix loadOpFloatingBinaryTop

When translating the FXCH instruction, the value of "top" in the loadOpFloatingBinaryTop function
is equivalent to idx, which causes the value to be written to top twice when exchanging data.

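The FXCH problem described above is the classic swap pitfall: both values must be read, and both destination slots resolved, before anything is written. A simplified model of the x87 register stack, with an invented `DemoX87` type that is not RetDec's representation:

```cpp
#include <array>
#include <cassert>

// Simplified x87 register stack: 8 physical slots and a TOP pointer.
struct DemoX87 {
    std::array<double, 8> regs{};
    unsigned top = 0;

    // Physical slot of ST(i), wrapping around the 8-entry stack.
    unsigned physical(unsigned i) const { return (top + i) & 7u; }

    double& st(unsigned i) { return regs[physical(i)]; }

    // FXCH ST(i): exchange ST(0) and ST(i).
    void fxch(unsigned i)
    {
        // Resolve both slots and read both values before writing,
        // so that neither store targets the same slot twice.
        const unsigned topIdx = physical(0);
        const unsigned otherIdx = physical(i);
        const double a = regs[topIdx];
        const double b = regs[otherIdx];
        regs[topIdx] = b;
        regs[otherIdx] = a;
    }
};
```

If the second store reused the index of the first, ST(i) would keep its old value, which is the symptom the fix addresses.
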
Fix translateFadd method

In Capstone V5, FADDP and FADD share the same instruction ID, so we should distinguish them by reading the opcode
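
In the x86 encoding, `FADDP ST(i), ST(0)` is `DE C0+i`, whereas the non-popping register forms of FADD start with `D8` or `DC`. A minimal sketch of an opcode-based check over raw instruction bytes (for example the `bytes` and `size` fields of Capstone's `cs_insn`); the function name is hypothetical, prefixes are ignored, and the PR's actual check may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when the bytes encode FADDP ST(i), ST(0), i.e. DE C0..C7.
// Memory forms that also start with DE (such as FIADD m16int) have a
// ModRM byte below 0xC0 and are therefore rejected.
constexpr bool isFaddpEncoding(const std::uint8_t* bytes, std::size_t size)
{
    return size >= 2 && bytes[0] == 0xDE && bytes[1] >= 0xC0 && bytes[1] <= 0xC7;
}
```

A translator can then pop the x87 stack only when this check succeeds, even though both instructions report the same ID.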

Fix X86_INS_FADD_d8 Unit test

@xkubov
Contributor

xkubov commented Jan 26, 2022

Awesome! @PeterMatula or I will get to the review ASAP. In the meantime, let's run TC tests.

@xkubov
Contributor

xkubov commented Jan 26, 2022

In the meantime, can you please rebase the owlxiao:capstone-next onto origin/master and resolve the merge conflict?

Peter Kubov and others added 28 commits January 27, 2022 12:46
richardlford and others added 6 commits July 21, 2022 15:44
richardlford and others added 14 commits August 31, 2022 16:53
@PeterMatula PeterMatula deleted the branch avast:capstone-update-v5 December 5, 2022 15:14
@PeterMatula PeterMatula closed this Dec 5, 2022
@PeterMatula PeterMatula reopened this Dec 5, 2022
@PeterMatula PeterMatula merged commit 23ecab3 into avast:capstone-update-v5 Dec 5, 2022
PeterMatula added a commit that referenced this pull request Dec 5, 2022
* Update Capstone to v4.0

* [Capstone-next] Update to capstone-next branch

* [Capstone-next] Update to Capstone-Next Branch
-[ARM]
    -Add ARM_INS_MOVS support
-[ARM64]
    -Remove vess.
        -It overlaps with ARM64_VAS
    -Fix A64SysReg_* into ARM64_SYSREG_*
-[PowerPC]
    -Fix PPC_REG_X2 into PPC_REG_XER
-[X86]
    -Remove X86_INS_FADDP
        -In capstone-next, faddp is actually fadd, both belong to
            "ID 15(fadd)"

* [tests][capstone2llvmir][arm] Fix MOVW Unit Test
- In the test, "movw r0, #0xabcd" does not read any register,
    and the result is 0xabcd, not 0x1234abcd

* [tests][capstone2llvmir][arm] Fix Nop test
- On ARM, the NOP instruction is a HINT instruction
- Also, in Capstone, the cs_insn->id of nop points to
    HINT (ID: 63)
- So an error occurs when looking up the instruction's translation
    method, because the lookup yields nullptr

* [Capstone2llvmir][arm64] Add ADDCS Support

* [capstone2llvmir][arm64] Add ADDS Support

* [capstone2llvmir][arm64] Add ANDS Support

* [capstone2llvmir][arm64] Add SUP Support

* [capstone2llvmir][arm64] Add BICS Support

* [capstone2llvmir][PowerPC] Update Register Name

* [capstone2llvmir][PowerPC] Update Register Name

* [capstone2llvmir][PowerPC] Fix CMP Support

* [capstone2llvmir][PowerPC] Add CMPL Support

* [capstone2llvmir][PowerPC] Fix CMPL

* [capstone2llvmir][PowerPC] Add BLT Support

* [capstone2llvmir][PowerPC] Add Branch mnemonics incorporating
conditions Support

* [capstone2llvmir][PowerPC] Fix RLWINM
- RLWINM and clrlwi are same ID

* [tests][capstone2llvmir][PowerPC] Fix Crand Tests

* [capstone2llvmir][PowerPC] Fix bdzla BUG

* [capstone2llvmir][PowerPC] Remove BDZLA TODO

* [capstone2llvmir][x86] Fix ud2b

* [capstone2llvmir][X86] Fix FADD/FADDP

* [capstone2llvmir][x86] Fix FADD/FADDP

* [capstone2llvmir][x86] Fix FXCH
- When translating the FXCH instruction, the value of "top" in the
    loadOpFloatingBinaryTop function is equal to idx, which causes the
    value to be written to top twice when exchanging data.

* clean code

* Update Capstone to v5.0

* [capstone2llvmir][x86][PowerPC] Clean code

* [capstone2llvmir][PowerPC] Clean code

* [capstone2llvmir][PowerPC] Remove BUN* and BNU*
-In CapstoneV5, they are both equivalent to BSO* and BNS*

* [capstone2llvmir][PowerPC] Fix rlwinm
- In Capstone V5, rlwinm is equivalent to clrlwi

* [capstone2llvmir][PowerPC] Fix BNL*

* [capstone2llvmir][PowerPC] Add PPC_REG_ZERO

* [capstone2llvmir][PowerPC] Add comment

* Fix merge conflict

* Update YARA to 4.2.X

* Add dll_name from export directory to output

* llvm/CMakeLists: Manually-specified variables were not used by the project.

The following variables were set in CMakeLists, however, they
were not used by the LLVM project build:

LLVM_USE_CRT_DEBUG
LLVM_USE_CRT_RELEASE

* CHANGELOG.md: add entries for #1060 #1061 PRs

* Fixed loading import directory that is modified by relocations

* Fixed comment

* Remove useless trailing whitespace

There is absolutely no reason for it being in the code.

* pelib: Fix a typo in a comment in PeLib::ImageLoader::Load()

* Add a CHANGELOG entry for #1063

* Move signing certificate to separate object

* Updated authenticode parser to the newest version

* Fix uninitialize free, use finer sanity checks in auth. parser

* Add a directory for RetDec-related publications

The list of publications was originally hosted at
https://retdec.com/publications/ (https://retdec.com/ now redirects
to https://github.com/avast/retdec, and we wanted to keep the list somewhere).

* Fix the wording for an invalid max-memory error in scripts/retdec-unpacker.py

There are the following two reasons for the fix:
- The check only verifies whether the passed value is an integer.
- The parameter can be 0 (i.e. a non-negative integer). It does not have to be a
  positive integer.

* Never try to limit memory on macOS

We can't limit memory on macOS. Before macOS 12,
limitSystemMemoryOnPOSIX() did not actually do anything on macOS;
it just succeeded. Since macOS 12 it returns an error and retdec
can't start.

To be honest, Apple can control the memory limit via the so-called ledger()
system call, which is private. An old version that was released as
open source (from 10.9-10.10?) used setrlimit(), but at some point
setrlimit() was broken and ledger() was not. Probably in macOS 12
setrlimit() was completely broken.

Because we haven't got any other choice, just return true, which doesn't
change anything.

See: #379
Fixes: #1045

* Remove a redundant period from CHANGELOG

* utils: Improve the wording of a comment in getTotalSystemMemoryOnMacOS()

* Add a CHANGELOG entry for #1074 and #1045

* Update authenticode-parser, use-after-free, signedness issues

* Using multistage build for Dockerfile, reduces container size by ~1.5G

* Check for possible overflow when checking for segment overlaps. Fix incorrect range exception message

* Fix parameter and return types for dynamically called functions

Calls to dynamically-linked functions go through the procedure linkage
table (PLT).  RetDec turns a PLT entry into a function, say
malloc@plt, that appears to do nothing but call the external function,
say malloc (though the assembly code will do a jump rather than a
call). User code that logically wants to call malloc instead calls
malloc@plt (and sets up arguments as if calling malloc). The
malloc@plt code first jumps to the dynamic linker which modifies it so
that subsequent calls to malloc@plt will jump directly to malloc. We
say that malloc@plt wraps malloc.  The call to malloc in malloc@plt
will not have any arguments set up, so malloc will appear to have
no parameters or returns (unless that information is provided by
link-time-information, debug information, or name demangling), but it
needs to have the same parameter types and return type as
malloc@plt. The propagateWrapped methods copy the argument information
from the DataFlowEntry of the wrapping function to the wrapped
function. Then, when the calls to the wrapping function are inlined
(in connectWrappers), effectively the call to the wrapping function is
changed into a call to the wrapped function.

The motivation for this change is that programs that analyze the
output of RetDec (either the C code, or the LLVM code) want to
recognize library functions and treat them specially. This
change makes it so that the library function names are used
directly (rather than the plt version) and they are passed
their parameters correctly.

* Upgrade to Capstone release 4.0.2

* Add additional patch on capstone 4.0.2 for PPC Signed 16 bit immediates

Capstone version 4.0.2 has a bug when disassembling a powerpc instruction
with a signed 16-bit immediate.
See capstone-engine/capstone#1746 and
capstone-engine/capstone#1746 (comment).

This change adds to the capstone patch to fix this problem.

* Treat endbr32/endbr64 instructions as NOPs

* capstone2llvmir/powerpc: remove PPC_INS_BDZLA hack fix

As Capstone was updated, the fix in capstone-engine/capstone#968 took effect and the original RetDec fix is not needed - in fact, it caused problems.

* Handle Procedure Linkage calls for 32bit x86 from gcc

This case is for x86 32 bit compiled with GCC. Its PLT entries are in
sections .plt.sec or .plt.got. An entry is of the form:

jmp *offset(%ebx)

When this code is encountered register %ebx has been loaded with the
address of the start of the Global Offset Table (.got) section.
This change handles that case.

* Add ability to process PNG icons for perceptual hash calculation (#1090)

* Add ability to process PNG icons for perceptual hash calculation

* Use SCOPE_EXIT for deallocation

* In generated C, add prototypes for dynamically-linked functions without headers

When the program involves dynamically-linked functions like _Znwj
(operator new) that return a pointer, it is necessary to have
prototypes for them, since otherwise they will be implicitly deduced
to return "int" which cannot be dereferenced.

Previously RetDec was emitting comments telling which functions were
dynamically linked. This change moves them up before the functions are
emitted and instead emits prototypes for the functions. However,
RetDec also inserts includes of headers for functions with known
headers. We do not emit prototypes for functions with headers as that
would be redundant.  As a result, some dynamically-linked functions
that used to show in the comments no longer appear as the included
header will declare them.

The section header comment for dynamically-linked functions is only
produced if some prototypes are written for dynamically-linked
functions.

A related PR will have added tests as well as changes needed for
existing tests.

* Add printing of analysis time to retdec-fileinfo output

* Yara: inherits linker flags

* Use provided libtool via `CMAKE_LIBTOOL`

* Added missed `${RETDEC_INSTALL_BIN_DIR}` to `pat2yara`

* Added sanity check for page index when loading pages from broken samples

There are certain samples where page index might go beyond available
pages when trying to load them which will be prevented with this patch.

* Virtual Size overflow is now handled properly

* Fixed error code

* Updated yaramod

* Fix removeZeroSequences

* README.md: add "limited maintenance mode" note

Co-authored-by: Peter Kubov
Co-authored-by: houndthe
Co-authored-by: Peter Matula
Co-authored-by: Ladislav Zezula
Co-authored-by: Petr Zemek
Co-authored-by: Marek Milkovič
Co-authored-by: Kirill A. Korinsky
Co-authored-by: me <me>
Co-authored-by: Richard L Ford
Co-authored-by: 未赢
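
The commit message above describes 32-bit x86 PLT entries of the form `jmp *offset(%ebx)`, where %ebx holds the start of the .got section. The GOT slot the jump reads is therefore the GOT base plus the displacement. A minimal sketch of that computation over raw bytes, assuming the two standard encodings `FF A3 disp32` and `FF 63 disp8`; the function name `pltGotSlot` is hypothetical and this is not RetDec's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Returns the address of the GOT slot read by "jmp *disp(%ebx)",
// or std::nullopt if the bytes are not one of the two recognized forms.
std::optional<std::uint32_t> pltGotSlot(const std::uint8_t* code, std::size_t size,
                                        std::uint32_t gotBase)
{
    // FF A3 disp32: ModRM = 10 100 011 (mod = disp32, /4 = jmp, rm = ebx).
    if (size >= 6 && code[0] == 0xFF && code[1] == 0xA3) {
        const std::uint32_t disp = static_cast<std::uint32_t>(code[2])
            | (static_cast<std::uint32_t>(code[3]) << 8)
            | (static_cast<std::uint32_t>(code[4]) << 16)
            | (static_cast<std::uint32_t>(code[5]) << 24);
        // Unsigned wrap-around also handles negative displacements.
        return gotBase + disp;
    }
    // FF 63 disp8: ModRM = 01 100 011 (mod = disp8, /4 = jmp, rm = ebx).
    if (size >= 3 && code[0] == 0xFF && code[1] == 0x63) {
        const auto disp = static_cast<std::int32_t>(static_cast<std::int8_t>(code[2]));
        return gotBase + static_cast<std::uint32_t>(disp);
    }
    return std::nullopt;
}
```

Resolving the slot address is what lets the wrapper be linked to the imported function whose address that GOT entry will hold.
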
@PeterMatula
Collaborator

Thanks @owlxiao once again, you did all the work 👍 only minor changes were required in the code and tests to get it fully working with Capstone v5.0-rc2.

Sorry for the huge delay on our side, we were going through a long period of uncertainty. But despite limited resources going forward, we will work on PRs, some improvements, and new releases.

9 participants