InnerEye

This repository implements InnerEye, an LSTM-based tool for generating cross-platform binary code embeddings, introduced in the following paper.

@inproceedings{zuo2019neural,
  title={Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs},
  author={Zuo, Fei and Li, Xiaopeng and Young, Patrick and Luo, Lannan and Zeng, Qiang and Zhang, Zhexin},
  booktitle={Proceedings of the 2019 Network and Distributed Systems Security Symposium (NDSS)},
  year={2019}
}

This implementation mainly serves as a baseline for cross-platform binary code embedding research and for the experimental results reported in Improving Cross-Platform Binary Analysis using Representation Learning via Graph Alignment.

Getting Started

Our implementation is mostly based on the official implementation by the authors of the paper, but we apply the model to our own cross-platform datasets, which cover a broad range of software disciplines: SQLite3 (database), OpenSSL (network), cURL (file transfer), Httpd (web server), libcrypto (crypto library), and glibc (standard library). The data is preprocessed following the scheme described in the original paper and stored in the data directory, which is structured in the same way as in XBA. The other components are organized as follows.

.
├── README
├── Pipfile                 # Manages a Python virtualenv.
├── Pipfile.lock            # Manages a Python virtualenv (Do not touch).
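├── extract.py              # Extracts basic block embeddings with a trained model.
├── train.py                # Trains Instruction2vec (i2v) embeddings and the Siamese-LSTM.
├── utils.py
├── validation.py           # Tests the trained model.
├── data                    # Preprocessed datasets.
├── embeddings
└── weights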

Install

Prerequisites

Python 3.8 or later is required. To install the Python dependencies, first install pipenv.

$ pip3 install pipenv

Use pipenv shell

Install dependencies

$ pipenv install

Activate pipenv shell

$ pipenv shell

Use your own python virtual environment

Extract requirements.txt

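$ pipenv lock -r > requirements.txt

Recent pipenv releases no longer support lock -r; on those, use the following instead.

$ pipenv requirements > requirements.txt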

Install dependencies

$ pip install -r requirements.txt

How to run

Several commonly used command sequences are defined in the Makefile.

Train Instruction2vec (i2v) embeddings and the Siamese-LSTM on the data in /revos/data/done/${programs}/innereye.csv

$ pipenv run -- python train.py --targets={programs}
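
For orientation, a Siamese LSTM encodes the two basic blocks of a pair with weight-sharing encoders and scores the pair by the distance between the two resulting vectors. The snippet below is a minimal sketch of that scoring step only, assuming the Manhattan-distance form exp(-||h1 - h2||_1) that is commonly paired with Siamese LSTMs. The function name and the example vectors are hypothetical, and the code is not taken from train.py.

```python
import math
from typing import Sequence


def manhattan_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Score two embedding vectors in (0, 1]; 1.0 means identical vectors.

    Assumption: similarity = exp(-L1 distance). This illustrates the idea of
    Siamese scoring and may differ from what train.py actually computes.
    """
    if len(h1) != len(h2):
        raise ValueError("embeddings must have the same dimension")
    l1_distance = sum(abs(a - b) for a, b in zip(h1, h2))
    return math.exp(-l1_distance)


if __name__ == "__main__":
    # Hypothetical embeddings of two basic blocks from different platforms.
    block_a = [0.2, -0.1, 0.4]
    block_b = [0.1, -0.1, 0.5]
    print(manhattan_similarity(block_a, block_b))
```

A score close to 1 means the two encoders produced nearly identical vectors, while larger L1 distances push the score toward 0.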

Test the trained model

$ pipenv run -- python validation.py

Extract basic block embeddings using a model trained on {programs}

$ pipenv run -- python extract.py --targets={programs}
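
The three commands above can also be chained from Python. The following is a rough sketch, assuming the stages are run in the order train, validate, extract and that a single program name is passed to --targets. The helper names and the example target openssl are hypothetical; check the Makefile for the sequences the repository actually defines.

```python
import subprocess
from typing import List


def stage_command(script: str, targets: str = "") -> List[str]:
    """Build the pipenv command line for one stage script."""
    cmd = ["pipenv", "run", "--", "python", script]
    if targets:
        # Only train.py and extract.py take --targets in the commands above.
        cmd.append(f"--targets={targets}")
    return cmd


def pipeline_commands(targets: str) -> List[List[str]]:
    """Return the train, validate, extract commands in that order."""
    return [
        stage_command("train.py", targets),
        stage_command("validation.py"),
        stage_command("extract.py", targets),
    ]


def run_pipeline(targets: str) -> None:
    """Run each stage and stop at the first one that fails."""
    for cmd in pipeline_commands(targets):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; "openssl" is a placeholder.
    for cmd in pipeline_commands("openssl"):
        print(" ".join(cmd))
```

Calling run_pipeline from the repository root executes the stages one after another; check=True makes a failing stage raise instead of silently continuing.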
