ByteCue: One Step Further Than Decompilation: Bytecode Comment Generation

This is the source code and dataset for ByteCue.

Quick start

If you want to train on your own dataset, start with Step1; otherwise, skip Step1.

Step1: data preprocess

  • Place the bytecode, CFG, and comment files under the data folder with the following names:
    -train_story.txt
    -train_summ.txt
    -train_cfg.txt
    -train_api_pair.txt
    -eval_story.txt
    -eval_summ.txt
    -eval_cfg.txt
    -eval_api_pair.txt
    -test_story.txt
    -test_summ.txt
    -test_cfg.txt
    -test_api_pair.txt
    Each story and summary must be on a single line (see the sample text given).

  • Run preprocess.py
    Command: python preprocess.py
    This will create three tfrecord files under the datawash folder.
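Before running preprocess.py, it may help to confirm that the data folder is complete. The following is a minimal sketch, not part of the ByteCue code: the helper names (`expected_data_files`, `missing_data_files`, `line_count_mismatches`) are hypothetical, and the check that a story file and its summary file have the same number of lines is an assumption drawn from the one-item-per-line requirement above.

```python
from pathlib import Path

# File names follow the list in Step1: three splits, four kinds of file each.
SPLITS = ("train", "eval", "test")
KINDS = ("story", "summ", "cfg", "api_pair")


def expected_data_files():
    """Return the twelve file names that Step1 lists for the data folder."""
    return [f"{split}_{kind}.txt" for split in SPLITS for kind in KINDS]


def missing_data_files(data_dir="data"):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected_data_files() if not (root / name).is_file()]


def count_lines(path):
    """Count the lines in a text file without loading it into memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def line_count_mismatches(data_dir="data"):
    """Return the splits whose story and summary files differ in line count.

    Assumption (not stated in the README): line i of <split>_story.txt
    pairs with line i of <split>_summ.txt. Splits with a missing file are
    skipped here; missing_data_files() reports those.
    """
    root = Path(data_dir)
    mismatched = []
    for split in SPLITS:
        story = root / f"{split}_story.txt"
        summ = root / f"{split}_summ.txt"
        if story.is_file() and summ.is_file():
            if count_lines(story) != count_lines(summ):
                mismatched.append(split)
    return mismatched
```

Calling `missing_data_files("data")` from the ByteCue root returns an empty list when all twelve files are in place.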

Step2: train the model

Run main.py
Command: python main.py
The model configuration can be changed in config.py.

Step3: generate comments and test your trained model

  • First, generate comments for the test set:
    Run generateCOMMENT.py
    Command: python generateCOMMENT.py
  • Then, evaluate the generated comments:
    Run evaluation.py
    Command: python evaluation.py
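The three steps can also be chained from a single script. The sketch below is an illustration only, not part of the ByteCue code: `pipeline_commands` and `run_pipeline` are hypothetical helpers, and it assumes the four scripts take no command-line arguments, as in the commands shown above.

```python
import subprocess
import sys


def pipeline_commands(preprocess=True):
    """Return the commands for Step1 to Step3 in the order the README gives.

    Set preprocess=False to skip Step1 when the tfrecord files already exist.
    """
    scripts = ["main.py", "generateCOMMENT.py", "evaluation.py"]
    if preprocess:
        scripts.insert(0, "preprocess.py")
    # Use the current interpreter so the active environment is reused.
    return [[sys.executable, script] for script in scripts]


def run_pipeline(preprocess=True, cwd="."):
    """Run each step in turn and stop at the first one that fails."""
    for command in pipeline_commands(preprocess):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_pipeline(preprocess=False)` from the ByteCue root would run training, comment generation, and evaluation in sequence.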

Because of LFS limitations, the dataset is hosted separately and can be downloaded from https://drive.google.com/drive/folders/1z0xh0KOFB8V-9LQmE0BTJyXkUU_t3kYD?usp=sharing. Unzip the downloaded .zip file, which contains the folders 'datawash', 'scripts', 'texar_repos', 'venv', and 'pretrained_model', then move these folders to the ByteCue root directory.

|--scripts
|  |--build_data_with_cfg.py
|  |--drawCFG.py
|  |--prepare_train_data.py
|--texar_repo
|--Bytecue.py
|--config.py
|--evaluation.py
|--generateCOMMENT.py
|--main.py
|--preprocess.py