ByteCue: One Step Further Than Decompilation: Bytecode Comment Generation

This is the source code and dataset for ByteCue. The dataset is saved in the datawash/data folder.

Quick start

If you want to train on your own dataset, start with Step1; otherwise, skip it.

Step1: data preprocessing

  • Place the bytecode, CFG, and comment files under the data folder with the following names:
    -train_story.txt
    -train_summ.txt
    -train_cfg.txt
    -train_api_pair.txt
    -eval_story.txt
    -eval_summ.txt
    -eval_cfg.txt
    -eval_api_pair.txt
    -test_story.txt
    -test_summ.txt
    -test_cfg.txt
    -test_api_pair.txt
    Each story and summary must be on a single line (see the sample text provided).

  • Run preprocess.py
    Command: python preprocess.py
    This creates three TFRecord files under the datawash folder.
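Before running preprocess.py, it can help to confirm that all twelve input files are present. The following is a minimal sketch, not part of the ByteCue code: the helper names (expected_files, count_lines, check_data_dir) are hypothetical, and the check that every file in a split has the same number of lines is an assumption based on the one-sample-per-line rule above.

```python
from pathlib import Path

SPLITS = ("train", "eval", "test")
KINDS = ("story", "summ", "cfg", "api_pair")


def expected_files(data_dir):
    """Return the twelve expected paths, grouped by split."""
    root = Path(data_dir)
    return {
        split: [root / f"{split}_{kind}.txt" for kind in KINDS]
        for split in SPLITS
    }


def count_lines(path):
    """Count the lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_data_dir(data_dir):
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    for split, paths in expected_files(data_dir).items():
        missing = [p.name for p in paths if not p.is_file()]
        if missing:
            problems.append(f"{split}: missing {', '.join(missing)}")
            continue
        # Assumption: line i of each file in a split describes the same sample,
        # so all four files should have equal line counts.
        counts = {p.name: count_lines(p) for p in paths}
        if len(set(counts.values())) > 1:
            problems.append(f"{split}: line counts differ: {counts}")
    return problems
```

Call check_data_dir with whichever data folder the scripts read from and print the returned list; an empty list suggests the files are named and aligned as described.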

Step2: train the model

Run main.py
Command: python main.py
Model configuration can be changed in config.py.

Step3: generate comments and test your trained model

  • First, generate comments for the test set
    Run generateCOMMENT.py
    Command: python generateCOMMENT.py
  • Then, evaluate the generated comments
    Run evaluation.py
    Command: python evaluation.py