This is the source code and dataset for ByteCue.
To train on your own dataset, start with step 1; otherwise, skip step 1.
Place the bytecode, CFG, and comment files under the data folder with the following names:
-train_story.txt
-train_summ.txt
-train_cfg.txt
-train_api_pair.txt
-eval_story.txt
-eval_summ.txt
-eval_cfg.txt
-eval_api_pair.txt
-test_story.txt
-test_summ.txt
-test_cfg.txt
-test_api_pair.txt
Each story and each summary must be on a single line (see the sample text provided).
Run preprocess.py
Command: python preprocess.py
This creates three TFRecord files under the datawash folder.
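Before running preprocess.py, it can help to confirm that the data folder is complete. The sketch below is not part of ByteCue; the helper names (`expected_files`, `check_data_dir`) are hypothetical. It checks that the twelve files listed above exist and, on the assumption that story i is paired with summary i by line number, that each story file has as many lines as its summary file.

```python
from pathlib import Path

SPLITS = ("train", "eval", "test")
KINDS = ("story", "summ", "cfg", "api_pair")


def expected_files(data_dir="data"):
    """Return the twelve expected paths, e.g. data/train_story.txt."""
    return [Path(data_dir) / f"{split}_{kind}.txt"
            for split in SPLITS for kind in KINDS]


def count_lines(path):
    """Count the lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_data_dir(data_dir="data"):
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing file: {path}"
                for path in expected_files(data_dir) if not path.is_file()]
    for split in SPLITS:
        story = Path(data_dir) / f"{split}_story.txt"
        summ = Path(data_dir) / f"{split}_summ.txt"
        if story.is_file() and summ.is_file():
            # Assumption: stories and summaries are aligned line by line.
            n_story, n_summ = count_lines(story), count_lines(summ)
            if n_story != n_summ:
                problems.append(
                    f"{split}: {n_story} stories but {n_summ} summaries")
    return problems
```

Calling `check_data_dir("data")` from the ByteCue root directory returns the list of problems found. Whether the cfg and api_pair files are also aligned line by line is not stated here, so the sketch does not check them.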
Run main.py
Command: python main.py
The model configuration can be changed in config.py.
- First, generate comments for the test set
Run generateCOMMENT.py
Command: python generateCOMMENT.py
- Then, evaluate the generated comments
Run evaluation.py
Command: python evaluation.py
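The commands above can be chained in a small driver script. This is a minimal sketch, not a script shipped with ByteCue: `pipeline_steps` and `run_pipeline` are hypothetical names, and it assumes that "skipping step 1" means skipping preprocess.py because the downloaded datawash folder already holds the preprocessed data.

```python
import subprocess
import sys


def pipeline_steps(preprocess=True):
    """Return the scripts to run, in order."""
    steps = ["main.py", "generateCOMMENT.py", "evaluation.py"]
    if preprocess:
        # Only needed when training on your own dataset (assumption).
        steps.insert(0, "preprocess.py")
    return steps


def run_pipeline(preprocess=True, python=sys.executable):
    """Run each script from the current directory, stopping on failure."""
    for script in pipeline_steps(preprocess):
        print(f"running {script}")
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run([python, script], check=True)
```

Calling `run_pipeline(preprocess=False)` from the ByteCue root directory would train, generate comments, and evaluate using the existing preprocessed data.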
Because of LFS limitations, the dataset is hosted externally and can be downloaded from https://drive.google.com/drive/folders/1z0xh0KOFB8V-9LQmE0BTJyXkUU_t3kYD?usp=sharing. Unzip the downloaded .zip file, which contains the following folders ('datawash', 'scripts', 'texar_repos', 'venv', 'pretrained_model'), then move these folders to the ByteCue root directory.
|--scripts
| |--build_data_with_cfg.py
| |--drawCFG.py
| |--prepare_train_data.py
|--texar_repo
|--Bytecue.py
|--config.py
|--evaluation.py
|--generateCOMMENT.py
|--main.py
|--preprocess.py
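The unzip-and-move step for the downloaded dataset can be sketched in Python. This is an illustration under stated assumptions, not a script shipped with ByteCue: `install_dataset` is a hypothetical helper, the archive path is whatever name the download was saved under, and the folders are assumed to sit at the top level of the archive. Folder names are taken from the download instructions and can be overridden if the archive uses different ones.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Folder names as listed in the download instructions.
FOLDERS = ("datawash", "scripts", "texar_repos", "venv", "pretrained_model")


def install_dataset(archive, root=".", folders=FOLDERS):
    """Extract the archive and move the listed folders into root.

    Returns the names of the folders that were moved. Folders that are
    absent from the archive, or already present in root, are skipped.
    """
    moved = []
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        for name in folders:
            source = Path(tmp) / name
            target = Path(root) / name
            if source.is_dir() and not target.exists():
                shutil.move(str(source), str(target))
                moved.append(name)
    return moved
```

The function never overwrites an existing folder, so running it twice is harmless. The returned list shows which folders were actually installed.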