We provide the trained model for testing, as well as the scripts and data needed for training.
- The small dataset is in the folder "dataset/Small".
- The large dataset is in the folder "dataset/large".
We have also uploaded the dataset to Google Drive. You can download it here.

Requirements:
- python 3.7
- numpy==1.24.3
- pandas==2.0.1
- scikit_learn==1.2.2
- torch==2.0.0+cu117
- transformers==4.28.1
- You can download the trained model directly through this link for testing, or use the data given above to train and test it yourself.
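The pinned versions in the requirements list can be checked against the current environment with a short script. This is a sketch, not part of the repository: it uses the standard-library importlib.metadata module (available from Python 3.8), the helper name find_mismatches is hypothetical, and the Python version itself is not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list; "scikit-learn" is the canonical
# distribution name for the "scikit_learn" entry.
PINS = {
    "numpy": "1.24.3",
    "pandas": "2.0.1",
    "scikit-learn": "1.2.2",
    "torch": "2.0.0+cu117",
    "transformers": "4.28.1",
}


def find_mismatches(pins, get_version=version):
    """Return {package: (pinned, installed)} for every package whose installed
    version differs from its pin; installed is None if the package is missing."""
    mismatches = {}
    for package, pinned in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[package] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    for package, (pinned, installed) in problems.items():
        print(f"{package}: expected {pinned}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

The get_version parameter defaults to importlib.metadata.version and exists only so the comparison logic can be exercised without the real packages installed.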
First, modify "code/configs.py", which contains the parameters needed to train our model.

After modifying the parameters in "configs.py" for the corresponding RQ, run "train.py" or "test.py" to reproduce the corresponding results.
- Training
python train.py
- Prediction
python test.py
Note that you first need to set the storage path of your model, "self.model_save_path".
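The parameters this README refers to can be pictured as attributes of a single settings object. The sketch below is hypothetical: only the attribute names come from this README, while the class name Config, every default value, the placeholder paths, and the helper ensure_save_dir are assumptions, not the actual contents of "code/configs.py".

```python
import os


class Config:
    """Hypothetical stand-in for the settings object in code/configs.py.

    Attribute names follow the README; every value is a placeholder.
    """

    def __init__(self):
        # Where the trained model is written; set this before training.
        self.model_save_path = "./saved_models/appt.bin"  # placeholder path
        # Dataset to train on; point this at the dataset you want to use.
        self.data_train_path = "./dataset/Small"  # placeholder path
        # Ablation switches, all off for the full model.
        self.no_pretrain = False
        self.freeze_bert = False
        self.no_lstm = False
        # Splicing method: one of cat, add, sub, mul, mix.
        self.splicingMethod = "cat"
        # Hugging Face Hub identifier of the pre-trained encoder.
        self.model_path = "bert-base-uncased"
        self.run_rq3 = False


def ensure_save_dir(config):
    """Create the parent directory of model_save_path if it is missing."""
    save_dir = os.path.dirname(config.model_save_path)
    if save_dir:
        os.makedirs(save_dir, exist_ok=True)
    return save_dir
```

Creating the directory up front is one way to avoid a write failure at save time if the folder does not exist yet.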
- We also provide the experimental results reported in our paper, which can be downloaded using the link. Because the model files are too large, we do not provide trained models for all experiments, only the model trained on the first fold of each cross-validation.
- All RQs are integrated into the training script; the experiments differ only in a few parameters, listed as follows.
Change the dataset path, "self.data_train_path", to the corresponding dataset.
For APPT_pre-training, please set "self.no_pretrain" to True.
For APPT_fine-tuning, please set "self.freeze_bert" to True.
For APPT_LSTM, please set "self.no_lstm" to True.
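Each ablation variant above switches on exactly one flag. A sketch of expressing that as data, assuming the variants are mutually exclusive and that the configuration is an ordinary Python object carrying those attributes; ABLATIONS and ablation_config are hypothetical helpers, not part of the repository.

```python
from types import SimpleNamespace

# Flag that each ablation variant switches on, as listed in the README.
ABLATIONS = {
    "APPT_pre-training": "no_pretrain",
    "APPT_fine-tuning": "freeze_bert",
    "APPT_LSTM": "no_lstm",
}


def ablation_config(base, variant):
    """Return a copy of base with only the flag for `variant` set to True.

    Assumes the variants are mutually exclusive, so the other flags are reset.
    """
    if variant not in ABLATIONS:
        raise ValueError(f"unknown variant: {variant!r}")
    settings = dict(vars(base))  # shallow copy; base is left untouched
    for flag in ABLATIONS.values():
        settings[flag] = False  # start from the full model
    settings[ABLATIONS[variant]] = True
    return SimpleNamespace(**settings)
```

Keeping the variant-to-flag mapping in one dictionary makes it harder to run an ablation with two switches accidentally left on.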
Replace "self.splicingMethod" with cat, add, sub, mul, or mix according to the splicing method under evaluation.
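To illustrate what the splicing options could mean, here is a plain-Python sketch that combines two equal-length vectors. It assumes the method is applied to two embeddings of the same size (for example, representations of the buggy and the patched code). That pairing, and treating mix as the concatenation of the four other results, are assumptions rather than a description of APPT's implementation.

```python
def splice(a, b, method):
    """Combine two equal-length vectors according to a splicing method name."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if method == "cat":
        return list(a) + list(b)  # concatenation, length 2n
    if method == "add":
        return [x + y for x, y in zip(a, b)]
    if method == "sub":
        return [x - y for x, y in zip(a, b)]
    if method == "mul":
        return [x * y for x, y in zip(a, b)]  # element-wise (Hadamard) product
    if method == "mix":
        # Assumption: mix concatenates the outputs of the four methods above (length 5n).
        return (splice(a, b, "cat") + splice(a, b, "add")
                + splice(a, b, "sub") + splice(a, b, "mul"))
    raise ValueError(f"unknown splicing method: {method!r}")


print(splice([1, 2], [3, 5], "sub"))  # [-2, -3]
```

The choice affects the input size of whatever classifier follows: add, sub, and mul keep the dimension n, cat doubles it, and mix (under the assumption above) multiplies it by five.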
Replace "self.model_path" with 'bert-base-uncased', 'microsoft/codebert-base', or 'microsoft/graphcodebert-base' according to the pre-trained model under evaluation.
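The three values are Hugging Face Hub model identifiers. A sketch of selecting one by a short label and loading its tokenizer and encoder with the transformers Auto classes. The labels BERT, CodeBERT, and GraphCodeBERT, the names MODEL_PATHS and load_encoder, and the assumption that APPT loads its encoder this way are all illustrative.

```python
# Short labels are illustrative; the identifiers are the ones listed in the README.
MODEL_PATHS = {
    "BERT": "bert-base-uncased",
    "CodeBERT": "microsoft/codebert-base",
    "GraphCodeBERT": "microsoft/graphcodebert-base",
}


def load_encoder(name):
    """Download (or read from the local cache) the tokenizer and encoder for a label."""
    # Imported lazily so the mapping can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    model_path = MODEL_PATHS[name]
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModel.from_pretrained(model_path)
    return tokenizer, model
```

For example, `tokenizer, model = load_encoder("CodeBERT")` fetches the CodeBERT weights on first use.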
Set "self.run_rq3" to True, and keep all other settings the same as in RQ1.
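One reading of aligning with RQ1 is to reuse the RQ1 settings unchanged and only switch on RQ3. A sketch under that assumption; rq3_config is a hypothetical helper, and the configuration is assumed to be an ordinary Python object that copy.deepcopy can copy.

```python
import copy


def rq3_config(rq1_config):
    """Return a copy of the RQ1 settings with run_rq3 enabled.

    The RQ1 object itself is not modified, so both runs can be launched
    from the same base configuration.
    """
    cfg = copy.deepcopy(rq1_config)
    cfg.run_rq3 = True
    return cfg
```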