This is the codebase for the paper "Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study", which has been submitted to IEEE TIFS. The arXiv version of this work will be publicly available soon via this link.
To support related communities and encourage future studies, we provide an easy-to-use, unified codebase that implements three graph-based models, two medium-size BERT-based sequence models, and four LLMs for studying their performance on the code vulnerability detection task. Our codebase is built on top of several related codebases listed below (Awesome Helpful Resources).
Model | Venue | Type | Paper Link |
---|---|---|---|
Devign | NeurIPS | Graph | Link |
ReGVD | IEEE ICSE | Graph | Link |
GraphCodeBERT | ICLR | Graph | Link |
CodeBERT | EMNLP | Sequence | Link |
UniXcoder | ACL | Sequence | Link |
Llama-2-7B | Arxiv | Sequence | Link |
CodeLlama-7B | Arxiv | Sequence | Link |
Llama-3-8B | Arxiv | Sequence | Link |
Llama-3.1-8B | Arxiv | Sequence | Link |
We provide our converted datasets in our HuggingFace dataset repository.
From the current path, you can download the datasets with the following command; remember to rename `VulResource` to `data` afterwards:
git clone https://huggingface.co/datasets/xuefen/VulResource
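The clone and the rename can also be combined, since `git clone` accepts a target directory name. The sketch below is a minimal illustration, assuming it is run from the repository root; `fetch_datasets` is a hypothetical helper and not part of this codebase.

```shell
# Hypothetical helper (not shipped with this repository):
# clone the dataset repository straight into the directory the scripts expect.
fetch_datasets() {
    local repo_url="$1"
    local target_dir="${2:-data}"

    # Do nothing if the target directory is already in place.
    if [ -d "$target_dir" ]; then
        echo "$target_dir already exists, skipping download"
        return 0
    fi

    # Passing a directory name to git clone replaces the manual rename.
    git clone "$repo_url" "$target_dir"
}

# Usage (from the repository root):
# fetch_datasets https://huggingface.co/datasets/xuefen/VulResource data
```

Guarding on an existing `data` directory keeps a second invocation from failing on a non-empty clone target.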
Original paper and resources are listed below.
Dataset | Venue | Paper Link |
---|---|---|
ReVeal | IEEE TSE | Link |
Devign | NeurIPS | Link |
Draper | IEEE ICMLA | Link |
BigVul | IEEE ICSE | Link |
DiverseVul | IEEE ICSE | Link |
We provide user-friendly shell scripts to simplify model training and evaluation. These scripts are located in the `scripts/` directory, and their general functionalities are as follows:
scripts
├── finetune.sh # Fine-tune LLMs for the experiments in Sections 6.2 and 6.3.
├── inference.sh # Evaluate LLMs fine-tuned using `finetune.sh`.
├── finetune_imbalance.sh # Fine-tune LLMs for the experiments in Section 6.4.
├── inference_imbalance.sh # Evaluate LLMs fine-tuned using `finetune_imbalance.sh`.
├── finetune_ablation.sh # Fine-tune LLMs for the experiments in Section 6.5.
├── inference_ablation.sh # Evaluate LLMs fine-tuned using `finetune_ablation.sh`.
├── train.sh # Train graph-based and medium-size sequence models for the experiments in Section 6.2.
├── test.sh # Evaluate models trained using `train.sh`.
├── train_imbalance.sh # Train graph-based and medium-size sequence models for the experiments in Section 6.4.
├── test_imbalance.sh # Evaluate models trained using `train_imbalance.sh`.
└── to_graph.sh # Convert data into graph format as input for the Devign model.
Before using these scripts, you need to:
- Use the `cd` command to set the working directory to the root of this repository.
- Place all data in the `data/` directory, following the directory structure and file names provided in our open-sourced HuggingFace repository.
- Install all dependencies listed in `requirements.txt` by running the command `pip install -r requirements.txt`.
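The three prerequisites above can be checked before launching a long run. The following is a minimal sketch, not part of this codebase: `check_setup` is a hypothetical helper that only verifies that `scripts/`, `data/`, and `requirements.txt` exist under a given root. It does not verify that the dependencies are actually installed.

```shell
# Hypothetical helper (not shipped with this repository):
# report which of the expected top-level entries are missing.
check_setup() {
    local root="${1:-.}"
    local dir
    local missing=0

    # The examples call ./scripts/... and read from data/, so both must exist.
    for dir in scripts data; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing directory: $dir/"
            missing=1
        fi
    done

    if [ ! -f "$root/requirements.txt" ]; then
        echo "missing file: requirements.txt"
        missing=1
    fi

    # Exit status 0 means everything was found, 1 means something is missing.
    return "$missing"
}

# Usage (from the repository root):
# check_setup . && pip install -r requirements.txt
```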
The trained models and output logs will be generated in the `outputs/` directory.
To quickly get started, you can run the following examples:
# For the experiments in Section 6.2.
./scripts/finetune.sh reveal llama3.1 0-512 16 0;
./scripts/inference.sh reveal llama3.1 0-512 0;
./scripts/train.sh reveal ReGVD 0-512 0;
./scripts/test.sh reveal ReGVD 0-512 0;
# For the experiments in Section 6.3.
./scripts/finetune.sh mix llama3.1 128-256 32 0;
./scripts/inference.sh mix llama3.1 128-256 0;
./scripts/train.sh mix ReGVD 128-256 0;
./scripts/test.sh mix ReGVD 128-256 0;
# For the experiments in Section 6.4.
./scripts/finetune_imbalance.sh draper llama3.1 0.2 0;
./scripts/inference_imbalance.sh draper llama3.1 0.2 0;
./scripts/train_imbalance.sh draper CodeBERT 0.2 0;
./scripts/test_imbalance.sh draper CodeBERT 0.2 0;
# For the experiments in Section 6.5.
./scripts/finetune_ablation.sh reveal llama3.1 8 16 0;
./scripts/inference_ablation.sh reveal llama3.1 8 16 0;
You can modify the command-line arguments in the above examples to perform other experiments mentioned in the paper.
Specifically, the second parameter (the first argument after the script path, e.g. `reveal`) is the dataset name, which corresponds to the folder name in the `data/` directory.
You can add a custom dataset (assume it is named `xxx`) by following the template of our open-sourced dataset in the HuggingFace repository. Store it according to the following file structure:
data
└── xxx
    └── alpaca
        ├── xxx_0-123_test.json
        ├── xxx_0-123_train.json
        ├── xxx_0-123_validate.json
        ├── xxx_123-456_test.json
        ├── xxx_123-456_train.json
        ├── xxx_123-456_validate.json
        └── ...
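Before training on a custom dataset, it may help to confirm that the three splits for a given length range follow this naming scheme. The sketch below is an illustration only: `check_dataset_files` is a hypothetical helper, and it checks file names, not the JSON content.

```shell
# Hypothetical helper (not shipped with this repository):
# verify that <root>/<name>/alpaca/<name>_<range>_{train,validate,test}.json exist.
check_dataset_files() {
    local data_root="$1"
    local name="$2"
    local range="$3"
    local split
    local path
    local status=0

    for split in train validate test; do
        path="$data_root/$name/alpaca/${name}_${range}_${split}.json"
        if [ -f "$path" ]; then
            echo "ok      $path"
        else
            echo "missing $path"
            status=1
        fi
    done

    # Exit status 0 only if all three splits were found.
    return "$status"
}

# Usage (from the repository root):
# check_dataset_files data xxx 0-123
```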
The third parameter (the second argument after the script path) specifies the model name, which must be one of the preset supported values:
- For scripts prefixed with `finetune` and `inference`, the supported values are `llama-2`, `codellama`, `llama-3`, and `llama-3.1` (all lowercase).
- For scripts prefixed with `train` and `test`, the supported values are `Devign`, `ReGVD`, `GraphCodeBERT`, `CodeBERT`, and `UniXcoder` (case-sensitive).
For the remaining parameters, refer to the usage notes inside each script.
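To repeat one experiment across several models, the commands can be generated rather than typed by hand. The sketch below only prints commands in the same shape as the `train.sh` and `test.sh` examples above; `print_train_test_commands` is a hypothetical helper, and the last argument is passed through unchanged because its meaning is documented in the scripts themselves, not here.

```shell
# Hypothetical helper (not shipped with this repository):
# print train/test command pairs for the models that need no extra preprocessing.
print_train_test_commands() {
    local dataset="$1"
    local range="$2"
    local last_arg="$3"   # copied verbatim from the examples; see each script for its meaning
    local model

    # Devign is left out because it needs graph conversion first.
    for model in ReGVD GraphCodeBERT CodeBERT UniXcoder; do
        echo "./scripts/train.sh $dataset $model $range $last_arg;"
        echo "./scripts/test.sh $dataset $model $range $last_arg;"
    done
}

# Usage: review the printed commands, then pipe them to a shell if they look right.
# print_train_test_commands reveal 0-512 0
```

Printing instead of executing keeps the sketch free of side effects and makes the generated commands easy to inspect before a long run.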
Unlike other models, the Devign model requires data to be converted into graph format before training and evaluation. To simplify this process, we provide the `to_graph.sh` script.
Before converting, you need to download `joern.zip` from this link, extract it, and store all the files in the `joern/` directory. Make sure the current user has execute (`x`) permission for `joern-parse`.
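The permission check can be scripted as follows. This is a minimal sketch under the assumption that `joern-parse` sits directly inside `joern/` after extraction; `ensure_joern_parse_executable` is a hypothetical helper and not part of this codebase.

```shell
# Hypothetical helper (not shipped with this repository).
# Assumption: joern-parse is located directly under the given directory.
ensure_joern_parse_executable() {
    local joern_dir="${1:-joern}"
    local parser="$joern_dir/joern-parse"

    if [ ! -f "$parser" ]; then
        echo "not found: $parser" >&2
        return 1
    fi

    # Add the execute bit for the file owner only if the file is not yet executable.
    if [ ! -x "$parser" ]; then
        chmod u+x "$parser"
    fi
}

# Usage (from the repository root):
# ensure_joern_parse_executable joern
```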
An example of training and evaluating the Devign model is as follows:
./scripts/to_graph.sh reveal 0-512;
./scripts/train.sh reveal Devign 0-512 0;
We implement our studied models by referencing the following resources or codebases, and we also recommend some useful related resources for further study.
Resource Name | Summary | Link |
---|---|---|
VulLLM | Referenced Codebase for Implementation | Link |
Devign | Referenced Codebase for Implementation | Link |
CodeBERT Family | Referenced Codebase for Implementation | Link |
ReGVD | Referenced Codebase for Implementation | Link |
Llama Family | Meta AI Open-source LLMs | Link |
Evaluate ChatGPT for CVD | Recommended Codebase | Link |
Awesome Code LLM | Recommended Paper List | Link |
Awesome LLM for Software Engineering | Recommended Paper List | Link |
Awesome LLM for Security | Recommended Paper List | Link |
We are very grateful to the authors of VulLLM, CodeLlama, Meta AI, and other open-source efforts for making their code and models publicly available, which allowed us to carry out this experimental study on top of their hard work.
If you find this codebase useful in your research, please consider citing our work and the earlier works listed below. Collaboration and pull requests are always welcome! If you have any questions or suggestions, please feel free to contact us :)
@article{jiang2024investigating,
title={Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study},
author={Jiang, Xuefeng and Wu, Lvhua and Sun, Sheng and Li, Jia and Xue, Jingjing and Wang, Yuwei and Wu, Tingting and Liu, Min},
journal={arXiv preprint},
year={2024}
}
@article{feng2020codebert,
title={{CodeBERT}: A Pre-Trained Model for Programming and Natural Languages},
author={Feng, Zhangyin and Guo, Daya and Tang, Duyu and Duan, Nan and Feng, Xiaocheng and Gong, Ming and Shou, Linjun and Qin, Bing and Liu, Ting and Jiang, Daxin and others},
journal={arXiv preprint arXiv:2002.08155},
year={2020}
}
@article{du2024generalization,
title={Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning},
author={Du, Xiaohu and Wen, Ming and Zhu, Jiahao and Xie, Zifan and Ji, Bin and Liu, Huijun and Shi, Xuanhua and Jin, Hai},
journal={arXiv preprint arXiv:2406.03718},
year={2024}
}