The implementation of the paper titled "ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model"
ProphetFuzz is an LLM-based, fully automated fuzzing tool for option combination testing. It predicts high-risk option combinations from documentation alone and then fuzzes them, and the entire process operates without manual intervention.
For more details, please refer to our paper from ACM CCS'24.
Due to page limitations, the Appendix could not be included in the main text of the paper. Please refer to the Appendix.
.
├── Dockerfile
├── README.md
├── assets
│ ├── dataset
│ │ ├── groundtruth_for_20_programs.json
│ │ └── precision.json
│ └── images
├── fuzzing_handler
│ ├── cmd_fixer.py
│ ├── code_checker.py
│ ├── config.json
│ ├── run_cmin.py
│ ├── run_fuzzing.sh
│ └── utils
│   ├── analysis_util.py
│   ├── code_utils.py
│   └── execution_util.py
├── llm_interface
│ ├── assemble.py
│ ├── config
│ │ └── .env
│ ├── constraint.py
│ ├── few-shot
│ │ ├── manpage_htmldoc.json
│ │ ├── manpage_jbig2.json
│ │ ├── manpage_jhead.json
│ │ ├── manpage_makeswf.json
│ │ ├── manpage_mp4box.json
│ │ ├── manpage_opj_compress.json
│ │ ├── manpage_pdf2swf.json
│ │ └── manpage_yasm.json
│ ├── few-shot_generate.py
│ ├── input
│ ├── output
│ ├── predict.py
│ ├── restruct_manpage.py
│ └── utils
│   ├── gpt_utils.py
│   └── opt_utils.py
├── manpage_parser
│ ├── input
│ ├── output
│ ├── parser.py
│ └── utils
│   └── groff_utils.py
└── run_all_in_one.sh
- manpage_parser: Scripts for parsing documentation.
- llm_interface: Scripts for extracting constraints, predicting high-risk option combinations, and assembling commands.
- fuzzing_handler: Scripts for preparing and conducting fuzzing.
- assets/dataset: Dataset for evaluating the constraint extraction module.
- run_all_in_one.sh: Script that runs the entire pipeline in one step.
- Dockerfile: Builds our experiment environment (tested on Ubuntu 20.04).
The implementations of the ProphetFuzz components can be found in the following files and functions:
Section | Component | File | Function |
---|---|---|---|
3.2 | Constraint Extraction | llm_interface/constraint.py | extractRelationships |
3.2 | Self Check | llm_interface/constraint.py | checkRelationships |
3.3 | AutoCoT | llm_interface/few-shot_generate.py | generatePrompt |
3.3 | High-Risk Combination Prediction | llm_interface/predict.py | predictCombinations |
3.4 | Command Assembly | llm_interface/assemble.py | generateCommands |
3.5 | File Generation | fuzzing_handler/generate_combination.py | main |
3.5 | Corpus Minimization | fuzzing_handler/run_cmin.py | runCMinCommands |
3.5 | Fuzzing | fuzzing_handler/run_fuzzing.sh | runFuzzing |
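
As a quick sanity check against the table above, the following minimal sketch reports which of the listed files are absent from a local checkout. The helper name `missing_components` and the selection of paths are illustrative assumptions and are not part of ProphetFuzz.

```python
from pathlib import Path
from typing import Iterable, List


def missing_components(repo_root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # A few of the files named in the component table; extend as needed.
    wanted = [
        "llm_interface/constraint.py",
        "llm_interface/predict.py",
        "fuzzing_handler/run_cmin.py",
    ]
    # Assumes the script is run from the repository root.
    for path in missing_components(".", wanted):
        print(f"missing: {path}")
```

Running it from the repository root prints one line per missing file and nothing when every listed file is present.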
- Using Docker to Configure the Running Environment
  - If you only want to complete the part that interacts with the LLM, you can directly use our pre-installed image (4GB):
    docker run -it 4ugustus/prophetfuzz_base bash
  - If you want to complete the entire process, including seed generation, command repair, and fuzzing, please build the full image based on the pre-installed image:
    docker build -t prophetfuzz:latest .
    docker run -it --privileged=true prophetfuzz bash # 'privileged' is used for setting up the fuzzing environment
- Set Up Your API Key: Set your OpenAI API key in the llm_interface/config/.env file:
  OPENAI_API_KEY="[Input Your API Key Here]"
- Run the Script: Execute the script to start the automated fuzzing process:
  bash run_all_in_one.sh bison
  Note: If you are not within our Docker environment, you might need to manually install dependencies and adjust the fuzzing_handler/config.json file to specify the path to the program under test.
  If you prefer to start fuzzing manually, use the following command:
  fuzzer/afl-fuzz -i fuzzing_handler/input/bison -o fuzzing_handler/output/bison_prophet_1 -m none -K fuzzing_handler/argvs/argvs_bison.txt -- path/to/bison/bin/bison @@
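
The API key step and the manual fuzzing command can be sketched in Python as a small preflight helper. This is a hedged sketch only: the function names `read_api_key` and `build_afl_command` are hypothetical, and it assumes that programs other than bison follow the same directory and `argvs_<program>.txt` naming pattern as the bison command above. The flags are copied from that command, not taken from any other documentation.

```python
from typing import List, Optional

# Placeholder text that ships in llm_interface/config/.env (from the step above).
PLACEHOLDER = "[Input Your API Key Here]"


def read_api_key(env_text: str) -> Optional[str]:
    """Return the OPENAI_API_KEY value from .env-style text, or None if unset."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() != "OPENAI_API_KEY":
            continue
        value = value.strip().strip('"').strip("'")
        # Treat an empty value or the untouched placeholder as "not configured".
        if value and value != PLACEHOLDER:
            return value
    return None


def build_afl_command(program: str, binary: str, run_id: int = 1) -> List[str]:
    """Assemble the manual afl-fuzz argument list shown above for one program.

    Assumption: input, output, and argvs paths follow the bison example.
    """
    return [
        "fuzzer/afl-fuzz",
        "-i", f"fuzzing_handler/input/{program}",
        "-o", f"fuzzing_handler/output/{program}_prophet_{run_id}",
        "-m", "none",
        "-K", f"fuzzing_handler/argvs/argvs_{program}.txt",
        "--", binary, "@@",
    ]
```

For example, `" ".join(build_afl_command("bison", "path/to/bison/bin/bison"))` reproduces the bison command above, and `read_api_key` returns `None` while the `.env` file still contains the placeholder.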
We employ ProphetFuzz to perform persistent fuzzing on the latest versions of the programs in our dataset. To date, ProphetFuzz has uncovered 140 zero-day or half-day vulnerabilities, 93 of which have been confirmed by the developers, earning 22 CVE numbers.
CVE | Program | Type |
---|---|---|
CVE-2024-3248 | xpdf | stack-buffer-overflow |
CVE-2024-4853 | editcap | heap-buffer-overflow |
CVE-2024-4855 | editcap | bad free |
CVE-2024-31743 | ffmpeg | segmentation violation |
CVE-2024-31744 | jasper | assertion failure |
CVE-2024-31745 | dwarfdump | use-after-free |
CVE-2024-31746 | objdump | heap-buffer-overflow |
CVE-2024-32154 | ffmpeg | segmentation violation |
CVE-2024-32157 | mupdf | segmentation violation |
CVE-2024-32158 | mupdf | negative-size-param |
CVE-2024-34960 | ffmpeg | floating point exception |
CVE-2024-34961 | pspp | segmentation violation |
CVE-2024-34962 | pspp | segmentation violation |
CVE-2024-34963 | pspp | assertion failure |
CVE-2024-34965 | pspp | assertion failure |
CVE-2024-34966 | pspp | assertion failure |
CVE-2024-34967 | pspp | assertion failure |
CVE-2024-34968 | pspp | assertion failure |
CVE-2024-34969 | pspp | segmentation violation |
CVE-2024-34971 | pspp | segmentation violation |
CVE-2024-34972 | pspp | assertion failure |
CVE-2024-35316 | ffmpeg | segmentation violation |
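
To summarize the table above per program, a minimal sketch such as the following can tally the rows. The helper name `tally_by_program` is a hypothetical choice, and the sample input is only the first three rows of the table; paste in the remaining rows to tally the full list.

```python
from collections import Counter
from typing import Iterable


def tally_by_program(rows: Iterable[str]) -> Counter:
    """Count CVE rows per program from 'CVE | Program | Type |' table lines."""
    counts: Counter = Counter()
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        # Skip the header and separator rows; keep only real CVE entries.
        if len(cells) >= 3 and cells[0].startswith("CVE-"):
            counts[cells[1]] += 1
    return counts


if __name__ == "__main__":
    sample_rows = [
        "CVE | Program | Type |",
        "---|---|---|",
        "CVE-2024-3248 | xpdf | stack-buffer-overflow |",
        "CVE-2024-4853 | editcap | heap-buffer-overflow |",
        "CVE-2024-4855 | editcap | bad free |",
    ]
    for program, count in tally_by_program(sample_rows).most_common():
        print(program, count)
```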
Thanks to Dawei Wang (@4ugustus) and Geng Zhou (@Arbusz) for their valuable contributions to this project.
If you would like to cite ProphetFuzz, you may use the following BibTeX entry:
@inproceedings {wang2024prophet,
title = {ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model},
author = {Wang, Dawei and Zhou, Geng and Chen, Li and Li, Dan and Miao, Yukai},
booktitle = {Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security},
publisher = {Association for Computing Machinery},
address = {Salt Lake City, UT, USA},
pages = {735--749},
year = {2024}
}