[CCS'24] An LLM-based, fully automated fuzzing tool for option combination testing.

NASP-THU/ProphetFuzz

ProphetFuzz

The implementation of the paper titled "ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model".

ProphetFuzz is an LLM-based, fully automated fuzzing tool for option combination testing. It predicts and fuzzes high-risk option combinations using only documentation, and the entire process runs without manual intervention.

For more details, please refer to our paper from ACM CCS'24.

Because of page limits, the appendix could not be included in the main text of the paper. Please refer to the Appendix.

Structure

.
├── Dockerfile
├── README.md
├── assets
│   ├── dataset
│   │   ├── groundtruth_for_20_programs.json
│   │   └── precision.json
│   └── images
├── fuzzing_handler
│   ├── cmd_fixer.py
│   ├── code_checker.py
│   ├── config.json
│   ├── run_cmin.py
│   ├── run_fuzzing.sh
│   └── utils
│       ├── analysis_util.py
│       ├── code_utils.py
│       └── execution_util.py
├── llm_interface
│   ├── assemble.py
│   ├── config
│   │   └── .env
│   ├── constraint.py
│   ├── few-shot
│   │   ├── manpage_htmldoc.json
│   │   ├── manpage_jbig2.json
│   │   ├── manpage_jhead.json
│   │   ├── manpage_makeswf.json
│   │   ├── manpage_mp4box.json
│   │   ├── manpage_opj_compress.json
│   │   ├── manpage_pdf2swf.json
│   │   └── manpage_yasm.json
│   ├── few-shot_generate.py
│   ├── input
│   ├── output
│   ├── predict.py
│   ├── restruct_manpage.py
│   └── utils
│       ├── gpt_utils.py
│       └── opt_utils.py
├── manpage_parser
│   ├── input
│   ├── output
│   ├── parser.py
│   └── utils
│       └── groff_utils.py
└── run_all_in_one.sh
  1. manpage_parser: Scripts for parsing documentation.
  2. llm_interface: Scripts for extracting constraints, predicting high-risk option combinations, and assembling commands.
  3. fuzzing_handler: Scripts for preparing and conducting fuzzing.
  4. assets/dataset: Dataset for evaluating the constraint extraction module.
  5. run_all_in_one.sh: A single script that runs the entire pipeline.
  6. Dockerfile: Builds our experiment environment (tested on Ubuntu 20.04).

The components of ProphetFuzz are implemented in the following functions:

| Section | Component | File | Function |
|---|---|---|---|
| 3.2 | Constraint Extraction | llm_interface/constraint.py | extractRelationships |
| 3.2 | Self Check | llm_interface/constraint.py | checkRelationships |
| 3.3 | AutoCoT | llm_interface/few-shot_generate.py | generatePrompt |
| 3.3 | High-Risk Combination Prediction | llm_interface/predict.py | predictCombinations |
| 3.4 | Command Assembly | llm_interface/assembly.py | generateCommands |
| 3.5 | File Generation | fuzzing_handler/generate_combination.py | main |
| 3.5 | Corpus Minimization | fuzzing_handler/run_cmin.py | runCMinCommands |
| 3.5 | Fuzzing | fuzzing_handler/run_fuzzing.sh | runFuzzing |
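
To give a rough feel for why constraints matter before commands are assembled, the sketch below filters option combinations against a small set of rules. It is not the code in llm_interface/constraint.py: the two constraint kinds used here (conflicts and dependencies), the option names, and the data layout are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical constraints for an imaginary tool (not ProphetFuzz output).
CONFLICTS = {frozenset({"-q", "-v"})}  # options that must not appear together
DEPENDS = {"--level": {"-c"}}          # option -> options it requires


def is_valid(combo):
    """Return True if the combination breaks no conflict or dependency rule."""
    chosen = set(combo)
    # A conflict is violated when every option of a conflicting group is chosen.
    if any(group <= chosen for group in CONFLICTS):
        return False
    # A dependency is violated when an option is chosen without its requirements.
    return all(
        required <= chosen
        for option, required in DEPENDS.items()
        if option in chosen
    )


def valid_combinations(options, size):
    """Enumerate all combinations of the given size that satisfy the rules."""
    return [c for c in combinations(sorted(options), size) if is_valid(c)]
```

For example, `valid_combinations({"-q", "-v", "-c"}, 2)` keeps the pairs containing `-c` and drops the conflicting pair `-q`/`-v`.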

Usage Example

  1. Using Docker to Configure the Running Environment

    • If you only want to complete the part that interacts with the LLM, you can directly use our pre-installed image (4GB):
    docker run -it 4ugustus/prophetfuzz_base bash
    
    • If you want to complete the entire process, including seed generation, command repair, and fuzzing, please build the full image based on the pre-installed image:
    docker build -t prophetfuzz:latest .
    docker run -it --privileged=true prophetfuzz bash
    # 'privileged' is used for setting up the fuzzing environment
    
  2. Set Up Your API Key: Set your OpenAI API key in the llm_interface/config/.env file:

    OPENAI_API_KEY="[Input Your API Key Here]"
  3. Run the Script: Execute the script to start the automated fuzzing process:

    bash run_all_in_one.sh bison

    Note: If you are not within our Docker environment, you might need to manually install dependencies and adjust the fuzzing_handler/config.json file to specify the path to the program under test.

    If you prefer to start fuzzing manually, use the following command:

    fuzzer/afl-fuzz -i fuzzing_handler/input/bison -o fuzzing_handler/output/bison_prophet_1 -m none -K fuzzing_handler/argvs/argvs_bison.txt -- path/to/bison/bin/bison @@
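
The manual command above follows a regular path pattern per program, so it can be generated for other targets. The following is a minimal sketch under that assumption: the helper name `build_fuzz_command` and its parameters are hypothetical, and the directory layout and flags are copied from the bison command shown above rather than from the project's scripts.

```python
import shlex


def build_fuzz_command(program, binary_path, run_id=1, fuzzer="fuzzer/afl-fuzz"):
    """Build the afl-fuzz argument list, mirroring the bison example above.

    Assumes inputs, outputs, and argv files follow the same naming pattern
    for every program; adjust the paths if your layout differs.
    """
    return [
        fuzzer,
        "-i", f"fuzzing_handler/input/{program}",
        "-o", f"fuzzing_handler/output/{program}_prophet_{run_id}",
        "-m", "none",
        "-K", f"fuzzing_handler/argvs/argvs_{program}.txt",
        "--", binary_path, "@@",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line for bison.
    print(shlex.join(build_fuzz_command("bison", "path/to/bison/bin/bison")))
```

Returning a list instead of a single string lets the result be passed directly to `subprocess.run` without shell quoting issues; `shlex.join` is only used for display.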

CVEs Assigned

We employ ProphetFuzz to perform persistent fuzzing on the latest versions of the programs in our dataset. To date, ProphetFuzz has uncovered 140 zero-day or half-day vulnerabilities, 93 of which have been confirmed by the developers, earning 22 CVE numbers.

| CVE | Program | Type |
|---|---|---|
| CVE-2024-3248 | xpdf | stack-buffer-overflow |
| CVE-2024-4853 | editcap | heap-buffer-overflow |
| CVE-2024-4855 | editcap | bad free |
| CVE-2024-31743 | ffmpeg | segmentation violation |
| CVE-2024-31744 | jasper | assertion failure |
| CVE-2024-31745 | dwarfdump | use-after-free |
| CVE-2024-31746 | objdump | heap-buffer-overflow |
| CVE-2024-32154 | ffmpeg | segmentation violation |
| CVE-2024-32157 | mupdf | segmentation violation |
| CVE-2024-32158 | mupdf | negative-size-param |
| CVE-2024-34960 | ffmpeg | floating point exception |
| CVE-2024-34961 | pspp | segmentation violation |
| CVE-2024-34962 | pspp | segmentation violation |
| CVE-2024-34963 | pspp | assertion failure |
| CVE-2024-34965 | pspp | assertion failure |
| CVE-2024-34966 | pspp | assertion failure |
| CVE-2024-34967 | pspp | assertion failure |
| CVE-2024-34968 | pspp | assertion failure |
| CVE-2024-34969 | pspp | segmentation violation |
| CVE-2024-34971 | pspp | segmentation violation |
| CVE-2024-34972 | pspp | assertion failure |
| CVE-2024-35316 | ffmpeg | segmentation violation |

Credit

Thanks to Dawei Wang (@4ugustus) and Geng Zhou (@Arbusz) for their valuable contributions to this project.

Citing this paper

To cite ProphetFuzz, you can use the following BibTeX entry:

@inproceedings {wang2024prophet,
  title = {ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model},
  author = {Wang, Dawei and Zhou, Geng and Chen, Li and Li, Dan and Miao, Yukai},
  booktitle = {Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security},
  publisher = {Association for Computing Machinery},
  address = {Salt Lake City, UT, USA},
  pages = {735–749},
  year = {2024}
}
