Synthetic Vulnerable Smart Contracts Dataset Generator

This repository provides tools to generate a synthetic dataset of Solidity smart contracts with injected vulnerabilities. It uses large language models (LLMs) either to inject vulnerabilities such as reentrancy into existing contracts or to generate new vulnerable contracts from scratch. It also includes scripts to fetch verified contracts from Etherscan and to run static analysis tools on the contracts to detect vulnerabilities.

Features

- Etherscan-based generation: Modifies real-world contracts sourced from Etherscan.
- Scratch-based generation: Generates entirely new contracts based on user-defined scenarios.
- Verified contract fetching: Fetches verified smart contracts from DeFi protocols on Etherscan.
- Customizable vulnerabilities: Injects specific vulnerabilities, such as reentrancy or front-running.
- Scenarios: Defines vulnerabilities with detailed examples for injection.
- Static analysis: Uses SmartBugs analyzers for vulnerability detection in contracts.
- Scalable generation: Generates multiple contracts per scenario.

Installation

Prerequisites

Python 3.8+: Ensure you have Python installed.
Dependencies: Install required Python libraries using:
```
pip install -r requirements.txt
```
Environment Variables: Populate your .env file with the following keys:
- ETHERSCAN_API_KEY: Your Etherscan API key.
- OPENAI_API_KEY: API key for OpenAI models.
- OPENAI_MODEL: Specify the LLM model to use.
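
Before running any script, it can help to confirm that all three keys are actually visible to Python. The following is a minimal sketch, assuming the values from .env have already been loaded into the process environment (for example by exporting them in the shell or by a loader library; this README does not state which mechanism the project uses). The helper name `missing_env_keys` is hypothetical and not part of the repository.

```python
import os

# Keys named in this README; how the project itself reads them is not shown here.
REQUIRED_KEYS = ("ETHERSCAN_API_KEY", "OPENAI_API_KEY", "OPENAI_MODEL")


def missing_env_keys(env=None):
    """Return the required keys that are absent or empty in the given mapping.

    Hypothetical helper: defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_env_keys()` with no arguments checks the current environment; an empty list means every key is set.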

Usage

Fetch Verified Contracts from Etherscan

The script fetch_contracts.py fetches verified smart contracts from popular DeFi protocols using the Etherscan API.

Example Usage

python fetch_contracts.py

Key Functionalities

- Fetch Protocol Addresses: Retrieves smart contract addresses for protocols like Uniswap, Aave, and Compound.
- Fetch Verified Contracts: Downloads and saves verified contracts from Etherscan for dataset augmentation.
- Customization: Modify protocols or the target count of contracts by editing fetch_contracts.py.
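
To illustrate what fetching a verified contract involves, here is a rough sketch built around Etherscan's `getsourcecode` contract action. It is not taken from fetch_contracts.py, whose implementation may differ. The endpoint URL is an assumption (Etherscan offers more than one API version, so check its current documentation), and the function names `build_source_request` and `extract_verified_source` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: the classic Etherscan endpoint; newer API versions use a different path.
ETHERSCAN_API_URL = "https://api.etherscan.io/api"


def build_source_request(address, api_key):
    """Build the request URL for Etherscan's contract `getsourcecode` action."""
    query = urlencode({
        "module": "contract",
        "action": "getsourcecode",
        "address": address,
        "apikey": api_key,
    })
    return f"{ETHERSCAN_API_URL}?{query}"


def extract_verified_source(response_text):
    """Return (contract_name, source_code), or None if no verified source is present."""
    payload = json.loads(response_text)
    results = payload.get("result")
    # On errors Etherscan returns a string in "result" instead of a list.
    if payload.get("status") != "1" or not isinstance(results, list) or not results:
        return None
    entry = results[0]
    source = entry.get("SourceCode", "")
    if not source:  # unverified contracts come back with an empty SourceCode
        return None
    return entry.get("ContractName", ""), source
```

The URL from `build_source_request` can be fetched with any HTTP client (for instance `urllib.request.urlopen`), and the decoded response body passed to `extract_verified_source`. Keeping the parsing separate from the network call makes it possible to test the logic without an API key.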

Generate a Vulnerable Dataset

The main script, main.py, generates the dataset in one of two ways: by modifying previously fetched contracts or by creating contracts from scratch.

Command-line Options

| Argument | Type | Default | Description |
|---|---|---|---|
| `--generator_type` | `str` | `etherscan` | Select dataset generator: `etherscan` (modify existing contracts) or `scratch`. |
| `--vulnerability_type` | `str` | `reentrancy` | Type of vulnerability to inject (e.g., `reentrancy`, `front-running`). |
| `--contracts_dir` | `str` | `solidity_contracts` | Directory containing Solidity contracts (used with `etherscan` generator). |
| `--scenarios_dir` | `str` | `kb` | Directory containing JSON files defining generation scenarios. |
| `--num_contracts_per_scenario` | `int` | `5` | Number of contracts to generate per scenario. |
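
For readers who want to see how these options map onto Python, the sketch below reconstructs an `argparse` parser from the argument names, types, and defaults listed above. It is an illustration only, not the code in main.py. The function name `build_parser` is hypothetical, and restricting `--generator_type` with `choices` is an assumption based on the two values the description mentions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options of main.py."""
    parser = argparse.ArgumentParser(
        description="Generate a synthetic dataset of vulnerable Solidity contracts."
    )
    # Assumption: only the two documented generator types are accepted.
    parser.add_argument("--generator_type", type=str, default="etherscan",
                        choices=["etherscan", "scratch"])
    parser.add_argument("--vulnerability_type", type=str, default="reentrancy")
    parser.add_argument("--contracts_dir", type=str, default="solidity_contracts")
    parser.add_argument("--scenarios_dir", type=str, default="kb")
    parser.add_argument("--num_contracts_per_scenario", type=int, default=5)
    return parser
```

With this sketch, `build_parser().parse_args([])` yields the documented defaults, and any subset of flags can be overridden on the command line.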

Example Commands

Modify Existing Contracts

python main.py --generator_type etherscan --contracts_dir ./contracts --vulnerability_type reentrancy

Generate Contracts from Scratch

python main.py --generator_type scratch --scenarios_dir ./scenarios --vulnerability_type reentrancy

Customize Output

Generate 10 contracts per scenario:

python main.py --num_contracts_per_scenario 10

Static Analysis with SmartBugs

The script analyze_contracts.py integrates SmartBugs analyzers for vulnerability detection in Solidity contracts.

Command-line Options

| Argument | Type | Default | Description |
|---|---|---|---|
| `--contracts-folder` | `str` | `dataset/etherscan150/runtime` | Path to the folder containing smart contracts to analyze. |
| `--output-folder` | `str` | `logs` | Path to the folder where analysis results will be saved. |
| `--analyzers` | `list` | `["ethor"]` | List of analyzers to use (e.g., `mythril`, `slither`). |
| `--timeout` | `int` | `300` | Timeout for each analysis in seconds. |
| `--processes` | `int` | `2` | Number of parallel processes to use for analysis. |
| `--mem-limit` | `str` | `4g` | Memory limit for each analysis process (e.g., `4g` for 4 GB). |
| `--cpu-quota` | `int` | `None` | Optional CPU quota for each process (percentage of CPU usage). |
| `--output-format` | `str` | `json` | Format of the analysis results (`json` or `sarif`). |
| `--runtime` | `bool` | `False` | Analyze runtime bytecode instead of Solidity source files. Overrides global mode. |
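
Two of these options behave differently from plain string flags: `--analyzers` takes several values and `--runtime` is a switch. The sketch below shows one way such a parser could be written with `argparse`. It is reconstructed from the documented names and defaults, not copied from analyze_contracts.py. The function name `build_analysis_parser`, the use of `nargs="+"`, `action="store_true"`, and the `choices` restriction on `--output-format` are all assumptions.

```python
import argparse


def build_analysis_parser():
    """Hypothetical parser mirroring the documented options of analyze_contracts.py."""
    parser = argparse.ArgumentParser(description="Run SmartBugs analyzers on contracts.")
    parser.add_argument("--contracts-folder", type=str, default="dataset/etherscan150/runtime")
    parser.add_argument("--output-folder", type=str, default="logs")
    # Assumption: one or more space-separated analyzer names.
    parser.add_argument("--analyzers", nargs="+", default=["ethor"])
    parser.add_argument("--timeout", type=int, default=300)
    parser.add_argument("--processes", type=int, default=2)
    parser.add_argument("--mem-limit", type=str, default="4g")
    parser.add_argument("--cpu-quota", type=int, default=None)
    # Assumption: only the two documented formats are accepted.
    parser.add_argument("--output-format", type=str, default="json", choices=["json", "sarif"])
    # Assumption: a switch that is False unless the flag is present.
    parser.add_argument("--runtime", action="store_true")
    return parser
```

Note that `argparse` converts dashes in option names to underscores, so `--contracts-folder` is read back as `args.contracts_folder`.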

Example Commands

Run SmartBugs on Runtime Bytecode

python analyze_contracts.py --runtime

Analyze Contracts with Custom Settings

python analyze_contracts.py --contracts-folder ./contracts --analyzers mythril slither --timeout 600 --processes 4

Scenarios

Scenarios are defined in JSON format in the kb/ directory. Each scenario describes:

- Name: A short name for the vulnerability.
- Scenario: A detailed explanation of the vulnerability and its exploit.
- Example: Code snippet illustrating the vulnerability.
- Issue: A summary of the flaw being exploited.
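
The four fields above can be checked mechanically before a generation run. The following sketch loads scenario files and reports missing fields. It assumes one JSON object per `*.json` file and the lowercase keys `name`, `scenario`, `example`, and `issue` used in this README's example scenario. The function names are hypothetical, and the project's own loader may apply different rules.

```python
import json
from pathlib import Path

# Keys taken from the example scenario in this README; other schema rules are unknown.
REQUIRED_FIELDS = ("name", "scenario", "example", "issue")


def validate_scenario(scenario):
    """Return the required fields that are missing or are not non-empty strings."""
    if not isinstance(scenario, dict):
        return list(REQUIRED_FIELDS)
    return [
        field for field in REQUIRED_FIELDS
        if not isinstance(scenario.get(field), str) or not scenario[field].strip()
    ]


def load_scenarios(scenarios_dir="kb"):
    """Load every *.json file in the directory, assuming one scenario object per file."""
    scenarios = []
    for path in sorted(Path(scenarios_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            scenario = json.load(handle)
        problems = validate_scenario(scenario)
        if problems:
            raise ValueError(f"{path.name}: missing or empty fields: {', '.join(problems)}")
        scenarios.append(scenario)
    return scenarios
```

Failing fast on a malformed scenario avoids spending LLM calls on a prompt that lacks its example or issue description.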

Example Scenario

{
    "name": "Cross-Function Reentrancy",
    "scenario": "An attacker reenters a different function than the one initially called, manipulating shared state variables.",
    "example": "function deposit() public payable {\n    balances[msg.sender] += msg.value;\n}\n\nfunction transfer(address to, uint256 amount) public {\n    require(balances[msg.sender] >= amount, \"Insufficient balance\");\n    balances[to] += amount;\n    balances[msg.sender] -= amount;\n}\n\nfunction withdrawAll() public {\n    uint256 amount = balances[msg.sender];\n    require(amount > 0, \"No funds\");\n    (bool success, ) = msg.sender.call{value: amount}(\"\");\n    require(success, \"Transfer failed\");\n    balances[msg.sender] = 0;\n}",
    "issue": "During `withdrawAll`, an attacker can call `deposit` or `transfer`, manipulating balances before the original function completes."
}
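
To make the issue in this scenario concrete, here is a toy Python model of the example contract. It is only an illustration of the ordering flaw, not Solidity and not a faithful EVM simulation: ether is tracked as plain integers, and the names `VulnerableVault`, `paid_out`, `on_receive`, and `run_attack` are invented for this sketch. The callback stands in for the attacker's receive function, which runs while `withdrawAll` is still executing.

```python
class VulnerableVault:
    """Toy model of the example contract; amounts are plain integers."""

    def __init__(self):
        self.balances = {}
        self.paid_out = {}  # ether sent out of the contract, per recipient

    def deposit(self, sender, value):
        self.balances[sender] = self.balances.get(sender, 0) + value

    def transfer(self, sender, to, amount):
        if self.balances.get(sender, 0) < amount:
            raise ValueError("Insufficient balance")
        self.balances[to] = self.balances.get(to, 0) + amount
        self.balances[sender] -= amount

    def withdraw_all(self, sender, on_receive=None):
        amount = self.balances.get(sender, 0)
        if amount <= 0:
            raise ValueError("No funds")
        # The flaw: the external call happens BEFORE the balance is cleared.
        self.paid_out[sender] = self.paid_out.get(sender, 0) + amount
        if on_receive is not None:
            on_receive(self)  # models the attacker's receive function
        self.balances[sender] = 0


def run_attack():
    """Deposit 10, then reenter `transfer` while `withdraw_all` is mid-execution."""
    vault = VulnerableVault()
    vault.deposit("attacker", 10)

    def reenter(v):
        # The attacker's balance is still 10 here, so the check in transfer passes.
        v.transfer("attacker", "accomplice", 10)

    vault.withdraw_all("attacker", on_receive=reenter)
    return vault
```

After `run_attack()`, the attacker has been paid the 10 units and the accomplice's recorded balance is also 10, so a single deposit has been counted twice. Clearing the balance before making the external call would make the reentrant `transfer` fail its balance check.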

Development

Adding a New Vulnerability

To inject a new vulnerability type:

- Extend the base functionality in EtherscanDatasetGenerator or ScratchDatasetGenerator.
- Define the new injection logic.
- Update the --vulnerability_type argument to include the new type.

Adding a New Static Analyzer

- Update the SmartBugsRunner class to include the new analyzer.
- Test compatibility with your contracts.
- Add the analyzer to the --analyzers argument.

License

This project is licensed under the MIT License. See the LICENSE file for details.

matteo-rizzo/smart-contracts-dataset-generation
