This repository provides tools to generate a synthetic dataset of Solidity smart contracts with injected vulnerabilities. It uses large language models (LLMs) to inject vulnerabilities like reentrancy into existing contracts or generate new ones from scratch. Additionally, it includes functionality to fetch verified contracts from Etherscan, run static analysis tools on the contracts, and analyze vulnerabilities.
- Etherscan-based generation: Modifies real-world contracts sourced from Etherscan.
- Scratch-based generation: Generates entirely new contracts based on user-defined scenarios.
- Verified contract fetching: Fetch verified smart contracts from DeFi protocols on Etherscan.
- Customizable vulnerabilities: Inject specific vulnerabilities, such as reentrancy or front-running.
- Scenarios: Define vulnerabilities with detailed examples for injection.
- Static analysis: Use SmartBugs analyzers for vulnerability detection in contracts.
- Scalable generation: Generate multiple contracts per scenario with ease.
- Python 3.8+: Ensure you have Python installed.
- Dependencies: Install the required Python libraries with `pip install -r requirements.txt`.
- Environment Variables: Populate your `.env` file with the following keys:
  - `ETHERSCAN_API_KEY`: Your Etherscan API key.
  - `OPENAI_API_KEY`: API key for OpenAI models.
  - `OPENAI_MODEL`: The LLM model to use.
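
The three keys can be checked once at startup so that a missing value fails early rather than in the middle of a run. The following is a minimal sketch, assuming the `.env` values have already been exported into the process environment; `load_settings` and `REQUIRED_KEYS` are hypothetical names and not part of this repository.

```python
import os
from typing import Dict, Mapping, Optional

# Keys listed in the prerequisites above.
REQUIRED_KEYS = ("ETHERSCAN_API_KEY", "OPENAI_API_KEY", "OPENAI_MODEL")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, raising if any key is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.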
The script `fetch_contracts.py` fetches verified smart contracts from popular DeFi protocols using the Etherscan API.
python fetch_contracts.py
- Fetch Protocol Addresses: Retrieves smart contract addresses for protocols like Uniswap, Aave, and Compound.
- Fetch Verified Contracts: Downloads and saves verified contracts from Etherscan for dataset augmentation.
- Customization: Modify the protocols or the target count of contracts by editing `fetch_contracts.py`.
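
For orientation, the verified-source lookup can be sketched as follows. This is not the code in `fetch_contracts.py`: it assumes the classic Etherscan endpoint `https://api.etherscan.io/api` with the `getsourcecode` action (newer API versions may need a different base URL or extra parameters), and the helper names are hypothetical. The network call itself is left out so the sketch stays self-contained.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumption: classic Etherscan endpoint; adjust for the API version you use.
ETHERSCAN_API_URL = "https://api.etherscan.io/api"


def build_source_request(address: str, api_key: str) -> str:
    """Build the URL for Etherscan's contract `getsourcecode` action."""
    query = urlencode({
        "module": "contract",
        "action": "getsourcecode",
        "address": address,
        "apikey": api_key,
    })
    return f"{ETHERSCAN_API_URL}?{query}"


def extract_verified_source(payload: str) -> Optional[str]:
    """Return the Solidity source from a response body, or None if unavailable."""
    data = json.loads(payload)
    results = data.get("result")
    # On errors, `result` is typically a message string rather than a list.
    if data.get("status") != "1" or not isinstance(results, list) or not results:
        return None
    # Unverified contracts come back with an empty SourceCode field.
    return results[0].get("SourceCode") or None
```

Returning `None` for unverified or failed lookups lets the caller skip those addresses and continue until the target count is reached.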
The main script `main.py` supports flexible generation options, either based on fetched contracts or from scratch.
| Argument | Type | Default | Description |
|---|---|---|---|
| `--generator_type` | str | `etherscan` | Select dataset generator: `etherscan` (modify existing contracts) or `scratch`. |
| `--vulnerability_type` | str | `reentrancy` | Type of vulnerability to inject (e.g., `reentrancy`, `front-running`). |
| `--contracts_dir` | str | `solidity_contracts` | Directory containing Solidity contracts (used with the `etherscan` generator). |
| `--scenarios_dir` | str | `kb` | Directory containing JSON files defining generation scenarios. |
| `--num_contracts_per_scenario` | int | `5` | Number of contracts to generate per scenario. |
python main.py --generator_type etherscan --contracts_dir ./contracts --vulnerability_type reentrancy
python main.py --generator_type scratch --scenarios_dir ./scenarios --vulnerability_type reentrancy
Generate 10 contracts per scenario:
python main.py --num_contracts_per_scenario 10
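
The options above map directly onto an `argparse` parser. The sketch below mirrors the documented names and defaults; how `main.py` actually builds its parser is not shown here, and restricting `--generator_type` with `choices` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented main.py options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Generate vulnerable Solidity contracts.")
    parser.add_argument("--generator_type", type=str, default="etherscan",
                        choices=["etherscan", "scratch"])  # assumption: validated here
    parser.add_argument("--vulnerability_type", type=str, default="reentrancy")
    parser.add_argument("--contracts_dir", type=str, default="solidity_contracts")
    parser.add_argument("--scenarios_dir", type=str, default="kb")
    parser.add_argument("--num_contracts_per_scenario", type=int, default=5)
    return parser


# Equivalent to: python main.py --generator_type scratch --num_contracts_per_scenario 10
args = build_parser().parse_args(
    ["--generator_type", "scratch", "--num_contracts_per_scenario", "10"]
)
print(args.generator_type, args.num_contracts_per_scenario)
```

Any option not given on the command line falls back to the default listed in the table.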
The script `analyze_contracts.py` integrates SmartBugs analyzers for vulnerability detection in Solidity contracts.
| Argument | Type | Default | Description |
|---|---|---|---|
| `--contracts-folder` | str | `dataset/etherscan150/runtime` | Path to the folder containing smart contracts to analyze. |
| `--output-folder` | str | `logs` | Path to the folder where analysis results will be saved. |
| `--analyzers` | list | `["ethor"]` | List of analyzers to use (e.g., `mythril`, `slither`). |
| `--timeout` | int | `300` | Timeout for each analysis in seconds. |
| `--processes` | int | `2` | Number of parallel processes to use for analysis. |
| `--mem-limit` | str | `4g` | Memory limit for each analysis process (e.g., `4g` for 4 GB). |
| `--cpu-quota` | int | `None` | Optional CPU quota for each process (percentage of CPU usage). |
| `--output-format` | str | `json` | Format of the analysis results (`json` or `sarif`). |
| `--runtime` | bool | `False` | Analyze runtime bytecode instead of Solidity source files. Overrides global mode. |
python analyze_contracts.py --runtime
python analyze_contracts.py --contracts-folder ./contracts --analyzers mythril slither --timeout 600 --processes 4
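
Two of these options behave differently from plain string arguments: `--analyzers` accepts several space-separated values, and `--runtime` is a flag with no value. A minimal sketch of how such options are commonly declared with `argparse` follows; it mirrors the documented names and defaults but is an assumption about the implementation, not a copy of `analyze_contracts.py`.

```python
import argparse


def build_analysis_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented analyze_contracts.py options (illustrative)."""
    parser = argparse.ArgumentParser(description="Run SmartBugs analyzers on contracts.")
    parser.add_argument("--contracts-folder", type=str, default="dataset/etherscan150/runtime")
    parser.add_argument("--output-folder", type=str, default="logs")
    # nargs="+" collects one or more values into a list.
    parser.add_argument("--analyzers", nargs="+", default=["ethor"])
    parser.add_argument("--timeout", type=int, default=300)
    parser.add_argument("--processes", type=int, default=2)
    parser.add_argument("--mem-limit", type=str, default="4g")
    parser.add_argument("--cpu-quota", type=int, default=None)
    parser.add_argument("--output-format", type=str, default="json",
                        choices=["json", "sarif"])
    # store_true makes --runtime a boolean flag that defaults to False.
    parser.add_argument("--runtime", action="store_true")
    return parser
```

Note that `argparse` exposes dashed option names with underscores, so `--contracts-folder` is read as `args.contracts_folder`.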
Scenarios are defined in JSON format in the `kb/` directory. Each scenario describes:
- Name: A short name for the vulnerability.
- Scenario: A detailed explanation of the vulnerability and its exploit.
- Example: Code snippet illustrating the vulnerability.
- Issue: A summary of the flaw being exploited.
{
"name": "Cross-Function Reentrancy",
"scenario": "An attacker reenters a different function than the one initially called, manipulating shared state variables.",
"example": "function deposit() public payable {\n balances[msg.sender] += msg.value;\n}\n\nfunction transfer(address to, uint256 amount) public {\n require(balances[msg.sender] >= amount, \"Insufficient balance\");\n balances[to] += amount;\n balances[msg.sender] -= amount;\n}\n\nfunction withdrawAll() public {\n uint256 amount = balances[msg.sender];\n require(amount > 0, \"No funds\");\n (bool success, ) = msg.sender.call{value: amount}(\"\");\n require(success, \"Transfer failed\");\n balances[msg.sender] = 0;\n}",
"issue": "During `withdrawAll`, an attacker can call `deposit` or `transfer`, manipulating balances before the original function completes."
}
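
Before a generation run it can help to verify that every scenario file carries the four fields shown above. The sketch below assumes one scenario object per `.json` file and lowercase keys as in the example; `load_scenarios` is a hypothetical helper, not a function from this repository.

```python
import json
from pathlib import Path
from typing import Dict, List

# Field names taken from the example scenario above.
REQUIRED_FIELDS = ("name", "scenario", "example", "issue")


def load_scenarios(scenarios_dir: str) -> List[Dict[str, str]]:
    """Load every *.json file in the directory and check its required fields."""
    scenarios = []
    for path in sorted(Path(scenarios_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            scenario = json.load(handle)
        missing = [field for field in REQUIRED_FIELDS if field not in scenario]
        if missing:
            raise ValueError(f"{path.name} is missing fields: {', '.join(missing)}")
        scenarios.append(scenario)
    return scenarios
```

Calling `load_scenarios("kb")` would then return the scenarios in file-name order, or raise on the first malformed file.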
To inject a new vulnerability type:
- Extend the base functionality in `EtherscanDatasetGenerator` or `ScratchDatasetGenerator`.
- Define the new injection logic.
- Update the `--vulnerability_type` argument to include the new type.
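
One way to keep these steps in sync is a small registry that maps each vulnerability type to its injection instruction. The sketch below is purely illustrative: it does not reflect the internals of `EtherscanDatasetGenerator` or `ScratchDatasetGenerator`, and every name in it (`INJECTION_INSTRUCTIONS`, `register_vulnerability`, `build_injection_prompt`) is hypothetical, as is the assumption that injection is driven by a text prompt.

```python
from typing import Dict

# Hypothetical registry: vulnerability type -> instruction passed to the LLM.
INJECTION_INSTRUCTIONS: Dict[str, str] = {
    "reentrancy": "Perform an external call before updating the state it depends on.",
}


def register_vulnerability(name: str, instruction: str) -> None:
    """Add a new vulnerability type, refusing to overwrite an existing one."""
    if name in INJECTION_INSTRUCTIONS:
        raise ValueError(f"Vulnerability type already registered: {name}")
    INJECTION_INSTRUCTIONS[name] = instruction


def build_injection_prompt(vulnerability_type: str, contract_source: str) -> str:
    """Combine the instruction for a type with the contract to modify."""
    if vulnerability_type not in INJECTION_INSTRUCTIONS:
        raise ValueError(f"Unknown vulnerability type: {vulnerability_type}")
    instruction = INJECTION_INSTRUCTIONS[vulnerability_type]
    return (
        f"Inject a {vulnerability_type} vulnerability into the contract below.\n"
        f"{instruction}\n\n{contract_source}"
    )
```

With this layout, the keys of the registry could also serve as the accepted values of `--vulnerability_type`, so a new type only has to be declared in one place.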
To add a new analyzer:
- Update the `SmartBugsRunner` class to include the new analyzer.
- Test compatibility with your contracts.
- Add the analyzer to the `--analyzers` argument.
This project is licensed under the MIT License. See the LICENSE file for details.