This repository contains the data used by the paper "Automated Repair of Programs from Large Language Models", published at ICSE 2023. The repository is split into three main folders, explained below:
- APR_Patches
- Defects_Classifications
- LMDefects
The LMDefects folder contains the LMDefects dataset, split into two subfolders: Codex_Generated_Solutions and Codex_Generated_Solutions_Ground_Truth. The former holds the solutions originally generated by Codex; the latter holds fixed versions of those solutions, where such a fix was found.
The APR_Patches folder contains the fault localization information used by both Recoder and TBar, the patches generated by those two tools, and the patches generated by the three versions of Codex-e we used (Codex_e_bug, Codex_e_line, and Codex_e_stmt).
The Defects_Classifications sheet records our classification of each solution: whether it compiles, whether it is plausible, and the type of fix needed.
.
├── APR_Patches // Contains all patches generated by APR tools
│ ├── Codex_Edit_Patches
│ │ ├── Codex_Edit_Bug // All correct patches produced by Codex_e_bug
│ │ ├── Codex_Edit_Line // All correct patches produced by Codex_e_line
│ │ ├── Codex_Edit_Stmt // All correct patches produced by Codex_e_stmt
│ │ └── raw_data // All patches produced by Codex Edit Mode
│ ├── Codex_fl_result // Fault localization info used by APR tools
│ ├── Recoder_Patches
│ └── TBar_Patches
├── Defects_Classifications // Defect category classifications for incorrect solutions produced by Codex
├── LMDefects
│ ├── Codex_Generated_Solutions // All solutions generated by Codex
│ └── Codex_Generated_Solutions_Ground_Truth // All constructed ground truth
└── README.md
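Given the layout above, one way to check which generated solutions have a ground-truth fix is to match files across the two LMDefects subfolders. The sketch below is illustrative only: it assumes that Codex_Generated_Solutions_Ground_Truth mirrors the relative paths of Codex_Generated_Solutions, which this README does not state, and the helper name `pair_solutions` is hypothetical. Adjust the matching rule if the actual layout differs.

```python
from pathlib import Path

# Folder names taken from the directory tree in this README.
GENERATED = "Codex_Generated_Solutions"
GROUND_TRUTH = "Codex_Generated_Solutions_Ground_Truth"


def pair_solutions(repo_root):
    """Map each generated file (path relative to the generated folder)
    to its ground-truth counterpart, or None if no counterpart exists.

    ASSUMPTION: both folders use the same relative paths. This is a guess
    about the layout, not something documented by the repository.
    """
    lm_defects = Path(repo_root) / "LMDefects"
    gen_dir = lm_defects / GENERATED
    truth_dir = lm_defects / GROUND_TRUTH

    pairs = {}
    if not gen_dir.is_dir():
        return pairs  # nothing to pair if the dataset folder is absent

    for gen_file in sorted(gen_dir.rglob("*")):
        if not gen_file.is_file():
            continue
        rel = gen_file.relative_to(gen_dir)
        candidate = truth_dir / rel
        pairs[rel.as_posix()] = candidate if candidate.is_file() else None
    return pairs


if __name__ == "__main__":
    pairs = pair_solutions(".")
    fixed = sum(1 for truth in pairs.values() if truth is not None)
    print(f"{fixed} of {len(pairs)} generated files have a ground-truth counterpart")
```

Run from the repository root, the script prints how many generated files have a same-named counterpart under the ground-truth folder. Entries mapped to `None` are candidates for solutions without a recorded fix, subject to the path-mirroring assumption.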
Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, Shin Hwei Tan
Abhik Roychoudhury