The repository for our ICML 2024 paper:
Neighboring Perturbations of Knowledge Editing on Large Language Models (arxiv).
Knowledge editing aims at efficiently altering LLMs’ behaviors within specific domains while preserving overall performance across various inputs. Previous work primarily focuses on determining whether the new target knowledge has been successfully memorized. However, the perturbations that editing causes to knowledge neighboring the new target knowledge have not been fully explored when LLMs are updated with new knowledge.
This paper investigates whether the editing operation of appending a new answer to the answer list of a factual question perturbs the neighboring knowledge encapsulated within LLMs. It also proposes a plug-and-play framework termed APP to mitigate the neighboring perturbation by maintaining the integrity of the answer list.
The PEAK benchmark comprises two datasets, PEAK_counter and PEAK_time, which are included in data/.
PEAK_counter.json: the counterfactual dataset for evaluating knowledge editing methods on counterfactual appending.
PEAK_time.json: the dataset of temporal knowledge edits reflecting changes in the real world.
The whole data directory is as follows:
data/
|__ PEAK_counter.json
|__ PEAK_time.json
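As a quick sanity check after cloning, the two files can be loaded with Python's standard json module. This is a minimal sketch: the helper name `load_peak` is hypothetical (it is not part of this repository), and it assumes nothing about the record schema beyond the files being valid JSON.

```python
import json
from pathlib import Path


def load_peak(name, data_dir="data"):
    """Load one PEAK file ("counter" or "time") and return the parsed JSON.

    Hypothetical helper for illustration; only the file names
    PEAK_counter.json and PEAK_time.json come from this README.
    """
    if name not in ("counter", "time"):
        raise ValueError(f"unknown dataset name: {name!r}")
    path = Path(data_dir) / f"PEAK_{name}.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

Calling `load_peak("counter")` from the repository root returns whatever top-level JSON value the file holds, so inspecting its type and length is a reasonable first step before running any editing method.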
Note: Please use Python 3.9+. To get started, install conda and run:
git clone https://github.com/mjy1111/PEAK.git
conda create -n PEAK python=3.9.7
...
pip install -r requirements.txt
All models are placed in hugging_cache/<model_name> (model_name=gpt2-xl, gpt-j-6B, llama-7b, or llama2-7b).
These paths can be changed in hparams/<method_name>/.
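Before launching an experiment it can help to confirm that the expected model directories are in place. The sketch below assumes each model lives in a directory named exactly as listed above under hugging_cache/; the function name `missing_models` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Model directory names as listed in this README.
MODEL_NAMES = ["gpt2-xl", "gpt-j-6B", "llama-7b", "llama2-7b"]


def missing_models(cache_dir="hugging_cache", names=MODEL_NAMES):
    """Return the model names that have no directory under cache_dir."""
    root = Path(cache_dir)
    return [name for name in names if not (root / name).is_dir()]
```

An empty list means every listed model directory was found; it does not verify that the weights inside are complete.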
The performance of knowledge editing is measured along the following dimensions:
Efficacy: whether the edited models could recall the exact editing fact under editing prompts.
Generalization: whether the edited models could recall the editing fact under paraphrase prompts.
Locality: whether the output of the edited models for inputs out of editing scope remains unchanged after editing.
Additivity: the degree of perturbation to neighboring knowledge when appending.
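To make the first three dimensions concrete, here is a toy illustration based on simple string matching. It is not the scoring code used in the paper or in this repository: the function names and the `generate` callables (prompt in, text out) are assumptions made for the example, and Additivity is omitted because its exact formulation is not stated in this README.

```python
def recall_rate(generate, prompts, target):
    """Fraction of prompts whose generated text contains target (case-insensitive).

    With editing prompts this approximates Efficacy; with paraphrase
    prompts it approximates Generalization. Toy proxy only.
    """
    if not prompts:
        return 0.0
    hits = sum(target.lower() in generate(p).lower() for p in prompts)
    return hits / len(prompts)


def unchanged_rate(generate_before, generate_after, prompts):
    """Fraction of out-of-scope prompts whose output is identical after editing.

    A toy proxy for Locality.
    """
    if not prompts:
        return 0.0
    same = sum(generate_before(p) == generate_after(p) for p in prompts)
    return same / len(prompts)
```

Both functions take plain callables so they can be tried with a stub before wiring in a real model.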
GPT-2 XL (1.5B), GPT-J (6B), and LLaMA-2 (7B) are used for editing.
- These model editing methods are used in our paper as follows:
After downloading the datasets and models, to get started (e.g. using ROME to edit GPT-2 XL on PEAK_counter dataset), run:
python neighbor.py \
--alg_name=ROME \
--model_name=gpt2-xl \
--ds_name=counter \
--cuda=0 \
--dataset_size=100
Use --ds_name=time for the PEAK_time dataset. The --dataset_size flag is optional.
To use the proposed APP, run:
python neighbor.py \
--alg_name=ROME \
--model_name=gpt2-xl \
--ds_name=counter \
--cuda=0 \
--aerfa=0.2 \
--beta=0.2 \
--gama=0.1 \
--dataset_size=100
The --dataset_size flag is optional.
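When running several configurations, the commands above can be assembled programmatically. The following is a sketch under stated assumptions: `build_command` and `run_all` are hypothetical helpers, not part of this repository, and they pass only the flags shown in this README (including the flag spellings `--aerfa` and `--gama`).

```python
import subprocess
import sys


def build_command(alg_name, model_name, ds_name, cuda=0, dataset_size=None, app=None):
    """Return the argument list for neighbor.py.

    app: optional dict with keys "aerfa", "beta", "gama" (spelled as the
    CLI flags) to enable the APP settings; omit it for plain editing.
    """
    cmd = [
        sys.executable, "neighbor.py",
        f"--alg_name={alg_name}",
        f"--model_name={model_name}",
        f"--ds_name={ds_name}",
        f"--cuda={cuda}",
    ]
    if app is not None:
        for key in ("aerfa", "beta", "gama"):
            cmd.append(f"--{key}={app[key]}")
    if dataset_size is not None:
        cmd.append(f"--dataset_size={dataset_size}")
    return cmd


def run_all(alg_names, model_name, ds_names, **kwargs):
    """Run neighbor.py once per (dataset, method) pair, stopping on the first failure."""
    for ds_name in ds_names:
        for alg_name in alg_names:
            subprocess.run(build_command(alg_name, model_name, ds_name, **kwargs), check=True)
```

For example, `run_all(["ROME"], "gpt2-xl", ["counter", "time"], dataset_size=100)` would launch the two runs sequentially from the repository root.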
Results from each run are stored at results/<data_name>/<method_name>/run_<run_id>.
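To see which runs have accumulated, the run directories can be enumerated with pathlib. This is a small sketch, assuming only that run directories are named with the `run_` prefix somewhere under results/; the helper name `list_runs` is hypothetical.

```python
from pathlib import Path


def list_runs(results_dir="results"):
    """Return all directories named run_* under results_dir, sorted by path."""
    root = Path(results_dir)
    return sorted(p for p in root.rglob("run_*") if p.is_dir())
```

Because it searches recursively, the helper does not depend on how many directory levels sit between results/ and the run folders.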
To summarize the results (e.g. using ROME to edit GPT-2 XL on PEAK_counter dataset), run:
python -m experiments.summarize --dir_name=counter/ROME/gpt2-xl
All hyperparameters are in hparams/<method_name>/, and you can change them as needed.
For ROME and MEMIT, we also provide Wikipedia stats [Google Drive].
To use the MEND method, first download the weights here [Google Drive], then follow the same steps above to edit models.
If you use this code and dataset, please cite our paper:
@misc{ma2024neighboring,
title={Neighboring Perturbations of Knowledge Editing on Large Language Models},
author={Jun-Yu Ma and Jia-Chen Gu and Ningyu Zhang and Zhen-Hua Ling},
year={2024},
eprint={2401.17623},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
If you have any questions related to the repository or the paper, or you encounter any problems when using the datasets/code, feel free to email Junyu Ma ([email protected]) or open an issue!
We express sincere gratitude to EasyEdit and ROME, as we have utilized portions of their source code in our project.