BiasEdit: Debiasing Stereotyped Language Models via Model Editing

BiasEdit is an efficient model editing method that removes stereotypical bias from language models using small editor networks. Training combines a debiasing loss, which guides edits to a subset of the model's parameters, with a remaining loss, which preserves language modeling ability during editing. Experimental results show BiasEdit's strong performance on debiasing and on preserving language modeling ability, as well as its robustness to gender reversal and semantic generality.

🆕 News

[Feb 2024] We released the paper and the refined code.
[Dec 2023] Our idea was accepted to WiNLP 2023 at EMNLP 2023!
[Nov 2023] We released the code.

📌 Table of Contents

🛠️ Setup
💻 BiasEdit
- ⌚️ Training Editor Networks
- 🚀 Debiasing with Editor Networks
👀 Bias Tracing
📝 Citation
✨ Acknowledgements

🛠️ Setup

This codebase uses Python 3.9.18. Other versions may work as well.

Create an environment and install the dependencies:

$ conda create -n biasedit python=3.9
$ conda activate biasedit
(biasedit) $ pip install -r requirements.txt

💻 BiasEdit

First, editor networks are trained on StereoSet to generate parameter shifts for debiasing. Then, the trained editor networks are used to edit a language model and produce a debiased model.

⌚️ Training Editor Networks

Formatted StereoSet data with train/dev/test splits is in data/stereoset; the test sets are gender_test.json, race_test.json, and religion_test.json.
Configurations are in config. Editor settings, including the partial parameters to be edited, are in editor; model settings, such as the weights to be edited, are in model.
Experimental scripts are in scripts. All hyper-parameters are in the scripts.
For the ablation study on the remaining loss, set editor.loc_coef=0.
Metrics can be found in the training log.
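
The effect of setting editor.loc_coef=0 can be pictured as removing a weighted term from the training objective. The snippet below is a minimal sketch, not the repository's implementation: the function names, the use of a KL divergence for the remaining loss, and the way loc_coef enters the total are all assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_loss(debias_loss, remaining_loss, loc_coef):
    # Assumed combination: the remaining loss is scaled by loc_coef,
    # so loc_coef=0 leaves only the debiasing term (the ablation setting).
    return debias_loss + loc_coef * remaining_loss

# Hypothetical next-token distributions before and after an edit.
before_edit = [0.7, 0.2, 0.1]
after_edit = [0.6, 0.3, 0.1]

# Assumed form of the remaining loss: how far the edited model drifts
# from the original model on text that should be unaffected.
remaining = kl_divergence(before_edit, after_edit)
debias = 0.5  # placeholder value standing in for the debiasing loss

full_objective = total_loss(debias, remaining, loc_coef=1.0)
ablated_objective = total_loss(debias, remaining, loc_coef=0.0)
print(full_objective, ablated_objective)
```

With loc_coef=0 the objective no longer penalizes drift in language modeling behavior, which is what the ablation is meant to isolate.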

For example, we use the following command to train the editor networks for Gemma-2B:

 (biasedit) $ bash scripts/gemma_last2.sh

🚀 Debiasing with Editor Networks

Set eval_only=True.
Set data.valid_path to the path of the test set.
Metrics are reported at the end of the debiasing log, in lines like "Test ------- XXX".
To test robustness to gender reversal, set data.valid_path to data/stereoset/gender_test_reverse.json.
To test semantic generality, set data.valid_path to data/stereoset/xxx_test_syn.json, where xxx is one of [gender, race, religion].

For example,

 (biasedit) $ bash scripts/gpt2m_last123_gender_reverse.sh
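
The test-set paths listed above follow a regular naming pattern. The helper below is a hypothetical convenience, not part of the repository: the function name valid_path and the variant labels "test", "reverse", and "syn" are assumptions. It only assembles the file paths named in this README so that a value for data.valid_path can be chosen without typos.

```python
from pathlib import PurePosixPath

DATA_DIR = PurePosixPath("data/stereoset")
BIAS_TYPES = ("gender", "race", "religion")

def valid_path(bias_type, variant="test"):
    """Return the test-set path for a bias type and evaluation variant."""
    if bias_type not in BIAS_TYPES:
        raise ValueError(f"unknown bias type: {bias_type!r}")
    if variant == "test":
        name = f"{bias_type}_test.json"
    elif variant == "syn":
        # Semantic generality test set.
        name = f"{bias_type}_test_syn.json"
    elif variant == "reverse":
        # The README only lists a reversed test set for gender.
        if bias_type != "gender":
            raise ValueError("reverse test set is only listed for gender")
        name = "gender_test_reverse.json"
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    return str(DATA_DIR / name)

print(valid_path("gender", "reverse"))
print(valid_path("race", "syn"))
```

The returned string is what would be assigned to data.valid_path in the chosen script.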

👀 Bias Tracing

See the bias_tracing directory.

📝 Citation

If this code or paper was useful, please consider using the following citation:

@article{xin24BiasEdit,
    title={BiasEdit: Debiasing Stereotyped Language Models via Model Editing},
    author={Xin Xu and Wei Xu and Ningyu Zhang and Julian McAuley},
    year={2024},
    url={https://github.com/zjunlp/BiasEdit}
}

✨ Acknowledgements

Thanks to MALMEN for the original code.
Thanks to StereoSet and to bias-bench for all the baselines.
For more model editing methods, please try EasyEdit.
