This repository is based on the LLaMA-Factory codebase and implements AutoRE, a document-level relation extraction system built on large language models. It uses the RHF (Relation-Head-Facts) extraction paradigm (paper link). Experiments are currently conducted on the Re-DocRED dataset, and the system can extract triples covering 96 relation types from document-level text.
Download the LoRA checkpoints from Hugging Face: dante123/AutoRE. Each LoRA module is about 570M in size.
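A minimal way to fetch them, assuming you use git with git-lfs (any Hugging Face download method works equally well):
# Fetch the LoRA checkpoints (git-lfs is needed for the large weight files)
git lfs install
git clone https://huggingface.co/dante123/AutoRE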
cd AutoRE/
pip install -r requirement.txt
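If you want an isolated environment, something along these lines works first (conda and Python 3.10 are just one option, not a requirement of the repository):
# Optional: create and activate a dedicated environment before installing dependencies
conda create -n autore python=3.10 -y
conda activate autore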
We use wandb, so make sure to insert your API key in train_bash.py first.
api_key = os.environ.get('WANDB_API_KEY', "your api key")
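Because the script reads the key from the environment first, you can also export it in your shell instead of hard-coding it in train_bash.py:
# Alternative: provide the wandb key via the environment
export WANDB_API_KEY=your_api_key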
# Modify according to the prompts in AutoRE.sh
bash AutoRE.sh
# Enter the document text and the triples will be extracted automatically
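For intuition, extraction follows the RHF order: the model first identifies the relations expressed in the document, then the head entity for each relation, and finally the complete triple facts. The example below is purely illustrative; the sentence and output formatting are not taken from the repository:
# Input document:      "Steve Jobs co-founded Apple in Cupertino."
# Illustrative output: (Apple, founded by, Steve Jobs), (Apple, headquarters location, Cupertino)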
cd AutoRE/utils/
python pre_process_data.py
cd AutoRE/
# Modify according to the prompts in AutoRE.sh and choose the RE paradigms you need
bash train_script/mistral_loras_D_R_H_F_desc.sh
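The training script can be combined with standard CUDA environment variables to pick GPUs, for example (CUDA_VISIBLE_DEVICES is a generic CUDA setting, not an AutoRE-specific option):
# Restrict training to specific GPUs
CUDA_VISIBLE_DEVICES=0,1 bash train_script/mistral_loras_D_R_H_F_desc.sh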
cd AutoRE/
# Choose the corresponding model for testing on Re-DocRED: remove --inference and set the specific model and checkpoint
bash AutoRE.sh
This verifies whether an explicit analysis step helps extraction. The overall procedure follows the AutoRE framework, but adds an analysis step before each extraction phase. For concrete examples, see redocred_train_analysis.json. The data and code have been shared in the hope that they provide some inspiration.
Additionally, so that AutoRE can extract more relation types, other open-source datasets are also incorporated, including English datasets such as FewRel and NYT as well as Chinese datasets like HacRED. If you only want to reproduce the work in this paper, simply comment out the other dataset-processing parts in pre_process_data.py and keep only the part that processes Re-DocRED (the code is heavily commented, which should help).
If you find our work helpful, please consider citing the paper:
@article{lilong2024autore,
  title={AutoRE: Document-Level Relation Extraction with Large Language Models},
  author={Lilong, Xue and Dan, Zhang and Yuxiao, Dong and Jie, Tang},
  journal={arXiv preprint arXiv:2403.14888},
  year={2024}
}