Read this in Chinese

AutoRE

This repository builds on the LLaMA-Factory code and implements AutoRE, a document-level relation extraction system based on large language models. It uses the RHF extraction paradigm (paper link). Experiments are currently conducted on the Re-DocRED dataset, and the system can extract triples for 96 relations from document-level text.

Usage

Download the model from Hugging Face: dante123/AutoRE. Each LoRA module is 570M in size.

0. Environment preparation

    cd AutoRE/
    pip install -r requirement.txt

This project uses wandb, so make sure to insert your API key in train_bash.py first:

api_key = os.environ.get('WANDB_API_KEY', "your api key")
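As a minimal sketch of the lookup above: the key is read from the `WANDB_API_KEY` environment variable, and the placeholder string is used only when the variable is unset. The helper name `resolve_wandb_key` and the early failure on the placeholder are assumptions added for illustration; they are not part of train_bash.py.

```python
import os

# The fallback string used in the line above when WANDB_API_KEY is unset.
PLACEHOLDER = "your api key"


def resolve_wandb_key(env=None):
    """Return the wandb API key from the environment (hypothetical helper).

    Raises RuntimeError if the key is missing or still the placeholder,
    so a misconfigured run fails before training starts.
    """
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", PLACEHOLDER)
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "Set WANDB_API_KEY in your shell, or replace the placeholder "
            "in train_bash.py, before training."
        )
    return key
```

Exporting the variable in the shell (`export WANDB_API_KEY=...`) before running the training script keeps the key out of the source file; the returned value could then be passed to `wandb.login(key=...)`.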

1. Inference

# Modify according to the prompts in AutoRE.sh
bash AutoRE.sh
# Enter the corresponding document to automatically extract

2. Model training

1) Data preparation

cd AutoRE/utils/
python pre_process_data.py

2) Model fine-tuning

cd AutoRE/
# Modify according to the prompts in AutoRE.sh and choose the RE paradigms you need
bash train_script/mistral_loras_D_R_H_F_desc.sh

3. Model testing

cd AutoRE/
# Choose the model to test (the dataset is Re-DocRED): remove --inference and set the specific model and ckpt
bash AutoRE.sh

AutoRE_analysis

This part verifies whether an analysis process helps extraction. The overall procedure follows the AutoRE framework but adds an analysis step before each extraction phase. For specific examples, see redocred_train_analysis.json. The data and code are shared in the hope that they provide some inspiration.
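The idea of analyzing before each extraction phase can be sketched as a small loop. This is an illustrative sketch only: the phase names, the prompt wording, and the `generate` callable (standing in for a model call) are all assumptions, not the repository's actual code or prompts.

```python
from typing import Callable, Dict, List

# Assumed phase names for illustration; the README does not list them.
PHASES = ["relation", "head entity", "fact"]


def run_with_analysis(document: str, generate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Run each phase as two model calls: an analysis, then an extraction."""
    context = document
    steps = []
    for phase in PHASES:
        # First ask the model to reason about the document for this phase.
        analysis = generate(f"Analyze the text before extracting the {phase}s:\n{context}")
        # Then ask for the extraction, conditioned on that analysis.
        extraction = generate(
            f"Analysis:\n{analysis}\nNow extract the {phase}s from:\n{context}"
        )
        steps.append({"phase": phase, "analysis": analysis, "extraction": extraction})
        # Carry this phase's result forward so later phases can use it.
        context = f"{context}\n[{phase}] {extraction}"
    return steps
```

Passing a stub such as `lambda prompt: "stub"` as `generate` is enough to check the control flow without loading a model.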

Additionally, to let AutoRE handle more relation types, other open-source datasets should be incorporated, including English datasets such as FewRel and NYT as well as Chinese datasets like HacRED. If you only care about the work in this paper, comment out the other data-processing parts in pre_process_data.py and keep only the part that processes Re-DocRED (the many comments in the code should help).

Citation

If you find our work helpful, please consider citing the paper.

@article{lilong2024autore,
  title={AutoRE: Document-Level Relation Extraction with Large Language Models},
  author={Lilong, Xue and Dan, Zhang and Yuxiao, Dong and Jie, Tang},
  journal={arXiv preprint arXiv:2403.14888},
  year={2024}
}