See our paper: Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation.
-
Preprocess data and install dependencies.
bash preparation.sh python data_preparation.py -d [nq/tq/hq]
-
Get supporting documents generated by ChatGPT (take Natural Questions dataset as an example).
OPENAI_API_KEY=[your api key] \ python run_llm.py \ --source=data/source/nq.json \ --usechat \ --type=generate \ --ra=none \ --outfile=data/source/nq-chat.json
- Question answering.
OPENAI_API_KEY=[your api key] \ python run_llm.py \ --source=data/source/nq-chat.json \ --usechat \ --type=qa \ --ra=none \ --outfile=data/qa/nq-none-qa.json
- Priori judgement.
OPENAI_API_KEY=[your api key] \ python run_llm.py \ --source=data/source/nq-chat.json \ --usechat \ --type=prior \ --ra=dense \ --outfile=data/prior/nq-dense-prior.json
- Posteriori judgement.
OPENAI_API_KEY=[your api key] \ python run_llm.py \ --source=data/qa/nq-none-qa.json \ --usechat \ --type=post \ --ra=sparse \ --outfile=data/post/nq-sparse-post.json
Please cite the following paper if you find our code helpful.
@article{ren2023investigating,
title={Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation},
author={Ren, Ruiyang and Wang, Yuhao and Qu, Yingqi and Zhao, Wayne Xin and Liu, Jing and Tian, Hao and Wu, Hua and Wen, Ji-Rong and Wang, Haifeng},
journal={arXiv preprint arXiv:2307.11019},
year={2023}
}