HRDE

Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability

Introduction

As people pay more attention to their health, health information spreads across the internet faster and more widely than before. At the same time, false health information (health rumors) mixed in with genuine content poses a significant potential threat to public health. However, research on Chinese health rumors still lacks a large-scale, public, open-source dataset of health rumors, as well as effective and reliable rumor detection methods. We address this gap by constructing HealthRCN, a dataset containing 1.12 million health-related rumors, built by scraping common health-related questions from the web and applying a series of data processing steps. HealthRCN is the largest known dataset of Chinese health rumors to date. Based on this dataset, we propose retrieval-augmented large language models for Chinese health rumor detection and explainability (HRDE; currently deployed at http://www.rumors.icu/). Given a piece of health information, HRDE retrieves relevant reference material, uses it to determine whether the input is a rumor, and returns an explanatory answer, helping users verify the authenticity of health information. In evaluation experiments comparing multiple models, HRDE outperformed all of them, including GPT-4-1106-Preview, in both rumor detection accuracy and answer quality, achieving an average accuracy of 91.04% and an F1 score of 91.58%.
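The retrieve-then-judge flow described above can be sketched roughly as follows. This is a minimal illustration, not HRDE's implementation: the `Passage` class, the English prompt wording, the "Verdict: ..." answer format, and the `retrieve`/`llm` callables are all hypothetical stand-ins (HRDE's real prompt templates live in ./prompts, and retrieval and LLM calls live in core/).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Passage:
    """A retrieved reference snippet (hypothetical structure for this sketch)."""
    source: str
    text: str


def build_prompt(claim: str, passages: List[Passage]) -> str:
    """Assemble a retrieval-augmented prompt: numbered evidence, then the claim."""
    evidence = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Using the reference material below, decide whether the health claim "
        "is a rumor. Answer 'Verdict: rumor' or 'Verdict: not rumor' on the "
        "first line, then explain your reasoning, citing sources as [n].\n\n"
        f"Reference material:\n{evidence}\n\n"
        f"Claim: {claim}"
    )


def parse_verdict(response: str) -> Optional[bool]:
    """Map the model's first line to True (rumor), False (not rumor) or None."""
    lines = response.strip().splitlines()
    first_line = lines[0].lower() if lines else ""
    # Check the negated label first, since "not rumor" also contains "rumor".
    if "not rumor" in first_line:
        return False
    if "rumor" in first_line:
        return True
    return None  # malformed answer: caller decides how to handle it


def detect(
    claim: str,
    retrieve: Callable[[str, int], List[Passage]],
    llm: Callable[[str], str],
    top_k: int = 3,
) -> Tuple[Optional[bool], str]:
    """Retrieve evidence, query the LLM, and return (verdict, explanation)."""
    passages = retrieve(claim, top_k)
    response = llm(build_prompt(claim, passages))
    return parse_verdict(response), response
```

In a real deployment, `retrieve` would wrap the Elasticsearch/Milvus lookups and `llm` would wrap the model call; keeping them as injected callables makes the flow easy to test with stubs.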

Project Structure

Here's an overview of the project structure:

.
├── assets                        # Stores project assets, such as images, charts, etc.
├── configs                       # Stores configuration files.
├── core                          # Core code library.
│   ├── data_loader.py            # Data loading.
│   ├── es_create.py              # Elasticsearch index creation.
│   ├── milvus_create.py          # Schema creation for Milvus.
│   ├── data_to_es.py             # Import reference documents into Elasticsearch.
│   ├── data_to_milvus.py         # Import reference documents into Milvus.
│   ├── es.py                     # Retrieve reference documents from Elasticsearch.
│   ├── milvus.py                 # Retrieve reference documents from Milvus.
│   ├── reference_data_process.py # Batch import reference documents from ./data/reference_data.
│   ├── embedding_model.py        # Load the embedding model.
│   ├── llm.py                    # LLM invocation.
│   ├── main.py                   # Main function for invoking HRDE.
│   ├── run.py                    # Example of calling main.py.
│   ├── run_api.py                # API interface call for main.py.
│   ├── exprtiment.py             # Evaluation methods.
│   ├── evaluator.py              # Test the model using the evaluation dataset (calls exprtiment.py).
│   ├── evaluator2.py             # Evaluate model responses using GPT-4.
│   ├── similarity_information.py # Retrieve similar rumor titles.
│   └── utils.py                  # Other utility functions.
├── api_server.py                 # API service deployment.
├── data                          # Stores various datasets.
│   ├── reference_data            # Raw data for reference documents (few examples).
│   ├── dev_data                  # Evaluation dataset.
│   └── sft_data                  # Fine-tuning dataset.
├── prompts                       # Stores various prompt templates for LLMs.
├── outputs                       # Stores experimental result files.
└── stopwords                     # Stores text files of stop words.
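The tree shows two separate retrievers, es.py (keyword search in Elasticsearch) and milvus.py (vector search in Milvus). This README does not say whether or how their results are merged; one common way to combine such ranked lists is Reciprocal Rank Fusion (RRF), sketched below as an assumption. The function name and the `k`/`top_n` parameters are illustrative, and `k=60` is simply a commonly used default.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 5
) -> List[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only ranks are used, so keyword scores and
    vector distances never need to be put on a common scale.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n]
```

For example, fusing Elasticsearch hits `["doc3", "doc1", "doc7"]` with Milvus hits `["doc1", "doc4", "doc3"]` returns `["doc1", "doc3", "doc4", "doc7"]`: doc1 and doc3 appear in both lists, so they outrank documents found by only one retriever.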

Get Started

Clone the repository
- git clone https://github.com/
Prepare the environment
- conda create -n grimoire python=3.8.18
- conda activate grimoire
Install Python dependencies and load the Embedding model
- pip install -r requirements.txt
- python ./core/embedding_model.py
Configuration
- Database configuration
  - Complete the deployment of Elasticsearch and Milvus
  - Complete the configuration of Elasticsearch and Milvus in configs/es.yaml and configs/milvus.yaml.
  - Refer to es_create.py and milvus_create.py to build the corresponding databases.
  - Refer to reference_data_process.py to import the raw reference document data into both databases.
- LLMs configuration
  - Configure the LLMs in configs/llm.yaml.
See core/main.py and core/run.py to learn how to call the HRDE main function.
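Before reference documents are imported into Elasticsearch and Milvus (reference_data_process.py), long texts are typically split into smaller passages so that each indexed unit fits the embedding model and the LLM prompt. This README does not document HRDE's splitting strategy, so the hypothetical `chunk_text` helper below is only a sketch of one common approach; the default `size` and `overlap` values are arbitrary.

```python
from typing import List


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows.

    Character-based windows suit Chinese text, which has no whitespace
    between words. Consecutive chunks share `overlap` characters, so text
    near a boundary appears in two chunks with some surrounding context.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once this window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(text):
            break
    return chunks
```

Each chunk would then be indexed as its own document, keeping a pointer back to the original source so answers can cite it.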

Results

Contact Us

For questions, feedback, or suggestions, please open a GitHub Issue.
