This repository contains the LLM evaluation code for the npj Digital Medicine paper "An In-Depth Evaluation of Federated Learning on Biomedical Natural Language Processing for Information Extraction". The datasets used in this paper were downloaded from [FedNLP Repo]. In particular, they are the NCBI-Disease and 2018 n2c2 datasets for named entity recognition (NER), and the GAD and 2018 n2c2 datasets for relation extraction (RE).
Table 1: Results of LLMs using the best score among 1/5/10/20-shot prompting on NER and RE tasks, compared with BlueBERT and GPT-2 trained with federated learning (FL).
| Model | NER: NCBI (Strict) | NER: NCBI (Lenient) | NER: 2018 n2c2 (Strict) | NER: 2018 n2c2 (Lenient) | RE: 2018 n2c2 (F1) | RE: GAD (F1) |
|---|---|---|---|---|---|---|
| Mistral 8x7B Instruct | 0.409 | 0.587 | 0.514 | 0.648 | 0.314 | 0.459 |
| GPT-3.5 | 0.575 | 0.719 | 0.565 | 0.705 | 0.290 | 0.485 |
| GPT-4 | 0.722 | 0.834 | 0.616 | 0.751 | 0.882 | 0.543 |
| PaLM 2 Bison | 0.640 | 0.756 | 0.544 | 0.653 | 0.407 | 0.468 |
| PaLM 2 Unicorn | 0.726 | 0.848 | 0.621 | 0.749 | 0.888 | 0.549 |
| Gemini 1.0 Pro | 0.654 | 0.779 | 0.566 | 0.694 | 0.411 | 0.541 |
| Llama 3 70B Instruct | 0.685 | 0.786 | 0.551 | 0.695 | 0.319 | 0.458 |
| Claude 3 Opus | 0.788 | 0.879 | 0.680 | 0.787 | 0.832 | 0.569 |
| BlueBERT (FL) | 0.824 | 0.899 | 0.954 | 0.986 | 0.950 | 0.714 |
| GPT-2 (FL) | 0.784 | 0.840 | 0.830 | 0.868 | 0.946 | 0.721 |
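The Strict and Lenient NER scores in Table 1 follow the usual span-matching convention: a prediction counts under strict matching only if its offsets and type match the gold annotation exactly, while lenient matching also credits overlapping spans of the same type. The snippet below is a minimal illustration of that distinction; the `Span` tuple and `ner_f1` helper are illustrative names, not the repository's actual evaluation code.

```python
from typing import List, NamedTuple

class Span(NamedTuple):
    start: int   # offset where the entity mention begins
    end: int     # offset where the entity mention ends (exclusive)
    label: str   # entity type, e.g. "Disease"

def ner_f1(gold: List[Span], pred: List[Span], strict: bool = True) -> float:
    """Micro F1 over entity spans.

    Strict: a prediction is correct only with exact offsets and the same label.
    Lenient: any overlap with a gold span of the same label counts.
    Simplified: assumes gold and predicted spans match at most one-to-one.
    """
    def hit(p: Span, g: Span) -> bool:
        if p.label != g.label:
            return False
        if strict:
            return (p.start, p.end) == (g.start, g.end)
        return p.start < g.end and g.start < p.end  # spans overlap

    tp = sum(any(hit(p, g) for g in gold) for p in pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# One exact match and one partially overlapping prediction:
gold = [Span(0, 14, "Disease"), Span(30, 42, "Disease")]
pred = [Span(0, 14, "Disease"), Span(33, 42, "Disease")]
print(ner_f1(gold, pred, strict=True))   # 0.5 -- only the exact span counts
print(ner_f1(gold, pred, strict=False))  # 1.0 -- the overlap is also credited
```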
NOTE:
- The GPT checkpoints are `gpt-4-1106-preview` and `gpt-3.5-turbo-1106`.
- Mistral 8x7B Instruct was run in half precision (~85 GB), and Llama 3 70B Instruct was run with 4-bit quantization (~45 GB); see the loading sketch after this list.
- More details of the datasets can be found in `data`.
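For reference, the open-weight models can be loaded under the memory footprints noted above with Hugging Face `transformers` (plus `bitsandbytes` for 4-bit quantization). This is a minimal sketch under that assumption; the Hub model IDs below are the publicly hosted checkpoints and may differ from the exact loading code used in the experiments.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Mistral 8x7B Instruct in half precision (~85 GB of GPU memory).
mixtral_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
mixtral_tok = AutoTokenizer.from_pretrained(mixtral_id)
mixtral = AutoModelForCausalLM.from_pretrained(
    mixtral_id,
    torch_dtype=torch.float16,  # half precision
    device_map="auto",          # shard across available GPUs (requires accelerate)
)

# Llama 3 70B Instruct with 4-bit quantization (~45 GB of GPU memory).
llama_id = "meta-llama/Meta-Llama-3-70B-Instruct"
llama_tok = AutoTokenizer.from_pretrained(llama_id)
llama = AutoModelForCausalLM.from_pretrained(
    llama_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)
```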
Table 2: Properties of the LLMs evaluated in this paper.

| Model | RLHF-Tuned | Instruction-Tuned | Max Input Tokens |
|---|---|---|---|
| Mistral 8x7B | No | Yes | 32K |
| GPT-3.5 (Chat) | Yes | No | 16K |
| GPT-4 (Chat) | Yes | No | 128K |
| PaLM 2 Bison (Chat) | No | No | 8K |
| PaLM 2 Unicorn (Text) | No | No | 8K |
| Gemini Pro (Chat) | No | No | 32K |
| Claude 3 (Chat) | Yes | No | 200K |
| Llama 3 70B | Yes | Yes | 8K |
The models used in this paper are mostly chat models, plus one text-completion model (PaLM 2 Unicorn), and none of them were specifically tuned for NER or RE tasks. We applied in-context learning by providing labeled examples as part of the prompt. Even with 20-shot prompting, the input length stays within 8K tokens, which fits in every model's context window. A sketch of how such a few-shot prompt can be assembled is shown below.
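As an illustration of this in-context learning setup, the sketch below assembles a k-shot NER prompt from labeled demonstrations. The instruction wording, the `build_ner_prompt` helper, and the example sentences are placeholders for illustration, not necessarily the exact templates used in the notebooks.

```python
from typing import List, Tuple

def build_ner_prompt(examples: List[Tuple[str, List[str]]], query: str, k: int = 20) -> str:
    """Build a k-shot prompt asking the model to list disease mentions in a sentence.

    `examples` holds (sentence, gold entity list) pairs used as in-context demonstrations.
    """
    lines = [
        "Extract all disease mentions from the sentence. "
        "Return them as a comma-separated list, or 'None' if there are none.",
        "",
    ]
    for sentence, entities in examples[:k]:
        lines.append(f"Sentence: {sentence}")
        lines.append(f"Diseases: {', '.join(entities) if entities else 'None'}")
        lines.append("")
    lines.append(f"Sentence: {query}")
    lines.append("Diseases:")
    return "\n".join(lines)

# Hypothetical demonstrations in the style of NCBI-Disease.
shots = [
    ("Mutations in BRCA1 are linked to hereditary breast cancer.", ["hereditary breast cancer"]),
    ("The patient has no history of diabetes mellitus.", ["diabetes mellitus"]),
]
prompt = build_ner_prompt(shots, "He was diagnosed with adenomatous polyposis coli.", k=20)
print(prompt)  # send this string to the chat/completion API of the model under test
```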
The example notebooks are in the root folder.