NER_SciBERT

This project is a Named Entity Recognition (NER) system based on SciBERT. It aims to provide an efficient and accurate NER model specifically for scientific literature.

Features

Includes a pre-training script for further pre-training the SciBERT model on domain-specific papers with the Masked Language Modeling (MLM) task.
Drops the Next Sentence Prediction (NSP) task from pre-training, so the model is trained with MLM only, which improves performance on downstream tasks.
Uses a SciBERT-BiLSTM-CRF architecture, which performs well on NER tasks.
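
The MLM-only objective can be illustrated with a small, dependency-free sketch of how tokens are selected and corrupted. It assumes the standard BERT scheme (15% of tokens selected; of those, 80% replaced by `[MASK]`, 10% by a random token, 10% left unchanged). The function name `mask_tokens` and the toy vocabulary are hypothetical and are not taken from this repository's scripts.

```python
import random

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]"}
# Toy vocabulary used for random replacement (assumption, for illustration only).
TOY_VOCAB = ["ore", "flotation", "grinding", "pulp", "reagent"]


def mask_tokens(tokens, mask_prob=0.15, vocab=TOY_VOCAB, rng=None):
    """Return (masked_tokens, labels) for one MLM training example.

    labels[i] is the original token at positions the model must predict,
    and None at positions that the loss ignores.
    """
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL:
            continue  # special tokens are never selected
        if rng.random() >= mask_prob:
            continue  # token not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, labels


tokens = ["[CLS]", "the", "ore", "is", "ground", "before", "flotation", "[SEP]"]
masked, labels = mask_tokens(tokens, rng=random.Random(0))
print(masked)
print(labels)
```

Because there is no NSP task, each training example is a single token sequence and no sentence-pair label is needed.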

Installation

To install and run NER_SciBERT, follow these steps:

Clone the repository: git clone https://github.com/tothemoon10080/NER_SciBERT.git
Install dependencies: pip install -r requirements.txt

Dataset

The SciBERT model is further pre-trained on the Kaggle dataset Mineral Processing Research.
To create the NER dataset, approximately 500 English papers closely related to mineral processing were collected, and entities were annotated in the abstracts of about 100 of them. The dataset can be found in the data/ folder.
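
The annotation scheme and label set are not described here. As a sketch of what token-level entity annotation commonly looks like, the example below converts entity spans into BIO tags. The BIO scheme, the function name `spans_to_bio`, and the labels `PROCESS` and `MINERAL` are assumptions for illustration and may differ from the files in the data folder.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans to BIO tags.

    spans: list of (start, end, label) in token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span: {(start, end, label)}")
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # continuation tokens
    return tags


tokens = ["Froth", "flotation", "recovers", "chalcopyrite", "from", "the", "ore", "."]
spans = [(0, 2, "PROCESS"), (3, 4, "MINERAL")]  # hypothetical labels
bio_tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```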

Usage

To use NER_SciBERT, run the following scripts:

Pre-train model: python scripts/pertraing.py
Fine-tune model: python scripts/train.py
Use model for prediction: python scripts/predict.py
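
At prediction time, a CRF output layer selects the highest-scoring tag sequence rather than the best tag per token. The sketch below shows Viterbi decoding in plain Python to make that step concrete. It is not the code in scripts/predict.py or src/models/torchcrf; the function name `viterbi_decode`, the tag set, and all scores are made up for illustration.

```python
def viterbi_decode(emissions, transitions, start, end):
    """Return (best_path, best_score) for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (e.g. from the BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    start[j], end[j]  : scores for starting / ending in tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for emit in emissions[1:]:
        best_prev, new_score = [], []
        for j in range(n_tags):
            # Best previous tag for ending in tag j at this position.
            i_best = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            best_prev.append(i_best)
            new_score.append(score[i_best] + transitions[i_best][j] + emit[j])
        backpointers.append(best_prev)
        score = new_score
    score = [score[j] + end[j] for j in range(n_tags)]
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])  # follow backpointers right to left
    path.reverse()
    return path, score[last]


tags = ["O", "B", "I"]
emissions = [[2.0, 1.0, 0.0], [0.0, 0.0, 3.0]]  # per-token argmax would give O, I
transitions = [[0.0, 0.0, -10.0],               # O -> I is heavily penalized
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, -10.0]                        # a sequence should not start with I
end = [0.0, 0.0, 0.0]
path, best = viterbi_decode(emissions, transitions, start, end)
print([tags[j] for j in path], best)
```

In this toy example the transition penalty steers decoding away from the invalid sequence O, I, which a per-token argmax over the emissions would have produced.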

Project Structure

NER_SciBERT/
- data/
  - MLM/
  - NER/
- src/
  - models/
    - torchcrf/
  - data/
    - preprocess.py
  - utils/
- scripts/
  - train.py
  - pertraing.py
  - predict.py
- requirements.txt
- README.md

Contribution

Contributions of any kind are welcome.
