This repository hosts the codebase and dataset from the paper *Knowledge Graph Engineering through Iterative Zero-shot LLM Prompting*.
This repository provides a Python implementation of an open information extraction pipeline powered by large language models (currently, only an integration with GPT-3.5-turbo is provided). The Python implementation of this pipeline can be found in `src/llm_open_ie`.
The core objective of this pipeline is to leverage the inherent capabilities of LLMs to analyze, elaborate, and generate human-like text in order to enhance the extraction and representation of textual information. A visual representation of how the pipeline works, together with a fuller explanation, can be found in the paper.
Running the pipeline on a text chunk should return:
- A list of entities, each characterized by a label, a description, and a list of types (hypernyms).
- A list of triples, whose subjects and objects come from the entities above, and whose predicates are defined by a label and a description of the relation.
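To make this concrete, here is a purely illustrative sketch of what a result could look like. The field names and container types below are assumptions based on the description above, not the pipeline's documented return format:

```python
# Hypothetical output shapes, for illustration only.
entities = [
    {
        "label": "Marie Curie",
        "description": "Polish and naturalized-French physicist and chemist",
        "types": ["person", "scientist"],  # hypernyms
    },
    {
        "label": "polonium",
        "description": "radioactive chemical element",
        "types": ["chemical element"],
    },
]
triples = [
    {
        "subject": "Marie Curie",  # one of the extracted entities
        "predicate": {
            "label": "discovered",
            "description": "identified or found for the first time",
        },
        "object": "polonium",  # another extracted entity
    },
]
```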
After cloning this repository, navigate to the root directory of the repository (where this README can be found) and run the following command using Python 3.10 or newer:

```bash
pip install .
```

Since the pipeline calls the OpenAI API for its GPT-3.5-turbo integration, a valid API key must be available in the `OPENAI_API_KEY` environment variable.
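As a quick sanity check before running anything, you can verify that the key is visible to Python (a minimal sketch; it only inspects the environment and makes no API call):

```python
import os

# Fail early with a clear message if the OpenAI API key is missing.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; export it before running the pipeline.")
```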
The following is an example of how to execute the pipeline to extract entities and triples from a text:

```python
from llm_open_ie import oie_pipeline
from llm_open_ie.llm.gpt import GPTOpenIE

text = '<PUT YOUR TEXT HERE>'

# Instantiate the GPT-3.5-turbo backend and run the full extraction pipeline.
llm_oie = GPTOpenIE()
entities, triples = oie_pipeline(text, llm_oie)
```
The datasets mentioned in the paper can be found in the `dataset` folder of this repository, which includes the categories `ST`, `REBEL`, and `REBEL_20`. For each category, a single JSON file contains all the information about the input text, the extracted entities/triples, and optionally human annotations. A general, intuitive schema of each JSON file can be found in `dataset/json_schema.txt`. Additionally, each category comes with a notebook containing the actual code and results of the evaluation presented in the paper. To repeat the evaluation for the `REBEL` category, you'll need the original REBEL dataset in the `dataset/REBEL/original` folder. You can download it manually from its HuggingFace dataset card or automatically using the `dataset/REBEL/original/rebel_download.sh` script.
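As an example, the snippet below sketches how one of these JSON files could be loaded for inspection with the standard library. The file name `ST.json` is hypothetical; use the actual file inside the category folder and consult `dataset/json_schema.txt` for its structure:

```python
import json
from pathlib import Path

# "ST.json" is a placeholder; substitute the real JSON file of the category.
path = Path("dataset") / "ST" / "ST.json"
data = json.loads(path.read_text(encoding="utf-8"))

# Each file bundles the input text, the extracted entities/triples,
# and, where available, human annotations.
print(type(data))
```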