This repository includes the released FLD corpora.
See the entry-point repository for an overview of the whole FLD project.
- The English corpora introduced in the ICML paper are **FLD** and **FLD★** (loaded below as `default` and `star`). Note that these corpora are version 2.0, which is detailed in Appendix H of our paper.
- The Japanese corpora (JFLD) are described here.
First, install the `datasets` library:

```sh
pip install datasets
```
Then, you can load the FLD corpora as follows:

```python
from datasets import load_dataset

FLD = load_dataset('hitachi-nlp/FLD.v2', name='default')
FLD_star = load_dataset('hitachi-nlp/FLD.v2', name='star')
```
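Printing a loaded `DatasetDict` shows its splits and features, which is a quick way to verify what you have:

```python
# Show the available splits and features of each configuration.
print(FLD)
print(FLD_star)
```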
A deduction example from our dataset is conceptually illustrated in the figure below:

That is, given a set of facts and a hypothesis, a model must generate a proof sequence and determine an answer marker (proved, disproved, or unknown).
The actual schema can be viewed on the Hugging Face Hub. The most important fields are:

- `context` (or `facts` in later versions of the corpora): A set of facts.
- `hypothesis`: A hypothesis.
- `proofs`: Gold proofs. Each proof consists of a series of logical steps derived from the facts, leading to the hypothesis. Currently, each example has at most one proof.
- `world_assump_label`: An answer, which is either `PROVED`, `DISPROVED`, or `UNKNOWN`.
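As a minimal sketch of how these fields can be accessed (the `train` split name is an assumption; check the hub page for the actual split names):

```python
from datasets import load_dataset

FLD = load_dataset('hitachi-nlp/FLD.v2', name='default')

# 'train' is assumed here; inspect FLD.keys() for the real split names.
example = FLD['train'][0]

print(example['hypothesis'])          # the hypothesis to be proved or disproved
print(example['world_assump_label'])  # PROVED / DISPROVED / UNKNOWN
print(example['proofs'])              # gold proof(s); currently at most one
```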
Additionally, we have the following preprocessed fields:

- `prompt_serial`: A serialized representation of the facts and the hypothesis.
- `proof_serial`: A serialized representation of the proof and the answer.
To train or evaluate a language model (LM), one can take either of two approaches (see the sketch after this list):

- Use `prompt_serial` as input and `proof_serial` as output. This makes the LM generate both the proof and the answer.
- Use `prompt_serial` as input and `world_assump_label` as output. This makes the LM generate only the answer.
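Either approach amounts to building (input, output) text pairs from the fields above. A minimal sketch, assuming a `train` split and a standard seq2seq setup:

```python
from datasets import load_dataset

FLD = load_dataset('hitachi-nlp/FLD.v2', name='default')

def make_pairs(split, target='proof'):
    """Yield (input, output) text pairs for seq2seq LM training."""
    for ex in split:
        if target == 'proof':
            # Approach 1: the LM generates both the proof and the answer.
            yield ex['prompt_serial'], ex['proof_serial']
        else:
            # Approach 2: the LM generates only the answer label.
            yield ex['prompt_serial'], ex['world_assump_label']

# The 'train' split name is an assumption.
pairs = list(make_pairs(FLD['train'], target='proof'))
```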
Further, we have "logical formula" versions of the fields, such as `prompt_serial_formula`, which can be used to evaluate LLMs' pure logical reasoning capabilities in the domain of logical formulas rather than natural language.
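For instance, assuming the formula fields are accessed in the same way as their natural-language counterparts:

```python
from datasets import load_dataset

FLD = load_dataset('hitachi-nlp/FLD.v2', name='default')

# prompt_serial_formula holds the facts and the hypothesis as logical formulas.
example = FLD['train'][0]  # the 'train' split name is an assumption
print(example['prompt_serial_formula'])
```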