This repository includes the code for generating the FLD corpus.
See the entry-point repository for an overview of the whole FLD project.
- (New!) NeurIPS_2024 branch (2024-12)
  - We released the code for generating FLDx2 (Formal Logic Deduction Diverse).
- NLP_2024_KOBE_BEEF branch (2024-01-24)
  - Release at LREC-COLING 2024 and the Association for Natural Language Processing (言語処理学会) 2024.
  - Now capable of generating Japanese corpora (JFLD).
  - Slight changes in the corpus schema.
  - This branch and the generated corpora might not be compatible with older branches of related repositories.
- ICML_2023 branch (2023-08-22)
  - Initial release at ICML 2023.
  - This is version 2.0 of FLD corpora. See Appendix H of our paper for details.
The code has been tested with Python 3.11.5.
pip install -r ./requirements/requirements.txt
export PYTHONPATH=`pwd -P`:$PYTHONPATH
- For Japanese FLD
./download_scripts/00.download_JFLD_resources.sh
- For knowledge FLD
./download_scripts/00.download_knowledge_banks.sh
We create 100k examples of FLDx2 using the generation script ./scripts/create_corpus.py.
We recommend parallelizing the run, since the computation is heavy: a rough estimate is a few thousand CPU hours.
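This README does not say how to parallelize beyond the `--num-workers` flag, so the following is only a minimal sketch of one possible approach: split the target number of examples into shards and run each shard as a separate job. The helper `plan_shards`, the per-shard output directory layout, and the idea that giving each shard a distinct `--seed` yields non-duplicated samples are all assumptions, not part of this repository.

```python
import shlex


def plan_shards(total_examples, num_shards, base_seed, output_root, extra_args):
    """Split one large run into per-shard command lines (hypothetical helper)."""
    base, remainder = divmod(total_examples, num_shards)
    commands = []
    for shard in range(num_shards):
        # Spread the remainder over the first shards so the sizes sum to the total.
        size = base + (1 if shard < remainder else 0)
        argv = [
            "python", "./scripts/create_corpus.py",
            f"{output_root}/shard_{shard:03d}",  # assumed layout: one directory per shard
            str(size),
            *extra_args,  # must not contain --seed; it is appended below
            "--seed", str(base_seed + shard),  # assumption: distinct seeds per shard
        ]
        # shlex.join quotes arguments such as '[1, 3]' safely for the shell.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    shared = ["--translation-config", "old-thing.v1", "--num-workers", "10"]
    for line in plan_shards(95_000, 8, 0, "./outputs/less_diverse", shared):
        print(line)
```

Each printed line can then be submitted to whatever job scheduler is available. The remaining flags from the commands in this section would be passed through `extra_args` unchanged.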
- Create 95k examples with less diverse natural language expressions:
python ./scripts/create_corpus.py \
{output_dir} \
{num_examples} \
--translation-config old-thing.v1 \
--generate-stem-steps-range '[1, 3]' \
--extend-branches-steps-range '[0, 5]' \
--argument-config ./configs/arguments/predicate/specified/axioms/ \
--argument-config ./configs/arguments/propositional/axioms/ \
--argument-config ./configs/arguments/predicate/specified/references/ \
--argument-config ./configs/arguments/propositional/references/ \
--argument-config ./configs/arguments/predicate/quantified/references/ \
--argument-config ./configs/arguments/predicate/specified/theorems \
--argument-config ./configs/arguments/propositional/theorems \
--argument-config ./configs/arguments/predicate/quantified/theorems \
--negation-arguments-weight 0.1 \
--theorem-tree-prob 0.15 \
--theorem-arguments-factor 0.01 \
--theorem-arguments-weight-adjustment-subset G_MP.syllogism.contraposition.interchangeability \
--reference-tree-prob 0.2 \
--num-workers 10 \
--seed 0
- Create 5k examples with more diverse natural language expressions (which requires more computation):
python ./scripts/create_corpus.py \
{output_dir} \
{num_examples} \
--translation-config thing_person.v2 \
--generate-stem-steps-range '[1, 3]' \
--extend-branches-steps-range '[0, 5]' \
--argument-config ./configs/arguments/predicate/specified/axioms/ \
--argument-config ./configs/arguments/propositional/axioms/ \
--argument-config ./configs/arguments/predicate/specified/references/ \
--argument-config ./configs/arguments/propositional/references/ \
--argument-config ./configs/arguments/predicate/quantified/references/ \
--argument-config ./configs/arguments/predicate/specified/theorems \
--argument-config ./configs/arguments/propositional/theorems \
--argument-config ./configs/arguments/predicate/quantified/theorems \
--negation-arguments-weight 0.2 \
--theorem-tree-prob 0.15 \
--theorem-arguments-factor 0.01 \
--theorem-arguments-weight-adjustment-subset G_MP.syllogism.contraposition.interchangeability \
--reference-tree-prob 0.1 \
--num-workers 10 \
--seed 261
- Concatenate the above examples to create the "raw" FLDx2 corpus.
- Make "prompt-output" pairs from the corpus, following FLD-task.
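The concatenation step could be done along the following lines. This is a sketch under assumptions: it presumes that each run writes its examples as JSON Lines (one example per line), which this README does not state, and the function name `concat_jsonl` and the shuffling step are illustrative choices rather than the authors' procedure.

```python
import random
from pathlib import Path


def concat_jsonl(input_paths, output_path, seed=0):
    """Concatenate JSON Lines files and shuffle the combined lines."""
    lines = []
    for path in input_paths:
        with open(path, encoding="utf-8") as f:
            # Keep each non-empty line verbatim; one line is assumed to be one example.
            lines.extend(line.rstrip("\n") for line in f if line.strip())
    # Shuffle so the two subsets are interleaved (optional, not taken from the source).
    random.Random(seed).shuffle(lines)
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)
```

Lines are copied without parsing, so the corpus schema is left untouched. Drop the shuffle if the original order of the two subsets should be preserved.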