Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-centric Summarization

This repository is the official implementation of our paper. The method generates high-cognitive-demand (HCD) educational questions by learning the question type distribution and applying event-centric summarization.

Requirements

Python >= 3.6 is required. Run the following commands to install the dependencies:

cd transformers && pip install .
pip install spacy==2.3.7
pip install torch==1.7.1
pip install pytorch-lightning==0.9.0
pip install torchtext==0.8.0
pip install rouge-score==0.0.4

Dataset

FairytaleQA can be found here. We used the implementation at https://github.com/kelvinguu/qanli to obtain the QA statements. The processed QA statement data can be found here.

NOTE: I uploaded my modified qanli here. You first need to produce your CoNLL-U format file, then run step3_totxt.py to get the transformation (please update the paths accordingly).

Assuming we have the dataset at ./data/split and the transformed QA statements at ./data/infrence, we can prepare the needed format as follows:

python step1_toxlsx.py
python step2_topkl.py
python step3_topkllist.py

Training and Prediction

Paths need to be configured manually.

Question type distribution. In the tdl folder:

python train.py
python predict.py

Event-centric summary generation. In the section2sum folder:

python train_section2sum.py
python generate_section2sum.py

Educational question generation. In the sum2question folder:

python train_sum2qustion.py
python generate_sum2question.py
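
Because each stage is run from inside its own folder, the sequence above can be scripted. The following is a minimal sketch, assuming the scripts take no command-line arguments and that the paths inside them have already been edited by hand. `STAGES`, `plan_commands`, and `run_all` are hypothetical helper names and are not part of this repository.

```python
import subprocess
import sys
from pathlib import Path

# (folder, scripts) pairs, in the order this README lists them.
# Script names are copied verbatim from the commands above.
STAGES = [
    ("tdl", ["train.py", "predict.py"]),
    ("section2sum", ["train_section2sum.py", "generate_section2sum.py"]),
    ("sum2question", ["train_sum2qustion.py", "generate_sum2question.py"]),
]


def plan_commands(repo_root="."):
    """Return (working_directory, argv) pairs without running anything."""
    root = Path(repo_root)
    return [
        (root / folder, [sys.executable, script])
        for folder, scripts in STAGES
        for script in scripts
    ]


def run_all(repo_root=".", dry_run=True):
    """Print each command; when dry_run is False, also run it and stop on failure."""
    for cwd, argv in plan_commands(repo_root):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=str(cwd), check=True)


if __name__ == "__main__":
    run_all(dry_run=True)  # set dry_run=False to actually train and predict
```

The dry run only prints the planned commands, which is a cheap way to confirm the folder layout before starting a long training job.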

Trained Models

Question type distribution: here
Event-centric summary generation: here
Educational question generation: file1 and file2. Join them into one file with cat summary2question_epoch=2.ckpt.* > summary2question_epoch=2.ckpt
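
On systems without a shell, the split checkpoint parts can also be concatenated from Python. This is a minimal sketch, assuming the downloaded parts share the prefix `summary2question_epoch=2.ckpt.` and that their suffixes sort lexicographically in the right order. `join_parts` is a hypothetical helper name, not something shipped with this repository.

```python
import shutil
from pathlib import Path


def join_parts(target, directory="."):
    """Concatenate files named '<target>.*' (sorted by name) into '<target>'."""
    directory = Path(directory)
    # Assumption: sorting by file name yields the original part order.
    parts = sorted(p for p in directory.glob(target + ".*") if p.is_file())
    if not parts:
        raise FileNotFoundError(f"no parts matching {target}.* in {directory}")
    with open(str(directory / target), "wb") as out:
        for part in parts:
            with open(str(part), "rb") as src:
                shutil.copyfileobj(src, out)  # streams, so large files are fine
    return parts


if __name__ == "__main__":
    joined = join_parts("summary2question_epoch=2.ckpt")
    print("joined", len(joined), "parts")
```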

Highlighted Results

Automatic evaluation on Rouge-L and BERTScore:

Human evaluation on question types: the K-L distance between the question type distribution of our method and the ground truth is 0.28, while that of QAG (top2) is 0.60.
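
To make the K-L figure concrete, here is a minimal sketch of how a K-L divergence between two question type distributions can be computed. The direction (generated against ground truth), the natural logarithm, and the smoothing constant `eps` are assumptions, since this README does not state how the reported numbers were obtained. The counts below are invented for illustration and are not the paper's data.

```python
import math


def normalize(counts):
    """Turn raw counts per question type into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions keyed by question type."""
    result = 0.0
    for key, p_k in p.items():
        if p_k > 0:
            # eps guards against a type that never occurs in q (assumption).
            result += p_k * math.log(p_k / max(q.get(key, 0.0), eps))
    return result


if __name__ == "__main__":
    # Invented counts, for illustration only.
    generated = normalize({"action": 30, "causal relationship": 25, "character": 10})
    reference = normalize({"action": 28, "causal relationship": 27, "character": 10})
    print(kl_divergence(generated, reference))
```

A value of 0 means the two distributions are identical, and larger values mean the generated question types drift further from the reference.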

Human evaluation on children appropriateness: the mean rating of our method (2.56±1.31) is significantly higher than that of QAG (top2, 2.22±1.20).

Acknowledgement

This repository is developed based on FairytaleQA_QAG_System and FairytaleQA_Baseline.

Citation

@inproceedings{zhao2022storybookqag,
    author = {Zhao, Zhenjie and Hou, Yufang and Wang, Dakuo and Yu, Mo and Liu, Chengzhong and Ma, Xiaojuan},
    title = {Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-Centric Summarization},
    publisher = {Association for Computational Linguistics},
    year = {2022}
}
