# Large Language Models are Students at Various Levels: Zero-shot Question Difficulty Estimation

This repository contains the official implementation of "Large Language Models are Students at Various Levels: Zero-shot Question Difficulty Estimation".

Jae-Woo Park<sup>1</sup>\*, Seong-Jin Park<sup>1</sup>\*, Hyun-Sik Won<sup>1</sup>, Kang-Min Kim<sup>1</sup>†

<sup>1</sup> The Catholic University of Korea

\* These authors contributed equally to this work. † Corresponding author.
This repository includes:
- LLaSA Setup.
- Question-Solving using Various LLMs.
- Question Difficulty Estimation using LLaSA and Zero-shot LLaSA.

## Table of Contents

- [Project Structure](#project-structure)
- [LLaSA Setup](#llasa-setup)
- [Question-Solving using Various LLMs](#question-solving-using-various-llms)
- [Question Difficulty Estimation (QDE)](#question-difficulty-estimation-qde)
- [Citation](#citation)

## Project Structure

```
├── config                  # Configurations, API keys, and constants.
│   ├── __init__.py
│   ├── constants.py
│   └── api_keys.py
├── data                    # User-provided raw data and generated processed data.
│   ├── processed           # [Will be generated] Processed files.
│   │   ├── dk_test_ability.csv
│   │   ├── dk_test_difficulty.csv
│   │   ├── dk_test_question.json
│   │   ├── dk_train_ability.csv
│   │   ├── dk_train_difficulty.csv
│   │   ├── dk_train_question.json
│   │   └── dk_whole_question.json
│   └── raw                 # [User-provided] Raw data provided by the user.
│       ├── test_question.json
│       ├── test_transaction.csv
│       ├── train_question.json
│       └── train_transaction.csv
├── logs                    # [Will be generated] Log files and experiment results.
│   ├── llasa               # LLaSA result logs.
│   │   └── …
│   └── question_solving    # Question-solving result logs.
│       ├── …
│       ├── model_answer_log.csv
│       └── total_results.csv
├── data_setting            # Scripts for data processing.
│   └── …
├── llasa                   # LLaSA and Zero-shot LLaSA frameworks.
│   └── …
├── question_solving        # Scripts for question-solving using LLMs.
│   └── …
└── shells                  # Shell scripts for running modules.
    └── …
```

## LLaSA Setup

### R Environment

To install the R packages for Item Response Theory (IRT) on Ubuntu, run:

```bash
sudo apt-get update
sudo apt-get install r-base
cd llms-are-students-of-various-levels
Rscript requirements.r
```

After installation, type `R` in the terminal to start the R environment.

### Python Environment

Set up your Python environment:

```bash
pip install torch
pip install -r requirements.txt
```

Make sure to install the PyTorch build that matches your system (OS and CUDA version).

### Configuration

Configure `config/constants.py` and set your API keys in `config/api_keys.py`.

### Datasets

We conducted Question Difficulty Estimation (QDE) using two datasets. Any dataset containing questions, answers, and students' question-solving records can be used for this task.

Note that LLaSA requires a sufficiently large transaction dataset: IRT parameters cannot be estimated when a question has only a single response record, or when a single model contributes only one response record.
Make sure your dataset follows this structure:

```
├── data
│   ├── raw
│   │   ├── train_transaction.csv
│   │   ├── train_question.json
│   │   ├── test_transaction.csv
│   │   └── test_question.json
```

#### Dataset Structure Details

Below are examples of `train_transaction.csv` and `train_question.json`. Please prepare `test_transaction.csv` and `test_question.json` in the same format.
`train_transaction.csv`:

| question_id | S1 | S2 | ... | SN |
|-------------|----|----|-----|----|
| Q1          | 1  | 1  | ... | 1  |
| Q2          | 0  | 1  | ... | 1  |
`train_question.json`:

```json
{
    "question_text": "Choose the correct ...",
    "question_id": 1,
    "choices": ["10", "20", "30", "40"],
    "answer": ["10"]
}
```
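
As a quick sanity check, the sketch below loads both files and flags transaction rows that lack question metadata. It assumes `pandas` is available and that `train_question.json` holds a JSON array of objects like the one above; the file paths follow the tree shown earlier.

```python
import json

import pandas as pd

# Response matrix: one row per question, one 0/1 column per student.
transactions = pd.read_csv("data/raw/train_transaction.csv")

# Question metadata; assumed here to be a JSON array of question objects.
with open("data/raw/train_question.json") as f:
    questions = json.load(f)

print(f"{len(transactions)} questions, {transactions.shape[1] - 1} students")

# Cross-check: every transaction row should have matching metadata.
known_ids = {q["question_id"] for q in questions}
missing = set(transactions["question_id"]) - known_ids
if missing:
    print("Transactions without question metadata:", sorted(missing))
```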

### Data Setting

Run the following command to estimate student abilities and question difficulties via IRT:

```bash
sh shells/data_setting/run_irt_setting.sh
```
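
The shell script performs the IRT fit with the R tooling installed above. Purely for intuition, here is a minimal NumPy sketch of the same idea under a Rasch (1PL) model, fit by gradient ascent on a binary response matrix; the function name, training loop, and toy data are illustrative, not the repository's implementation.

```python
import numpy as np

def fit_rasch(responses, n_iter=500, lr=0.05):
    """Rasch (1PL) model: P(correct) = sigmoid(ability - difficulty).

    responses: (n_questions, n_students) array of 0/1 outcomes.
    Returns per-student abilities and per-question difficulties.
    """
    n_q, n_s = responses.shape
    ability = np.zeros(n_s)
    difficulty = np.zeros(n_q)
    for _ in range(n_iter):
        logits = ability[None, :] - difficulty[:, None]
        p = 1.0 / (1.0 + np.exp(-logits))  # predicted P(correct)
        err = responses - p                # d(log-likelihood)/d(logit)
        ability += lr * err.sum(axis=0) / n_q
        difficulty -= lr * err.sum(axis=1) / n_s
    return ability, difficulty

# Toy example: 3 questions x 4 students.
R = np.array([[1, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 0]])
theta, b = fit_rasch(R)
print("abilities:", theta.round(2))
print("difficulties:", b.round(2))
```

A single response per question cannot pin down both an ability and a difficulty, which is why the transaction data must be large (see the note in [Datasets](#datasets)).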

Generate hints using the GPT API:

```bash
sh shells/data_setting/run_hint_setting.sh
```

Merge the train and test sets for question-solving:

```bash
sh shells/data_setting/run_merge_setting.sh
```

## Question-Solving using Various LLMs

In the question-solving step, LLMs solve the questions directly so that their question-solving records can be extracted. The code was developed with reference to *Leveraging Large Language Models for Multiple Choice Question Answering*.
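
The scripts below handle this at scale across providers. For intuition only, here is a sketch of a single question-solving call using the OpenAI Python SDK; the model name, prompt format, and the assumption that the API key is exported in the environment are all illustrative (the repository reads keys from `config/api_keys.py`).

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def solve(question_text, choices):
    """Ask the model to pick one choice; return its raw answer string."""
    prompt = question_text + "\n" + "\n".join(
        f"{chr(65 + i)}. {c}" for i, c in enumerate(choices)
    ) + "\nAnswer with a single letter."
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

answer = solve("Choose the correct ...", ["10", "20", "30", "40"])
print(answer)  # compare against the gold answer to build a 0/1 record
```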
Run these scripts to get question-solving records from different LLMs:

```bash
sh shells/question_solving/run_local_models.sh
sh shells/question_solving/run_anthropic_models.sh
sh shells/question_solving/run_gpt_models.sh
```

Analyze the results and integrate them into a unified dataset:

```bash
sh shells/question_solving/run_analyze.sh
sh shells/question_solving/run_integrate.sh
```

## Question Difficulty Estimation (QDE)

Run LLaSA without LLMDA:

```bash
sh shells/llasa/run_llasa_without_llmda.sh
```

Run LLaSA with LLMDA:

```bash
sh shells/llasa/run_llasa_with_llmda.sh
```

Run Zero-shot LLaSA using intuitive inputs for student levels:

```bash
sh shells/llasa/run_zeroshot_llasa.sh
```
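
Conceptually, the framework treats LLMs as students at various levels and aggregates their question-solving records into a difficulty estimate. As a loose illustration of the zero-shot setting (not the paper's exact formulation), one could weight each model's failure on a question by an intuitively assigned student level; all names and values below are hypothetical.

```python
# Illustrative only: weight each LLM's correctness by an intuitive
# "student level" in [0, 1] and read difficulty off the failure rate.
records = {          # hypothetical 0/1 outcomes per model on one question
    "strong_llm": 1,
    "mid_llm":    1,
    "weak_llm":   0,
}
levels = {"strong_llm": 0.9, "mid_llm": 0.5, "weak_llm": 0.2}

# Failures by high-level "students" push the estimate toward "hard".
num = sum(levels[m] * (1 - r) for m, r in records.items())
den = sum(levels.values())
difficulty = num / den
print(f"estimated difficulty: {difficulty:.2f}")
```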

Check the results of LLaSA and Zero-shot LLaSA:

```bash
sh shells/llasa/run_report_1.sh  # LLaSA without LLMDA
sh shells/llasa/run_report_2.sh  # LLaSA with LLMDA
sh shells/llasa/run_report_3.sh  # Zero-shot LLaSA
```

## Citation

```bibtex
@inproceedings{anonymous2024large,
    title={Large Language Models are Students at Various Levels: Zero-shot Question Difficulty Estimation},
    author={Anonymous},
    booktitle={Submitted to ACL Rolling Review - June 2024},
    year={2024},
    url={https://openreview.net/forum?id=whRJT6j4EM},
    note={under review}
}
```