This repository provides the code to evaluate various large language models on the FLD corpora under few-shot in-context learning (Section 5.2 of the paper).
See the entry-point repository for an overview of the whole FLD project.
The code has been tested on Python 3.11.5. Install the dependencies and the FLD-task package, then put the repository root on your PYTHONPATH:
$ pip install -r ./requirements/requirements.txt
$ git clone https://github.com/hitachi-nlp/FLD-task.git && pip install -e ./FLD-task
$ export PYTHONPATH=`pwd -P`:$PYTHONPATH
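To verify the setup, a minimal import check (assuming the FLD-task package exposes a module named FLD_task, per that repository's layout):
$ python -c "import FLD_task"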
-
Make a dataset that pairs in-context examples with test examples:
$ python ./scripts/make_dataset.py \
    --output-dir ./outputs/dataset \
    --dataset-name hitachi-nlp/FLD.v2 \
    --dataset-config-name default \
    --n-shot 10 \
    --seed 0
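To sanity-check the output, you can peek at the first record of the generated file (a minimal sketch; the file name ICL_dataset.jsonl matches the prediction commands below, while the record's fields depend on the script):

import json

# Show which fields each in-context learning example carries.
with open('./outputs/dataset/ICL_dataset.jsonl') as f:
    example = json.loads(f.readline())
print(sorted(example.keys()))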
-
Make predictions with LLMs.
For OpenAI models, specify the model name as "openai.xx":
$ python ./scripts/predict.py \
    ./outputs/dataset/ICL_dataset.jsonl \
    ./outputs/predictions/ \
    --model-name openai.gpt-3.5-turbo-16k \
    --max-examples 5
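Note that the OpenAI route requires an API key; assuming the script uses the standard openai client, it picks the key up from the environment:
$ export OPENAI_API_KEY=<your-key>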
For other models on the Hugging Face Hub, specify the model name as "hf.xx":
$ python ./scripts/predict.py \
    ./outputs/dataset/ICL_dataset.jsonl \
    ./outputs/predictions/ \
    --model-name hf.Yukang/Llama-2-13b-longlora-32k-ft \
    --tokenizer-name hf.meta-llama/Llama-2-7b-hf \
    --max-examples 5 \
    --tensor-parallel-size 1
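The --tensor-parallel-size option shards the model across GPUs; the option name suggests a vLLM backend (an assumption), in which case it should be set to the number of available devices, e.g. --tensor-parallel-size 4 on a 4-GPU node.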
-
Compute the metrics:
$ python ./scripts/evaluate_proofs.py \
    ./outputs/predictions/predictions.jsonl \
    ./outputs/metrics
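The metrics are written to ./outputs/metrics/metrics.jsonl, the file consumed by the analysis step below; to print them in readable form, a minimal sketch:

import json

# Pretty-print every metrics record produced by evaluate_proofs.py.
with open('./outputs/metrics/metrics.jsonl') as f:
    for line in f:
        print(json.dumps(json.loads(line), indent=2))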
-
Analyze predictions:
$ python ./scripts/analyze_results.py \
    ./outputs/metrics/metrics.jsonl \
    ./outputs/analysis