hitachi-nlp/FLD-fewshot-ICL-eval

FLD In-Context Learning Evaluation

This repository includes the code to evaluate various large language models on FLD corpora under few-shot in-context learning (Section 5.2 of the paper).

See the entry-point repository for an overview of the whole FLD project.

Installation

The code has been tested on Python 3.11.5.

$ pip install -r ./requirements/requirements.txt
$ git clone https://github.com/hitachi-nlp/FLD-task.git && pip install -e ./FLD-task
$ export PYTHONPATH=`pwd -P`:$PYTHONPATH

How to evaluate LLMs

  1. Make a dataset that pairs in-context examples and test examples:

    $ python ./scripts/make_dataset.py \
        --output-dir ./outputs/dataset \
        --dataset-name hitachi-nlp/FLD.v2 \
        --dataset-config-name default  \
        --n-shot 10 \
        --seed 0
  2. Make predictions with LLMs.

    For OpenAI models, specify the model name as "openai.xx":

    $ python ./scripts/predict.py \
        ./outputs/dataset/ICL_dataset.jsonl \
        ./outputs/predictions/ \
        --model-name openai.gpt-3.5-turbo-16k \
        --max-examples 5

    For other models on the Hugging Face Hub, specify the model name as "hf.xx":

    $ python ./scripts/predict.py \
        ./outputs/dataset/ICL_dataset.jsonl \
        ./outputs/predictions/ \
        --model-name hf.Yukang/Llama-2-13b-longlora-32k-ft \
        --tokenizer-name hf.meta-llama/Llama-2-7b-hf \
        --max-examples 5 \
        --tensor-parallel-size 1
  3. Compute the metrics:

    $ python ./scripts/evaluate_proofs.py \
        outputs/predictions/predictions.jsonl \
        outputs/metrics
  4. Analyze predictions:

    $ python ./scripts/analyze_results.py \
        ./outputs/metrics/metrics.jsonl \
        ./outputs/analysis
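
Each step above reads or writes a JSONL file (`ICL_dataset.jsonl`, `predictions.jsonl`, `metrics.jsonl`). Before passing a file to the next step, it can help to check that it is non-empty and to see which fields it contains. The following is a minimal sketch for that purpose, not part of this repository: the helper name `summarize_jsonl` is hypothetical, and it assumes nothing about the record schema beyond one JSON value per line.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def summarize_jsonl(path, max_records=None):
    """Return (record_count, Counter of top-level keys) for a JSONL file.

    Hypothetical helper: it does not assume any particular field names.
    """
    n_records = 0
    key_counts = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if max_records is not None and n_records >= max_records:
                break  # stop early once enough records were read
            record = json.loads(line)
            n_records += 1
            if isinstance(record, dict):
                # count how many records carry each top-level key
                key_counts.update(record.keys())
    return n_records, key_counts


if __name__ == "__main__":
    # Summarize every file given on the command line.
    for file_path in sys.argv[1:]:
        count, keys = summarize_jsonl(file_path)
        print(f"{file_path}: {count} records")
        for key, n in keys.most_common():
            print(f"  {key}: present in {n} records")
```

Saved under a name of your choice (for example `peek_jsonl.py`, also hypothetical), it could be run as `python peek_jsonl.py ./outputs/dataset/ICL_dataset.jsonl` after step 1 to confirm that the dataset was written before spending time or API quota on predictions.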
