LLMs for CSS

How to run testing?

Install ConvoKit

git clone https://github.com/CornellNLP/ConvoKit.git
cd ConvoKit
pip3 install -e .

Download and pre-process the datasets:

python data_loader.py -d power --save_dir ./css_data/wiki_corpus

Install dependencies

pip3 install -r requirements.txt

Add your OpenAI API key to your environment.

Usage:

python test_official_chat_css.py --model [MODEL_NAME_HERE] --dataset wiki_corpus
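
As a pre-flight check before running the command above, the sketch below confirms that a key is visible to the process and assembles the same command line. It assumes the key lives in OPENAI_API_KEY, the variable the openai Python client reads by default; this README does not say which variable the script expects. key_is_set and build_command are hypothetical helpers, not part of the repository.

```python
import os
import sys


def key_is_set(env=None, var="OPENAI_API_KEY"):
    """Return True if `var` holds a non-empty value in `env`.

    `env` defaults to os.environ. The variable name is an assumption:
    it is the openai client's default, not something this README states.
    """
    env = os.environ if env is None else env
    return bool(env.get(var, "").strip())


def build_command(model, dataset="wiki_corpus",
                  script="test_official_chat_css.py"):
    """Build the argv list for the usage command shown above."""
    return [sys.executable, script, "--model", model, "--dataset", dataset]
```

Once key_is_set() returns True, passing build_command("google/flan-t5-small") to subprocess.run(..., check=True) would launch the same evaluation from Python.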

We evaluated the following models, but any model that can be loaded with HuggingFace AutoModelForSeq2SeqLM should work out of the box.

        choices=[
            "chatgpt",
            "google/flan-t5-small",
            "google/flan-t5-base",
            "google/flan-t5-large",
            "google/flan-t5-xl",
            "google/flan-t5-xxl",
            "google/flan-ul2",
            "text-davinci-001",
            "text-curie-001",
            "text-babbage-001",
            "text-ada-001",
            "text-davinci-002",
            "text-davinci-003",
        ],
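
The names above fall into three families: the ChatGPT API, the older text-* completion models, and HuggingFace seq2seq checkpoints. The sketch below shows one way such routing and loading could look. It is an illustration under assumptions, not the code in test_official_chat_css.py: pick_backend, load_seq2seq, generate, and the backend labels are hypothetical names, and the HuggingFace path needs the transformers package (with PyTorch) installed.

```python
OPENAI_CHAT_MODELS = {"chatgpt"}


def pick_backend(model_name):
    """Map a --model value to one of three (hypothetical) backend labels."""
    if model_name in OPENAI_CHAT_MODELS:
        return "openai-chat"
    if model_name.startswith("text-"):
        return "openai-completion"
    # Anything else is treated as a HuggingFace model identifier.
    return "huggingface"


def load_seq2seq(model_name):
    """Load a HuggingFace seq2seq model and its tokenizer."""
    # Imported lazily so pick_backend works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=16):
    """Zero-shot generation for one prompt with default decoding settings."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Keeping the routing separate from the loading means a new HuggingFace checkpoint needs no code change, which matches the "out of the box" claim above.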

File Roadmap

mappings.py - Configuration used for each dataset in the paper. Describes the type of dataset, how it should be processed from the raw format, and how the task should be formatted into a prompt following our prompting guidelines.

data_loader.py - Downloads raw datasets and converts them into the seq2seq format used by LLMs.

test_official_chat_css.py - Runs the LLM of your choice zero-shot. Contains code for HuggingFace, the ChatGPT API, and the traditional GPT API.

eval_significance.py - Computes pairwise bootstrap significance between the answer files of two models.

eval_agreement.py - Computes the kappa agreement between the LLM predictions and the gold labels.
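
To make the two evaluation scripts concrete, here is a minimal pure-Python sketch of both statistics. It makes several assumptions: that the kappa is Cohen's kappa (this README does not say which variant is used), that the bootstrap resamples per-example 0/1 correctness scores, and that the reported value is the fraction of resamples in which the first system fails to outscore the second. The function names are hypothetical and the repository scripts may differ in detail.

```python
import random
from collections import Counter


def cohen_kappa(pred, gold):
    """Cohen's kappa between two equal-length label sequences."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(gold)
    observed = sum(p == g for p, g in zip(pred, gold)) / n
    pred_counts, gold_counts = Counter(pred), Counter(gold)
    # Chance agreement from the two marginal label distributions.
    expected = sum(pred_counts[k] * gold_counts[k] for k in gold_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case (one shared label); treated here as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def paired_bootstrap(correct_a, correct_b, n_resamples=1000, seed=0):
    """Fraction of resamples in which system A does not outscore system B.

    `correct_a` and `correct_b` are per-example scores (for instance 0/1
    correctness) for the same examples in the same order.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty sequences of equal length")
    rng = random.Random(seed)
    n = len(correct_a)
    not_better = 0
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(correct_a[i] - correct_b[i] for i in idx)
        if diff <= 0:
            not_better += 1
    return not_better / n_resamples
```

A small return value from paired_bootstrap suggests that A's advantage over B is unlikely to be an artifact of the particular test examples.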

Citation

If you find this work useful, please cite it as follows!

@article{salt-2023-llms-for-css,
  title = {Can Large Language Models Transform Computational Social Science?},
  author = {Ziems, Caleb and Held, William and Shaikh, Omar and Chen, Jiaao and Zhang, Zhehao and Yang, Diyi},
  journal = {arXiv submission 4840038},
  year = {2023},
  month = apr,
}
