Showing 8 changed files with 963 additions and 2 deletions.
README.md
@@ -1,3 +1,72 @@
-# LLM-argumentation
-To be updated soon.

# Exploring the Potential of Large Language Models in Computational Argumentation

This repo contains the data and code for our ACL 2024 paper ["Exploring the Potential of Large Language Models in Computational Argumentation"](https://aclanthology.org/2024.acl-long.126/).
### Abstract

Computational argumentation has become an essential tool in various domains, including law, public policy, and artificial intelligence. It is an emerging research field in natural language processing that attracts increasing attention. Research on computational argumentation mainly involves two types of tasks: argument mining and argument generation. As large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language, it is worthwhile to evaluate the performance of LLMs on diverse computational argumentation tasks. This work aims to embark on an assessment of LLMs, such as ChatGPT, Flan models, and LLaMA2 models, in both zero-shot and few-shot settings. We organize existing tasks into six main categories and standardize the format of fourteen openly available datasets. In addition, we present a new benchmark dataset on counter speech generation that aims to holistically evaluate the end-to-end performance of LLMs on argument mining and argument generation. Extensive experiments show that LLMs exhibit commendable performance across most of the datasets, demonstrating their capabilities in the field of argumentation. Our analysis offers valuable suggestions for evaluating computational argumentation and its integration with LLMs in future research endeavors.

### Setup

To install dependencies:

```
conda create -n llm-am python=3.9 -y
conda activate llm-am
pip install -r requirements.txt
```

To run OpenAI models, insert your [OpenAI key](https://platform.openai.com/account/api-keys) and model version in [openai_info.json](openai_info.json):

```
{
  "engine": "gpt-3.5-turbo-0301",
  "key": "YOUR API KEY"
}
```
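
The snippet below is only an illustrative sketch of how these two fields could be consumed; it is not the repo's actual model wrapper (that lives in modeling.py, which is not shown in this commit view), and it assumes the legacy `openai<1.0` Python SDK that matches the `engine` naming above.

```
# Illustrative sketch only, not the repo's modeling code.
# Assumes the legacy openai<1.0 Python SDK.
import json
import openai

with open("openai_info.json") as f:
    info = json.load(f)

openai.api_key = info["key"]
response = openai.ChatCompletion.create(
    model=info["engine"],
    messages=[{"role": "user", "content": "Does this sentence contain a claim?"}],
)
print(response["choices"][0]["message"]["content"])
```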
### Example Usage

To run Flan-T5-XL on the ibm_claims dataset using 5-shot demonstrations:

```
python main.py \
    --model_name flan_t5_xl \
    --path_model google/flan-t5-xl \
    --task claim_detection \
    --data_name ibm_claims \
    --num_train 5
```
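
Based on the data loading logic in data_loading.py below, this command reads its 5-shot demonstrations and the test split from fixed locations under sampled_data/ (seed 0 is the default in main.py):

```
sampled_data/claim_detection/ibm_claims/train/5_shot/seed_0/source.txt
sampled_data/claim_detection/ibm_claims/train/5_shot/seed_0/target.txt
sampled_data/claim_detection/ibm_claims/test/source.txt
sampled_data/claim_detection/ibm_claims/test/target.txt
```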
The results will be printed as:

```
{'accuracy': 0.74, 'f1': 0.7909496513561132}
```

(Note that some variance is possible.)
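
In addition, each test sample's prompt, raw generation, and parsed prediction are written as one JSON line per sample; following the output path construction in main.py, the command above should write to a file such as:

```
output/claim_detection/ibm_claims/5_shot/seed_0/flan-t5-xl.json
```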
To run with your own prompts, modify the prompts in [prompting.py](prompting.py).
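
prompting.py is not shown in this commit view. The sketch below is a hypothetical illustration only, not the repo's implementation, of the interface that main.py expects from a prompter: run(data_train, sample) builds the prompt and get_answer(raw) parses the model output.

```
# Hypothetical sketch, not the repo's prompting.py.
# main.py calls prompter.run(data_train, sample) and prompter.get_answer(raw).
from data_loading import ArgumentData, ArgumentSample

class MyClaimPrompter:
    def run(self, data_train: ArgumentData, sample: ArgumentSample) -> str:
        # Prepend the few-shot demonstrations, then ask about the test sentence.
        demos = "\n".join(
            f"Sentence: {s.src}\nAnswer: {s.tgt}" for s in data_train.samples
        )
        return f"{demos}\nSentence: {sample.src}\nAnswer:"

    def get_answer(self, raw: str) -> str:
        # Keep only the first line of the generation as the predicted label.
        return raw.strip().split("\n")[0].strip()
```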
### Citation

```
@inproceedings{chen-etal-2024-exploring-potential,
    title = "Exploring the Potential of Large Language Models in Computational Argumentation",
    author = "Chen, Guizhen and
      Cheng, Liying and
      Luu, Anh Tuan and
      Bing, Lidong",
    editor = "Ku, Lun-Wei and
      Martins, Andre and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.126",
    pages = "2309--2330",
    abstract = "Computational argumentation has become an essential tool in various domains, including law, public policy, and artificial intelligence. It is an emerging research field in natural language processing that attracts increasing attention. Research on computational argumentation mainly involves two types of tasks: argument mining and argument generation. As large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language, it is worthwhile to evaluate the performance of LLMs on diverse computational argumentation tasks. This work aims to embark on an assessment of LLMs, such as ChatGPT, Flan models, and LLaMA2 models, in both zero-shot and few-shot settings. We organize existing tasks into six main categories and standardize the format of fourteen openly available datasets. In addition, we present a new benchmark dataset on counter speech generation that aims to holistically evaluate the end-to-end performance of LLMs on argument mining and argument generation. Extensive experiments show that LLMs exhibit commendable performance across most of the datasets, demonstrating their capabilities in the field of argumentation. Our analysis offers valuable suggestions for evaluating computational argumentation and its integration with LLMs in future research endeavors.",
}
```
data_loading.py
@@ -0,0 +1,81 @@
import json
import os
from typing import List

from fire import Fire
from pydantic import BaseModel


class ArgumentSample(BaseModel):
    # One evaluation instance: source text, gold target, and the model's outputs.
    src: str = ""
    tgt: str = ""
    prompt: str = ""
    raw: str = ""
    pred: str = ""

    @classmethod
    def default_format(cls, src: str, tgt: str):
        # Normalize whitespace so each sample fits on a single line.
        src = src.strip().replace("\n", " ")
        tgt = tgt.strip().replace("\n", " ")
        return cls(src=src, tgt=tgt)


class ArgumentData(BaseModel):
    samples: List[ArgumentSample]

    @classmethod
    def load_from_paths(cls, src_path: str, tgt_path: str):
        with open(src_path) as f:
            raw_src = [line for line in f]
        with open(tgt_path) as f:
            raw_tgt = [line for line in f]

        assert len(raw_src) == len(raw_tgt)

        return cls(samples=[
            ArgumentSample.default_format(line_src, line_tgt)
            for line_src, line_tgt in zip(raw_src, raw_tgt)
        ])

    @classmethod
    def load_train(cls, task: str, data_name: str, num_train: int, seed: int):
        train_folder = f"sampled_data/{task}/{data_name}/train/{num_train}_shot/seed_{seed}"
        if not os.path.isdir(train_folder):
            # No demonstrations for this setting: fall back to zero-shot.
            return cls(samples=[])
        train_src_path = f"{train_folder}/source.txt"
        train_tgt_path = f"{train_folder}/target.txt"
        return cls.load_from_paths(train_src_path, train_tgt_path)

    @classmethod
    def load_test(cls, task: str, data_name: str):
        test_folder = f"sampled_data/{task}/{data_name}/test"
        test_src_path = f"{test_folder}/source.txt"
        test_tgt_path = f"{test_folder}/target.txt"
        return cls.load_from_paths(test_src_path, test_tgt_path)

    @classmethod
    def load(cls, task: str, data_name: str, num_train: int, seed: int):
        data_train = cls.load_train(task, data_name, num_train, seed)
        data_test = cls.load_test(task, data_name)
        return data_train, data_test

    @classmethod
    def load_outputs(cls, output_path: str):
        # Each line of an output file is one ArgumentSample serialized as JSON.
        samples = []
        with open(output_path) as f:
            for line in f:
                samples.append(ArgumentSample(**json.loads(line.strip())))
        return cls(samples=samples)


def test_data(task: str, data_name: str, num_train: int, seed: int):
    data_train, data_test = ArgumentData.load(task, data_name, num_train, seed)
    print(data_test.samples[0])
    print("num train: ", len(data_train.samples))
    print("num test: ", len(data_test.samples))


if __name__ == "__main__":
    Fire()
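
Since the module exposes its helpers through Fire, the loaders can be sanity-checked from the command line; for example (assuming the corresponding files under sampled_data/ exist):

```
python data_loading.py test_data --task claim_detection --data_name ibm_claims --num_train 5 --seed 0
```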
main.py
@@ -0,0 +1,90 @@
import os
import json
import time
import numpy as np
import pandas as pd
from pathlib import Path
from typing import TextIO

from fire import Fire
from tqdm import tqdm

from data_loading import ArgumentSample, ArgumentData
from modeling import select_model, EvalModel
from prompting import select_prompter, Prompter
from scoring import select_scorer


def inference(
    model: EvalModel,
    data_train: ArgumentData,
    data_test: ArgumentData,
    prompter: Prompter,
    file: TextIO,
):
    progress = tqdm(data_test.samples)
    sample: ArgumentSample

    targets = []
    predictions = []
    for sample in progress:
        k = len(data_train.samples)
        prompt = prompter.run(data_train, sample)
        # Handle prompt length: drop demonstrations one by one until the prompt fits.
        while not model.check_valid_length(prompt) and k > 0:
            k -= 1
            data_train.samples = data_train.samples[:k]
            prompt = prompter.run(data_train, sample)

        if not model.check_valid_length(prompt):
            prompt = model.truncate_input(prompt)

        # Predict and log the full sample (prompt, raw output, parsed prediction) as one JSON line.
        sample.prompt = prompt
        sample.raw = model.run(prompt)
        sample.pred = prompter.get_answer(sample.raw)
        print(sample.model_dump_json(), file=file)

        targets.append(sample.tgt)
        predictions.append(sample.pred)

    return predictions, targets


def main(
    task: str = "conclugen",
    data_name: str = "base",
    num_train: int = 5,
    seed: int = 0,
    **kwargs
):
    # load model
    model = select_model(**kwargs)
    print(locals())

    # select prompter
    prompter = select_prompter(task, data_name)

    # load data
    data_train, data_test = ArgumentData.load(task, data_name, num_train, seed)

    # set output path
    output_folder = f"output/{task}/{data_name}/{num_train}_shot/seed_{seed}"
    if not os.path.isdir(output_folder):
        os.makedirs(output_folder)
    model_name = Path(model.path_model).stem
    output_path = f"{output_folder}/{model_name}.json"

    # infer (inference returns predictions first, then targets)
    Path(output_path).parent.mkdir(exist_ok=True, parents=True)
    with open(output_path, "w") as file:
        predictions, targets = inference(model, data_train, data_test, prompter, file)

    # score
    scorer = select_scorer(task)
    scores = scorer.run(predictions, targets)
    print(scores)


if __name__ == "__main__":
    Fire(main)
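
The saved outputs can also be re-scored offline. The snippet below is a hypothetical helper, not part of this commit; it relies only on ArgumentData.load_outputs and select_scorer as used above, and assumes an output file produced by the README example run.

```
# Hypothetical helper, not part of the repo: re-score a saved run offline.
from data_loading import ArgumentData
from scoring import select_scorer

def rescore(task: str, output_path: str):
    data = ArgumentData.load_outputs(output_path)  # one ArgumentSample per JSON line
    predictions = [sample.pred for sample in data.samples]
    targets = [sample.tgt for sample in data.samples]
    return select_scorer(task).run(predictions, targets)

# Output path assumed from main.py's naming for the README example run.
print(rescore("claim_detection", "output/claim_detection/ibm_claims/5_shot/seed_0/flan-t5-xl.json"))
```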