AI Debate Experiment

This repository contains code and configurations for running AI debate experiments. In each experiment, AI models debate questions from the BoardgameQA dataset¹ and AI models acting as judges evaluate the debates. The goal is to explore how well AI models generate and evaluate arguments in a persuasive debate format.²

Running Experiments

Each experiment can be run using the following command pattern:

python main.py --config-file configs/<config_file>.json [additional options]

Available Experiments

Claude Haiku Self-Play:

python main.py --config-file configs/claude_haiku_self_play.json

Claude Sonnet Self-Play:

python main.py --config-file configs/claude_sonnet_self_play.json

Claude Sonnet Judging Haiku Debates:

python main.py --config-file configs/claude_sonnet_on_haikus.json

Claude Sonnet Judging with Prompt 2:

python main.py --config-file configs/claude_sonnet_on_haikus_prompt2.json --variation USER_PROMPT2

Deepseek Judging Haiku Debates:

python main.py --config-file configs/deepseek_on_haikus.json

Deepseek Judging with Prompt 2:

python main.py --config-file configs/deepseek_on_haikus_prompt2.json --variation USER_PROMPT2

GPT-4o Judging Haiku Debates:

python main.py --config-file configs/gpt4o_on_haikus.json
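
To run several of these experiments in sequence, the commands can be generated from a small table. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `all_commands` are hypothetical, and the config names and `--variation` values are copied from the commands listed above. It only prints the commands (a dry run).

```python
import shlex

# Config names and optional --variation values, copied from the list above.
EXPERIMENTS = [
    ("claude_haiku_self_play", None),
    ("claude_sonnet_self_play", None),
    ("claude_sonnet_on_haikus", None),
    ("claude_sonnet_on_haikus_prompt2", "USER_PROMPT2"),
    ("deepseek_on_haikus", None),
    ("deepseek_on_haikus_prompt2", "USER_PROMPT2"),
    ("gpt4o_on_haikus", None),
]


def build_command(config_name, variation=None):
    """Return the argv list for one experiment run (hypothetical helper)."""
    cmd = ["python", "main.py", "--config-file", f"configs/{config_name}.json"]
    if variation:
        cmd += ["--variation", variation]
    return cmd


def all_commands():
    """Return one argv list per experiment in EXPERIMENTS."""
    return [build_command(name, variation) for name, variation in EXPERIMENTS]


if __name__ == "__main__":
    for cmd in all_commands():
        # Dry run: print only. To execute, use subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Building each command as a list rather than a single string avoids shell quoting issues if the commands are later passed to `subprocess.run`.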

Optional Arguments

--sampled-data-path: Path to sampled data file (default: "data/sampled_boardgame_qa.jsonl")
--samples-per-label: Number of samples per label (default: 20)
--levels: Difficulty levels to use (default: ["LowConflict", "HighConflict"])
--excluded-labels: Labels to exclude from processing (default: ["unknown"])
--output-dir: Output directory for results (default: "results")
--variation: Optional variation suffix for output directories (default: "")
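
For reference, the options above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the actual contents of main.py: the flag names and defaults come from the list above, while the function name `build_parser`, the use of `nargs="+"` for the list-valued options, and marking `--config-file` as required are guesses.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run an AI debate experiment")
    # Assumption: --config-file is required, since every documented command passes it.
    parser.add_argument("--config-file", required=True,
                        help="Path to the experiment config JSON")
    parser.add_argument("--sampled-data-path",
                        default="data/sampled_boardgame_qa.jsonl",
                        help="Path to sampled data file")
    parser.add_argument("--samples-per-label", type=int, default=20,
                        help="Number of samples per label")
    # Assumption: list-valued options accept one or more space-separated values.
    parser.add_argument("--levels", nargs="+",
                        default=["LowConflict", "HighConflict"],
                        help="Difficulty levels to use")
    parser.add_argument("--excluded-labels", nargs="+", default=["unknown"],
                        help="Labels to exclude from processing")
    parser.add_argument("--output-dir", default="results",
                        help="Output directory for results")
    parser.add_argument("--variation", default="",
                        help="Optional variation suffix for output directories")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--config-file", "configs/claude_haiku_self_play.json",
         "--samples-per-label", "30"]
    )
    print(args.samples_per_label, args.levels, args.output_dir)
```

Note that `argparse` converts dashes in flag names to underscores, so `--samples-per-label` is read back as `args.samples_per_label`.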

Example with custom options:

python main.py --config-file configs/gpt4o_on_haikus.json --samples-per-label 30 --output-dir custom_results
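
To illustrate what --samples-per-label and --excluded-labels might control, here is a minimal sketch of per-label sampling. It is an assumption about the behavior, not the repository's implementation: the function name `sample_per_label`, the `"label"` field name, and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def sample_per_label(records, samples_per_label=20,
                     excluded_labels=("unknown",), seed=0):
    """Pick up to `samples_per_label` records for each non-excluded label."""
    by_label = defaultdict(list)
    for record in records:
        label = record["label"]  # assumed field name
        if label not in excluded_labels:
            by_label[label].append(record)

    rng = random.Random(seed)  # fixed seed so repeated runs pick the same records
    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        # A label with fewer records than requested contributes all of them.
        k = min(samples_per_label, len(group))
        sampled.extend(rng.sample(group, k))
    return sampled


if __name__ == "__main__":
    demo = (
        [{"id": i, "label": "proved"} for i in range(5)]
        + [{"id": i, "label": "disproved"} for i in range(5, 7)]
        + [{"id": i, "label": "unknown"} for i in range(7, 10)]
    )
    for record in sample_per_label(demo, samples_per_label=3):
        print(record)
```

With the default exclusion, records labeled "unknown" never appear in the sample, and each remaining label is capped at the requested count.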

TODO

Upload debate and judge records
Clean up records for the webapp

¹ Kazemi, M., Yuan, Q., Bhatia, D., Kim, N., Xu, X., Imbrasaite, V., & Ramachandran, D. (2024). BoardgameQA: A dataset for natural language reasoning with contradictory information. Advances in Neural Information Processing Systems, 36.
² Khan, A., Hughes, J., Valentine, D., Ruis, L., Sachan, K., Radhakrishnan, A., ... & Perez, E. (2024). Debating with more persuasive LLMs leads to more truthful answers. arXiv preprint arXiv:2402.06782.

