This is the implementation for the paper "PokéChamp: an Expert-level Minimax Language Agent for Competitive Pokémon"
```bash
conda create -n pokechamp python=3.12
conda activate pokechamp
pip install -r requirements.txt
```
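A quick way to confirm the environment is set up is an import check (this assumes poke-env is among the pinned requirements, as the acknowledgement at the end of this README suggests):

```python
# Minimal sanity check: fails loudly if the environment is incomplete.
import poke_env

print("poke-env imported from:", poke_env.__file__)
```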
- Install Node.js v10+.
- Clone the Pokémon Showdown repository and set it up:
```bash
git clone https://github.com/smogon/pokemon-showdown.git
cd pokemon-showdown
# Optional: all repo features were tested against the following Showdown commit.
# git reset --hard dd4b004e54d4ef8c66c8b583a8fa64b020574727
npm install
cp config/config-example.js config/config.js
node pokemon-showdown start --no-security
```
Then open http://localhost:8000/ in your browser to verify the server is running.
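To check that the server is also reachable from Python, a minimal poke-env sketch like the following runs one self-play battle against it (`RandomPlayer` is only a stand-in for the repo's agents):

```python
import asyncio

from poke_env import LocalhostServerConfiguration
from poke_env.player import RandomPlayer


async def main():
    # Two throwaway players connected to the local, no-security server.
    p1 = RandomPlayer(
        battle_format="gen9randombattle",
        server_configuration=LocalhostServerConfiguration,
    )
    p2 = RandomPlayer(
        battle_format="gen9randombattle",
        server_configuration=LocalhostServerConfiguration,
    )
    await p1.battle_against(p2, n_battles=1)
    print(f"p1 won {p1.n_won_battles} of {p1.n_finished_battles} battles")


asyncio.run(main())
```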
To evaluate a group of agents in the Gen 9 OU format:

```bash
python evaluate_gen9ou.py
```
This script runs battles between PokéChamp and the baseline bots, including PokéLLMon and Abyssal Bot, and reports win rates, Elo ratings, and the average number of turns per battle.
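The underlying evaluation loop resembles poke-env's `cross_evaluate` round-robin; here is a hedged sketch with `RandomPlayer` stand-ins for the actual agents (the real script instantiates PokéChamp and the baselines, and Gen 9 OU additionally requires teams, so random battles are used below):

```python
import asyncio

from poke_env.player import RandomPlayer, cross_evaluate


async def main():
    # Stand-ins for PokéChamp and two baselines; random battles avoid
    # the team-building that Gen 9 OU would require in this sketch.
    players = [
        RandomPlayer(battle_format="gen9randombattle", max_concurrent_battles=10)
        for _ in range(3)
    ]
    # Round-robin: every player battles every other n_challenges times.
    results = await cross_evaluate(players, n_challenges=5)
    for name, row in results.items():
        for opponent, win_rate in row.items():
            if win_rate is not None:
                print(f"{name} vs {opponent}: {win_rate:.2f}")


asyncio.run(main())
```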
To run a single battle between two agents on the local server:

```bash
python local_1v1.py
```
To battle the agent yourself, first log into your other account manually on the local server and choose "[Gen 9] Random Battle", then run:

```bash
python human_agent_1v1.py
```
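Conceptually, the agent just connects to the local server and waits for a challenge from your human account; a minimal sketch of that pairing, with placeholder account names and `RandomPlayer` standing in for PokéChamp:

```python
import asyncio

from poke_env import AccountConfiguration, LocalhostServerConfiguration
from poke_env.player import RandomPlayer


async def main():
    # Placeholder bot account; no password is needed on the
    # no-security local server.
    agent = RandomPlayer(
        account_configuration=AccountConfiguration("pokechamp_bot", None),
        battle_format="gen9randombattle",
        server_configuration=LocalhostServerConfiguration,
    )
    # Block until one challenge arrives from your human account.
    await agent.accept_challenges("your_human_account", n_challenges=1)


asyncio.run(main())
```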
To play on the official Pokémon Showdown ladder, register an account at https://play.pokemonshowdown.com/, set a password, and log in at the same page. Then run the following, filling in PokéChamp's username and password (no local server is needed):

```bash
python showdown_ladder.py --USERNAME $USERNAME --PASSWORD $PASSWORD
```
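For reference, laddering with poke-env looks roughly like the sketch below (`RandomPlayer` again stands in for PokéChamp; the credentials are placeholders for the account registered above):

```python
import asyncio

from poke_env import AccountConfiguration, ShowdownServerConfiguration
from poke_env.player import RandomPlayer


async def main():
    player = RandomPlayer(
        account_configuration=AccountConfiguration("your_username", "your_password"),
        battle_format="gen9randombattle",
        # Points at the official play.pokemonshowdown.com server.
        server_configuration=ShowdownServerConfiguration,
    )
    await player.ladder(5)  # play five ranked ladder games


asyncio.run(main())
```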
The action prediction experiments require downloading our dataset (release TBD). To reproduce the results:
```bash
python evaluate_action_prediction.py
```
This script analyzes the dataset and reports prediction accuracy for player and opponent actions across different Elo ratings.
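Since the dataset is not yet released, the following only sketches the kind of aggregation involved, assuming a flat table with predicted and ground-truth actions plus the player's Elo (the file name and all column names are hypothetical):

```python
import pandas as pd

# Hypothetical schema: one row per turn, with the model's predicted
# action, the action actually taken, and the acting player's Elo.
df = pd.read_csv("action_predictions.csv")

df["correct"] = df["predicted_action"] == df["actual_action"]
# Bucket turns by Elo and report mean prediction accuracy per bucket.
bins = [0, 1200, 1400, 1600, 1800, 3000]
df["elo_bucket"] = pd.cut(df["elo"], bins=bins)
print(df.groupby("elo_bucket", observed=True)["correct"].mean())
```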
The environment is built on top of PokéLLMon and poke-env. This work provides an implementation of the PokéChamp paper:
```bibtex
@article{karten2025pokechamp,
  title={PokéChamp: an Expert-level Minimax Language Agent for Competitive Pokémon},
  author={Karten, Seth and Nguyen, Andy Luu and Jin, Chi},
  journal={arXiv preprint arXiv:2503.04094},
  year={2025}
}
```