Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) is employed to develop autonomous agents that can learn cooperative or competitive strategies within complex environments. However, a linear increase in the number of agents leads to a combinatorial explosion of the joint action space, which often results in algorithmic instability, difficulty in convergence, or entrapment in local optima. While researchers have designed a variety of effective algorithms to compress the action space, these methods introduce new challenges, such as the need for manually designed prior knowledge or reliance on the structure of a specific problem, which limits their applicability. In this paper, we introduce Evolutionary action SPAce Reduction with Knowledge (eSpark), an exploration-function generation framework driven by large language models (LLMs) that boosts exploration and prunes unnecessary action space in MARL. Using just a basic prompt that outlines the overall task and setting, eSpark generates exploration functions in a zero-shot manner, identifies and prunes redundant or irrelevant state-action pairs, and then improves autonomously from policy feedback. In reinforcement learning tasks involving inventory management and traffic light control, encompassing a total of 15 scenarios, eSpark consistently outperforms the MARL algorithm it is combined with in all scenarios, achieving average performance gains of 34.4% and 9.9% in the two types of tasks respectively. Additionally, eSpark can handle settings with a large number of agents, securing a 29.7% improvement in scalability challenges featuring over 500 agents.
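Conceptually, an eSpark exploration function maps each agent's observation to a boolean mask over its actions, and masked-out actions are removed from exploration. The sketch below only illustrates that interface; the function name, the observation layout, and the pruning rule are illustrative assumptions, not the actual code generated by the LLM.

```python
import numpy as np

def exploration_func(agent_obs: np.ndarray, n_actions: int) -> np.ndarray:
    """Illustrative exploration function: True = action remains available."""
    mask = np.ones(n_actions, dtype=bool)
    # Made-up rule: if the first feature (assumed to be a normalised stock
    # level) is high, prune the larger replenishment actions.
    if agent_obs[0] > 0.8:
        mask[n_actions // 2:] = False
    if not mask.any():
        mask[0] = True  # never prune every action
    return mask
```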
Folder | Description |
---|---|
ReplenishmentEnv | Replenishment environment source code. |
ReplenishmentEnv/config | Configs for building the environment. |
ReplenishmentEnv/data | CSV-based data for SKUs, including the SKU list, info, and other dynamic data. |
ReplenishmentEnv/env | Kernel simulator for the environment. |
Baseline | Algorithm source code. |
Baseline/MARL_algorithm/ | Source code and hyperparameters for MARL baselines and eSpark. |
Baseline/OR_algorithm/ | Source code for OR baselines. |
Baseline/MARL_algorithm/prompts | GPT prompts for eSpark. |
Baseline/MARL_algorithm/config | Algorithm hyperparameters and environment settings. |
Baseline/MARL_algorithm/run | The logic for how the algorithms interact with the environment. |
- Create a new virtual environment (Optional)
conda create -n re_rl_algo python==3.8
conda activate re_rl_algo
- Install required packages
bash setup.sh
To run eSpark, you need to set up your wandb and GPT keys in ./Baseline/MARL_algorithm/config/keys.yaml:
# ======== Wandb Configuration ========
wandb_key: "YOUR_WANDB_KEY"
# ======== GPT Configuration ========
api_key: "YOUR_API_KEY"
Then set up the GPT configuration in ./Baseline/MARL_algorithm/config/algs/{your-chosen-alg-config}.yaml:
# ======== GPT Configuration ========
api_type: "YOUR_API_TYPE"
api_base: "YOUR_API_BASE"
api_version: "YOUR_API_VERSION"
engine: "GPT_ENGINE"
gpt_sample_num: YOUR-PROCESS-NUMBER  # An integer
Finally, run eSpark with:
python main.py --config={your-chosen-alg-config} --env-config={your-chosen-env-config}
You can see all the algorithm configs in ./Baseline/MARL_algorithm/config/algs/ and all environment configs in ./Baseline/MARL_algorithm/config/envs/. For example, to run the algorithm eSpark_base_ippo on the scenario sku100.single_store.standard, use:
python main.py --config=eSpark_base_ippo --env-config=sku100.single_store.standard
eSpark distributes all processes equally across all available GPUs. If you want to choose which GPUs are used, prepend the CUDA_VISIBLE_DEVICES environment variable to the command. For example, if you only want to use GPU 0 and GPU 1:
CUDA_VISIBLE_DEVICES=0,1 python main.py --config=eSpark_base_ippo --env-config=sku100.single_store.standard
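"Distributing processes equally across GPUs" can be thought of as a round-robin assignment by process index. The helper below is only a sketch under that assumption; the repo's actual assignment logic may differ.

```python
import torch

def assign_gpu(process_idx: int) -> torch.device:
    """Round-robin a process onto one of the visible GPUs (sketch only)."""
    n_gpus = torch.cuda.device_count()  # respects CUDA_VISIBLE_DEVICES
    if n_gpus == 0:
        return torch.device("cpu")
    return torch.device(f"cuda:{process_idx % n_gpus}")
```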
To run the MARL baselines, you only need to specify the algorithm config and the environment config.
- IPPO training
- Specify hyperparameters if needed in the algorithm config file, such as ippo.yaml
- Run
python main.py --config=ippo --env-config={your-chosen-env-config}
- QTRAN training
- Specify hyperparameters if needed in the algorithm config file, such as qtran.yaml
- Run
python main.py --config=qtran --env-config={your-chosen-env-config}
- MAPPO training
- Specify hyperparameters if needed in the algorithm config file, such as mappo.yaml
- Run
python main.py --config=mappo --env-config={your-chosen-env-config}
- View the training curves in wandb
To run the heuristic pruning baselines for IPPO, you have to set up the heuristic method and the related hyperparameters in ./Baseline/MARL_algorithm/config/algs/ippo_heuristic_prune.yaml:
# ======== Heuristic pruning hyperparameters ========
mask_func_name: "Ss" # "Ss", "random" or "upbound"
p: 0.3 # Fraction of the action space to be randomly pruned (used by "random")
r0: 1 # Reduction coefficient of s in (S,s) pruning
r1: 2 # Amplification factor of S in (S,s) pruning
r2: 0.5 # Reduction coefficient of S in (S,s) pruning
upbound: 4 # Upper-bound threshold for "upbound" pruning
after which you can run heuristic pruning methods with:
python main.py --config=ippo_heuristic_prune --env-config={your-chosen-env-config}
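For intuition, the snippet below sketches how the "Ss" option and the r0/r1/r2 coefficients above could translate into an action mask. The observation fields, the interpretation of an action as an order quantity, and the exact use of the coefficients are assumptions for illustration only and may differ from the repo's implementation.

```python
import numpy as np

def ss_heuristic_mask(obs: dict, n_actions: int, r0=1, r1=2, r2=0.5) -> np.ndarray:
    """Illustrative (S, s) pruning: keep only order quantities that land the
    inventory position inside a band around S, plus "order nothing" when the
    position is already above the reorder point s."""
    s = r0 * obs["s_level"]       # (possibly reduced) reorder point s
    S_hi = r1 * obs["S_level"]    # amplified order-up-to level S
    S_lo = r2 * obs["S_level"]    # reduced order-up-to level S

    mask = np.zeros(n_actions, dtype=bool)
    for a in range(n_actions):
        position = obs["inventory"] + a     # assume action a orders a units
        if a == 0 and obs["inventory"] > s:
            mask[a] = True                  # above s: allow ordering nothing
        elif S_lo <= position <= S_hi:
            mask[a] = True                  # keep orders inside the (S, s) band
    if not mask.any():
        mask[0] = True                      # never prune every action
    return mask
```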
# First, specify the scenario in ./Baseline/OR_algorithm/base_stock.py, then run the command below:
python Baseline/OR_algorithm/base_stock.py
# First, specify the scenario in ./Baseline/OR_algorithm/search_sS.py, then run the command below:
python Baseline/OR_algorithm/search_sS.py
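For reference, the decision rules behind these two OR baselines are the classic base-stock and (s, S) replenishment policies. The snippet below is a minimal sketch of those rules, not the repo's exact code; base_stock.py and search_sS.py presumably also search over the policy parameters.

```python
def base_stock_order(inventory_position: float, base_stock_level: float) -> float:
    """Base-stock rule: always order up to a fixed base-stock level."""
    return max(0.0, base_stock_level - inventory_position)

def ss_order(inventory_position: float, s: float, S: float) -> float:
    """(s, S) rule: when the position drops to s or below, order up to S."""
    return S - inventory_position if inventory_position <= s else 0.0
```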
You need to fetch results from wandb to evaluate performance. First, set up the project you want to evaluate in ./get_performance_from_wandb.py:
# Replace 'your-entity' with your W&B entity (username or team name)
# Replace 'your-project-name' with your W&B project name
entity = 'your-entity'
project_name = 'your-project-name'
And then run:
python get_performance_from_wandb.py
You can see the performance in ./wandb_averaged_experiment_results.csv.
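For reference, a minimal sketch of what such a script is assumed to do with the wandb public API is shown below. The metric name "test_return_mean", the grouping, and the output columns are assumptions; the actual get_performance_from_wandb.py may aggregate results differently.

```python
import pandas as pd
import wandb

entity = "your-entity"              # your W&B entity (username or team name)
project_name = "your-project-name"  # your W&B project name

api = wandb.Api()
runs = api.runs(f"{entity}/{project_name}")

rows = []
for run in runs:
    metric = run.summary.get("test_return_mean")  # hypothetical metric name
    if metric is not None:
        rows.append({"run": run.name, "metric": metric})

df = pd.DataFrame(rows)
print("Average over runs:", df["metric"].mean())
df.to_csv("wandb_averaged_experiment_results.csv", index=False)
```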